Hi Nate, Hi Rob,

You can’t believe how much I appreciate your help. Finally, after trying for months: IT WORKS!!!!! I am TX streaming two different waveforms on the two channels of the USRP X310 with the two UBX-160 daughterboards at the full bandwidth of 200 MS/s per channel. I would like to invite you for champagne.

I changed the rfnoc_ce_default_inst_x310.v file as you described below and built the XGS image with the full-design Vivado 2017.4 version (free one-month trial license). UHD v3.14.1.1 needs this version; before, I had Vivado 2018.3, which was not working. I set the CPU governor to performance, set the network buffers as described and the MTU to 8000, and set up DPDK as described in the Ettus manual. It works even without disabling hyper-threading and KPTI on my host (16GB RAM, Intel Xeon W-2125 CPU@ 4.5GHz x8). I am using dual 10Gbit Ethernet and a changed version of the example “tx_samples_from_file” that reads in a second file with my second waveform.
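As a rough sanity check on why dual 10Gbit Ethernet is needed for 2x 200 MS/s, the payload rate can be estimated as below. This is only a sketch: it assumes 4 bytes per complex sample (sc16 over the wire) and ignores packet header overhead, and the function name is illustrative, not part of UHD.

```python
import math

BYTES_PER_SAMPLE = 4   # assumption: sc16 over the wire (16-bit I + 16-bit Q)
LINK_GBPS = 10.0       # nominal rate of one 10Gbit Ethernet link


def wire_rate_gbps(sample_rate_sps, bytes_per_sample=BYTES_PER_SAMPLE):
    """Payload rate in Gbit/s for one channel, ignoring header overhead."""
    return sample_rate_sps * bytes_per_sample * 8 / 1e9


per_channel_gbps = wire_rate_gbps(200e6)                 # one TX channel
total_gbps = 2 * per_channel_gbps                        # two TX channels
links_needed = math.ceil(total_gbps / LINK_GBPS)         # lower bound on links

print(f"per channel: {per_channel_gbps:.1f} Gbit/s")
print(f"two channels: {total_gbps:.1f} Gbit/s")
print(f"10Gbit links needed (at least): {links_needed}")
```

Under these assumptions a single channel already uses well over half of one link, so two channels cannot share one 10Gbit connection.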
Thank you a lot for your help.

Thomas

From: Nate Temple
Sent: Wednesday, December 11, 2019 7:00 PM
To: Thomas Harder
Cc: Rob Kossler; USRP-users@lists.ettus.com; EJ Kreinar
Subject: Re: [USRP-users] transmitting on two channels with replay block

On Wed, Dec 11, 2019 at 9:33 AM Nate Temple <nate.tem...@ettus.com> wrote:

Hi Thomas,

You will need to apply the changes below to the fpga-src/usrp3/top/x300/rfnoc_ce_default_inst_x310.v file. This will add additional SRAM FIFOs, which is basically what the "XGS" / SRAM image is.

Make sure to start with the v3.14.1.1 fpga sources. (Run git submodule init; git submodule update; in your UHD repo after checking out v3.14.1.1.)

########################################################################
diff --git a/usrp3/top/x300/rfnoc_ce_default_inst_x310.v b/usrp3/top/x300/rfnoc_ce_default_inst_x310.v
index d20a64962..bcb4c3c32 100644
--- a/usrp3/top/x300/rfnoc_ce_default_inst_x310.v
+++ b/usrp3/top/x300/rfnoc_ce_default_inst_x310.v
@@ -1,4 +1,4 @@
-  localparam NUM_CE = 4; // Must be no more than 10 (6 ports taken by transport and IO connected CEs)
+  localparam NUM_CE = 6; // Must be no more than 10 (6 ports taken by transport and IO connected CEs)
   wire [NUM_CE*64-1:0] ce_flat_o_tdata, ce_flat_i_tdata;
   wire [63:0] ce_o_tdata[0:NUM_CE-1], ce_i_tdata[0:NUM_CE-1];
@@ -46,7 +46,9 @@
   genvar n;
   generate
     for (n = 4; n < NUM_CE; n = n + 1) begin
-      noc_block_axi_fifo_loopback inst_noc_block_axi_fifo_loopback (
+      noc_block_axi_fifo_loopback #(
+        .STR_SINK_FIFOSIZE(15)
+      ) inst_noc_block_axi_fifo_loopback (
         .bus_clk(bus_clk), .bus_rst(bus_rst),
         .ce_clk(ce_clk), .ce_rst(ce_rst),
         .i_tdata(ce_o_tdata[n]), .i_tlast(ce_o_tlast[n]), .i_tvalid(ce_o_tvalid[n]), .i_tready(ce_o_tready[n]),
########################################################################

After making these modifications to the FPGA sources, you can build an FPGA image with the commands:

cd fpga-src/usrp3/top/x300/
source setupenv.sh
make X310_XG

Note: Even though you are
calling X310_XG, it is really an "XGS" image since it has the additional SRAM FIFOs.

After that has completed building, you should write that FPGA image to the X310 using uhd_image_loader.

uhd_image_loader --args "addr=192.168.40.2,type=x300" --fpga-path /path/to/x300.bit

After the FPGA image load and restarting the USRP, run uhd_usrp_probe and at the end of the output where the RFNoC blocks are listed, you should see two additional FIFO blocks:

FIFO_0
FIFO_1

Random performance tuning notes:

* Ensure your CPU governor is set to performance:

sudo apt install cpufrequtils

To set performance for all cores:

for ((i=0;i<$(nproc);i++)); do sudo cpufreq-set -c $i -r -g performance; done

Verify with:

cpufreq-info

* Set your network buffers

sudo sysctl -w net.core.rmem_max=625000000
sudo sysctl -w net.core.wmem_max=625000000

* Set the MTU to 8000 on your 10Gb NICs

* Ensure you have pthreads enabled for your user

https://kb.ettus.com/Building_and_Installing_the_USRP_Open-Source_Toolchain_(UHD_and_GNU_Radio)_on_Linux#Thread_priority_scheduling
http://files.ettus.com/manual/page_general.html#general_threading

* Disable hyper-threading in the BIOS. This will typically give about a 10% boost in core performance if you can work without the additional cores. You'll need to update your CPU core list in DPDK.

* Disable KPTI for Spectre/Meltdown. I would recommend trying to disable the KPTI protections for your CPU if the machine is offline; you may see a 10-15% performance increase. This can be done by adding the options below to your /etc/default/grub at GRUB_CMDLINE_LINUX_DEFAULT="", then running sudo update-grub and rebooting.

pti=off spectre_v2=off l1tf=off nospec_store_bypass_disable no_stf_barrier

Note, this disables protections against Meltdown/Spectre (links below). So if you try to do this, I would recommend disconnecting that host from any internet-connected network.
https://en.wikipedia.org/wiki/Meltdown_(security_vulnerability)
https://en.wikipedia.org/wiki/Spectre_(security_vulnerability)

* There are additional recommendations here from Intel on various adjustments you can make to improve performance with DPDK:

http://doc.dpdk.org/guides/linux_gsg/nic_perf_intel_platform.html

Specifically, I would recommend trying section 10.1.3 #3, where you isolate the CPU cores that are used for DPDK.

* Here is a performance report from Intel on DPDK 17.11:

https://fast.dpdk.org/doc/perf/DPDK_17_11_Intel_NIC_performance_report.pdf

In the tables of boot and BIOS settings, the additional CPU options nohz_full="" and rcu_nocbs="" are added to their kernel configs; this may help as well. Additionally, they made the changes listed below:

CPU Power and Performance Policy <Performance> (you should already be doing this)
CPU C-state Disabled
CPU P-state Disabled
Enhanced Intel® Speedstep® Tech Disabled
Turbo Boost Disabled

Regards,
Nate Temple

On Wed, Dec 11, 2019 at 9:18 AM Thomas Harder <thomas.har...@oca.eu> wrote:

Rob,

I am definitely interested in your custom ‘txarb’ RFNoC block. For now I am using tx waveforms of about 10,000 samples, so the 2^15 samples would be sufficient.

I was already trying to find out what exactly this SRAM image means, because today I was able to set up DPDK with UHD 3.14.1, and the benchmark_rate code (run exactly as described in Nate's mail) was still full of underruns with the stock XG FPGA image which I downloaded with uhd_images_downloader. So I will also try to build a second FIFO block, since I still have the trial version of Vivado for two weeks.

Thomas

From: Rob Kossler
Sent: Wednesday, December 11, 2019 4:50 PM
To: Thomas Harder; Nate Temple
Subject: Re: [USRP-users] transmitting on two channels with replay block

Thomas,

I believe that Nate and I were saying basically the same thing. When he referred to an SRAM image, I believe that this means an image with the FIFO blocks.
I believe that such an image needs to be built by the user (rather than downloaded using uhd_images_downloader), but I'm not 100% certain.

If you are interested, I have a custom 'txarb' RFNoC block that implements my 2nd option below. By default, it includes storage of up to 2^15 samples, but this can be modified using an input parameter (FPGA resources permitting). This block requires some specialized behavior, but it is pretty simple. Similar to the Replay block, you need to construct a custom RFNoC graph that connects the txarb block to the Radio. When you want to stream, you need to stream just one full waveform to the txarb block. Once the txarb block receives end-of-burst, it will automatically stop "recording the samples to memory" and begin "playing the samples from memory repeatedly". The streaming will continue indefinitely until you send a new tx waveform. If the new tx waveform contains fewer than 2 samples, the streaming is turned off. There are no control registers to worry about. Timed behavior is supported because the block preserves the command time of the incoming stream from the host when it starts playing out.

It is not terribly difficult to build this custom block, but if you haven't built out-of-tree RFNoC blocks before, it might be easiest to just put this block in-tree (in the Ettus folder structure) and manually modify makefiles as needed.

Let me know if you are interested.

Rob

On Wed, Dec 11, 2019 at 10:07 AM Nate Temple <nate.tem...@ettus.com> wrote:

Hi Thomas,

One option instead of using the Replay block could be to stream 2x 200e6 from your host. On the X310, this requires using an SRAM image and DPDK. DPDK support was added with UHD 3.14.1.0 for the X310; I'd suggest using 3.14.1.1 at this time though.

Some links on DPDK:

https://www.dpdk.org/
http://files.ettus.com/manual/page_dpdk.html

I've been able to run 2x2 @ 200e6 with the X310 with DPDK using a 4 GHz CPU.
./benchmark_rate --rx_rate 200e6 --rx_channels 0,1 --tx_rate 200e6 --tx_channels 0,1 --args "addr=192.168.10.2,second_addr=192.168.20.2,use_dpdk=1,num_recv_frames=512,enable_tx_dual_eth=1,skip_ddc=1,skip_duc=1"

num_recv_frames=512 can help if you're seeing overflows.
enable_tx_dual_eth=1 is required for 2x TX @ 200e6.
skip_ddc=1,skip_duc=1 can help as well since you'd be sending at full rate.

Regards,
Nate Temple

On Wed, Dec 11, 2019 at 7:03 AM Rob Kossler via USRP-users <usrp-users@lists.ettus.com> wrote:

I do not think it is possible using the stock FPGA image. However, I can think of a couple of possibilities:

• On the N310, Ettus includes 4 FIFO blocks (rather than the DmaFIFO which used the off-FPGA RAM for memory), to provide capability for 4x125 MS/s streaming. Perhaps if you built an X310 FPGA image with 2 such FIFO blocks, you could use these rather than the DmaFIFO and achieve the desired streaming. Note that this requires a Vivado license to build your own FPGA image, but does not require FPGA experience because you would be building an image with "stock" blocks. One caution though is that streaming at this very high rate still requires a high-performance host, so it is still possible that you would have underruns if your host could not keep up. If you go this route, I believe you will likely need to use the "DPDK" capability, which is a bit of a pain to configure and get working properly.

• Another possibility is to create a custom RFNoC block that is similar to the replay block but that uses FPGA memory to store a fixed-duration waveform and then plays it out cyclically like the replay block. The Ettus 'window' RFNoC block provides a good example of how to store coefficients and play them out repeatedly. But making the needed modifications is not a trivial task except for someone who is pretty good at FPGA programming.

Given that you were trying the replay block, I'm guessing that your Tx waveforms are of fixed duration.
What is the duration (in number of samples) that you require?

Rob

On Wed, Dec 11, 2019 at 5:05 AM Thomas Harder <thomas.har...@oca.eu> wrote:

Thank you Rob for this comment. But I am not sure if I understand you correctly. Do you mean that it is IMPOSSIBLE to TX stream two different synchronized waveforms on the 2 channels of the X310 with the full bandwidth of 200MS/s on each channel? That is what I have been trying for the last 6 months full time, starting with LabVIEW under Windows and then UHD under Linux with a Dell Precision 5820 desktop (16GB RAM, Intel Xeon W-2125 CPU@ 4.GHz x8), with an MXI connection, a dual 10Gbit connection (Intel X520-DA2), and recently the replay block: always the same result, continuous underruns.

If you can confirm that this is not possible without a significant FPGA change (I have no experience in this field and do not have the time to invest in it), I must search for another solution to create two different synchronized RF waveforms with 160MHz bandwidth (optical, electronic,…), because this will be just a part of my experimental setup but it is crucial for going on.

I am thankful for any advice,

Thomas

From: Rob Kossler
Sent: Tuesday, December 10, 2019 5:01 AM
To: Thomas Harder
Cc: Sam Reiter; usrp-users@lists.ettus.com
Subject: Re: [USRP-users] transmitting on two channels with replay block

Apart from solving the underrun issue, there is also an issue with synchronization. The replay block doesn't presently support timed commands.

And, as a side note, the issue with streaming from the host is not just the host. The DMA FIFO has a maximum bandwidth of something like 600 MS/s (combination of all inputs and outputs) that precludes streaming 400 MS/s in and out of the block simultaneously. So, even if the host could keep up, the FIFO could not.

Rob

On Mon, Dec 9, 2019 at 4:34 AM Thomas Harder via USRP-users <usrp-users@lists.ettus.com> wrote:

Hi Sam,

Thank you for your reply.
This morning I set the MCR to 184.32 and I am still having continuous underruns, also when using replay_ctrl->get_record_fullness for both channels. But since I need the full bandwidth of 160MHz, I would like to implement a second replay block in my FPGA image. Could anyone help me with this? I am really new to FPGA programming, and for the image with one replay block I was just following the instructions in https://kb.ettus.com/Using_the_RFNoC_Replay_Block.

Thank you,

Thomas

From: Sam Reiter
Sent: Friday, December 6, 2019 10:23 PM
To: Thomas Harder
Cc: usrp-users@lists.ettus.com
Subject: Re: [USRP-users] transmitting on two channels with replay block

Thomas,

Upon further investigation, we may be running up against a practical limit of a single CHDR interface rather than an issue with your code. A single replay block servicing two radios will have a max (theoretical) rate of 187.5 MSPS on either channel. This means that you might be able to squeeze full rate out on 2 channels with an MCR of 184.32, but that's cutting it pretty close.

Sounds like 2 channels at 200 MSPS with a replay setup will require 2 replay blocks, each serving one channel independently. If you end up trying either of the above, I'd be curious to know what results you observe.

Sam Reiter
Ettus Research

On Fri, Dec 6, 2019 at 2:38 PM Sam Reiter <sam.rei...@ettus.com> wrote:

Thomas,

I'd need to set it up on my end, but I believe you can TX two distinct waveforms from a single replay block instance. You'd need to make sure that you're adding your data to the buffer in separate locations and at an address that is a multiple of 8 bytes (which it looks like you're doing from the above snippets).

Are you seeing continuous underruns, or just a handful at the beginning of the run? Does your duplicated code also use:

replay_ctrl->get_record_fullness

on both channels before kicking off the stream start?
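The 187.5 MSPS figure quoted in this thread for a single replay block feeding two radios can be reproduced with simple arithmetic. The sketch below assumes a 64-bit CHDR data path clocked at 187.5 MHz and 4-byte sc16 samples; these values are assumptions chosen because they match the quoted number, they are not stated in the thread, and packet header overhead is ignored.

```python
BUS_CLK_HZ = 187.5e6    # assumption: CHDR bus clock of the X310 image
CHDR_WIDTH_BYTES = 8    # assumption: 64-bit CHDR data path
BYTES_PER_SAMPLE = 4    # assumption: sc16 samples (16-bit I + 16-bit Q)


def max_rate_per_channel(num_channels):
    """Theoretical samples/s per channel through one CHDR interface."""
    total_sps = BUS_CLK_HZ * CHDR_WIDTH_BYTES / BYTES_PER_SAMPLE
    return total_sps / num_channels


limit = max_rate_per_channel(2)
print(f"limit per channel with 2 channels: {limit / 1e6} MS/s")

# Compare the two master clock rates discussed in the thread.
for rate in (184.32e6, 200e6):
    verdict = "fits" if rate <= limit else "exceeds the limit"
    print(f"{rate / 1e6} MS/s {verdict}")
```

With these numbers, 184.32 MS/s sits just under the ceiling and 200 MS/s is above it, which is consistent with the suggestion to use one replay block per channel.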
Sam Reiter
Ettus Research

On Wed, Dec 4, 2019 at 3:48 AM Thomas Harder via USRP-users <usrp-users@lists.ettus.com> wrote:

Hello everyone,

Is it possible to transmit two different waveforms on the two channels of the USRP X310 with the two UBX-160 daughterboards?

I want to transmit two different waveforms simultaneously (synchronized) on the two channels of the USRP with the full sample rate of 200 MS/s. I already tried to do it with a dual 10Gbit Ethernet connection and I seemed to be limited by my computer. Now I am trying to do it with the replay block.

I built the FPGA image with one Replay block as described in https://kb.ettus.com/Using_the_RFNoC_Replay_Block to run the example “replay_samples_from_file”, and it works fine if I transmit on just one channel.

Now I have modified the code by connecting the replay block to both channels:

replay_graph->connect(replay_ctrl->get_block_id(),replay_chan,tx_blockid,tx_chan,replay_spp);
replay_graph->connect(replay_ctrl->get_block_id(),replay_chan1,tx_blockid1,tx_chan,replay_spp);

and writing the same waveform into another region of the DRAM buffer:

replay_ctrl->config_record(0,words_to_replay*replay_word_size, replay_chan);
replay_ctrl->config_record(20000,words_to_replay*replay_word_size, replay_chan1);

and

replay_ctrl->config_play(0,words_to_replay*replay_word_size, replay_chan);
replay_ctrl->config_play(20000,words_to_replay*replay_word_size, replay_chan1);

where

words_to_replay*replay_word_size=16000
replay_chan=0
replay_chan1=1
tx_blockid=0/Radio_0
tx_blockid1=0/Radio_1

Then I stream my waveforms to the replay block as defined in the example and I start to replay the data:

replay_ctrl->issue_stream_cmd(stream_cmd, replay_chan);
replay_ctrl->issue_stream_cmd(stream_cmd, replay_chan1);

It works, but with plenty of underflows!!
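The buffer layout above (offsets 0 and 20000, 16000 bytes each) can be checked against the two conditions mentioned in this thread: each region must start at a multiple of 8 bytes, and the regions must sit in separate locations. The following is a minimal sketch in plain Python; the Region class and check_regions helper are hypothetical names for illustration and are not part of UHD.

```python
from dataclasses import dataclass

ALIGN_BYTES = 8  # from the thread: addresses must be a multiple of 8 bytes


@dataclass(frozen=True)
class Region:
    offset: int  # start address in bytes
    size: int    # length in bytes

    @property
    def end(self):
        return self.offset + self.size


def check_regions(regions):
    """Return a list of human-readable problems (empty if the layout is OK)."""
    problems = []
    for i, r in enumerate(regions):
        if r.offset % ALIGN_BYTES != 0:
            problems.append(f"region {i}: offset {r.offset} is not 8-byte aligned")
    for i in range(len(regions)):
        for j in range(i + 1, len(regions)):
            a, b = regions[i], regions[j]
            if a.offset < b.end and b.offset < a.end:
                problems.append(f"regions {i} and {j} overlap")
    return problems


# Layout used in the message above: two 16000-byte buffers.
layout = [Region(0, 16000), Region(20000, 16000)]
print(check_regions(layout))
```

For the layout in the message this reports no problems, so the underflows are unlikely to come from misaligned or overlapping buffers.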
So what does it mean when it says in the manual: “Note that the record and playback buffers do not need to the same, allowing a single Replay block to both record and playback to different regions of memory simultaneously.” (https://kb.ettus.com/Using_the_RFNoC_Replay_Block)?

Because the manual also says: “The replay block has the following features: One input and one output” (https://files.ettus.com/manual/classuhd_1_1rfnoc_1_1replay__block__ctrl.html)

So if the replay block has just one output, why does it have two channels connected to it (replay_chan = 0 and 1)?

If one replay block can stream to just one channel at a time, can I easily implement a second replay block in the FPGA to stream two different waveforms simultaneously on the two channels of my USRP?

Thank you,

Thomas

_______________________________________________
USRP-users mailing list
USRP-users@lists.ettus.com
http://lists.ettus.com/mailman/listinfo/usrp-users_lists.ettus.com