There is a bug in the DMA FIFO read logic that is likely the root cause of
this issue.  Changing the line below in axi_dma_fifo.v fixed it for me:

 OUTPUT2: begin
     // Replicated write logic to break a read timing critical path for read_count
     read_count <= (output_page_boundry < occupied_minus_one) ?
                   output_page_boundry[7:0] : occupied_minus_one[7:0];
-    read_count_plus_one <= (output_page_boundry < occupied_minus_one) ?
-                  ({1'b0,output_page_boundry[7:0]} + 9'd1) : {1'b0,occupied[7:0]};
+    read_count_plus_one <= (output_page_boundry < occupied_minus_one) ?
+                  ({1'b0,output_page_boundry[7:0]} + 9'd1) : ({1'b0,occupied_minus_one[7:0]} + 9'd1);

-Juan

On Wed, Aug 29, 2018 at 9:30 AM Alan Conrad via USRP-users <
usrp-users@lists.ettus.com> wrote:

> Thanks Brian, that certainly sounds like the problem I’m experiencing.
> I’ll try rebuilding my FPGA and UHD as you suggest.  If that doesn’t work
> or I get more information, I’ll let you know.
>
>
>
> Thanks again,
>
>
>
> Al
>
>
>
> *From:* Brian Padalino <bpadal...@gmail.com>
> *Sent:* Tuesday, August 28, 2018 8:57 PM
> *To:* Alan Conrad <acon...@gogoair.com>
> *Cc:* USRP-users@lists.ettus.com
> *Subject:* Re: [USRP-users] Transmit Thread Stuck Receiving Tx Flow
> Control Packets
>
>
>
>
>
> On Tue, Aug 28, 2018 at 4:02 PM Alan Conrad via USRP-users <
> usrp-users@lists.ettus.com> wrote:
>
> Hi All,
>
>
>
> I’ve been working on an application that requires two receive streams and
> two transmit streams, written using the C++ API.  I have run into a problem
> when transmitting packets and I am hoping that someone has seen something
> similar and/or may be able to shed some light on this.
>
>
>
> My application is streaming two receive and two transmit channels, each at
> 100 Msps over dual 10GigE interfaces (NIC is Intel X520-DA2).  I have two
> receive threads, each calling recv() on separate receive streams, and two
> transmit threads each calling send(), also on separate transmit streams.
> Each receive thread copies samples into a large circular buffer.  Each
> transmit thread reads samples from the buffer to be sent in the send()
> call.  So, each receive thread is paired with a transmit thread through a
> shared circular buffer with some mutex locking to prevent simultaneous
> access to shared circular buffer memory.
>
>
>
> I did read in the UHD manual that recv() is not thread safe.  I assumed
> that this meant that recv() is not thread safe when called on the same
> rx_streamer from two different threads but would be ok when called on
> different rx_streamers.  If this is not the case, please let me know.
>
>
>
> On to my problem…
>
>
>
> After running for several minutes, one of the transmit threads will get
> stuck in the send() call.  Using strace to monitor the system calls, it
> appears that the thread is in a loop, continuously calling the poll() and
> recvfrom() system calls from within the UHD API.  Here’s the output of
> strace attached to one of the transmit threads after this has occurred.
> These are the only two system calls that get logged for the transmit
> thread once this problem occurs.
>
>
>
> 11:19:04.564078 poll([{fd=62, events=POLLIN}], 1, 100) = 0 (Timeout)
>
> 11:19:04.664276 recvfrom(62, 0x5619724e90c0, 1472, MSG_DONTWAIT, NULL,
> NULL) = -1 EAGAIN (Resource temporarily unavailable)
>
> 11:19:04.664381 poll([{fd=62, events=POLLIN}], 1, 100) = 0 (Timeout)
>
> 11:19:04.764600 recvfrom(62, 0x5619724e90c0, 1472, MSG_DONTWAIT, NULL,
> NULL) = -1 EAGAIN (Resource temporarily unavailable)
>
> 11:19:04.764699 poll([{fd=62, events=POLLIN}], 1, 100) = 0 (Timeout)
>
> 11:19:04.864906 recvfrom(62, 0x5619724e90c0, 1472, MSG_DONTWAIT, NULL,
> NULL) = -1 EAGAIN (Resource temporarily unavailable)
>
>
>
> This partial stack trace shows that the transmit thread is stuck in the
> while loop in the tx_flow_ctrl() function.  I think this is happening due
> to missed or missing TX flow control packets.
>
>
>
> #0  0x00007fdb8fe4fbf9 in __GI___poll (fds=fds@entry=0x7fdb167fb510,
> nfds=nfds@entry=1, timeout=timeout@entry=100) at
> ../sysdeps/unix/sysv/linux/poll.c:29
>
> #1  0x00007fdb9186de45 in poll (__timeout=100, __nfds=1,
> __fds=0x7fdb167fb510) at /usr/include/x86_64-linux-gnu/bits/poll2.h:46
>
> #2  uhd::transport::wait_for_recv_ready (timeout=0.10000000000000001,
> sock_fd=<optimized out>) at
> /home/aconrad/rfnoc/src/uhd/host/lib/transport/udp_common.hpp:59
>
> #3  udp_zero_copy_asio_mrb::get_new (index=@0x55726266f6e8: 28,
> timeout=<optimized out>, this=<optimized out>)
>
>     at /home/aconrad/rfnoc/src/uhd/host/lib/transport/udp_zero_copy.cpp:79
>
> #4  udp_zero_copy_asio_impl::get_recv_buff (this=0x55726266f670,
> timeout=<optimized out>) at
> /home/aconrad/rfnoc/src/uhd/host/lib/transport/udp_zero_copy.cpp:226
>
> #5  0x00007fdb915d48cc in tx_flow_ctrl (fc_cache=..., async_xport=...,
> endian_conv=0x7fdb915df600 <uhd::ntohx<unsigned int>(unsigned int)>,
>
>     unpack=0x7fdb918b1090
> <uhd::transport::vrt::chdr::if_hdr_unpack_be(unsigned int const*,
> uhd::transport::vrt::if_packet_info_t&)>)
>
>     at
> /home/aconrad/rfnoc/src/uhd/host/lib/usrp/device3/device3_io_impl.cpp:345
>
>
>
> The poll() and recvfrom() calls are in the
> udp_zero_copy_asio_mrb::get_new() function in udp_zero_copy.cpp.
>
>
>
> Has anyone seen this problem before, or does anyone have suggestions on
> what else to look at to debug this further?  I have not yet used
> Wireshark to see what’s happening on the wire, but I’m planning to do
> that.  Also note that if I run a single transmit/receive pair (instead
> of two), I don’t see this problem and everything works as I expect.
>
>
>
> My hardware is an X310 with the XG firmware and dual SBX-120
> daughterboards.  Here are the software versions I’m using, as displayed by
> the UHD API when the application starts.
>
>
>
> [00:00:00.000049] Creating the usrp device with:
> addr=192.168.30.2,second_addr=192.168.40.2...
>
> [INFO] [UHD] linux; GNU C++ version 7.3.0; Boost_106501;
> UHD_4.0.0.rfnoc-devel-788-g1f8463cc
>
>
>
> The host is a Dell PowerEdge R420 with 24 CPU cores and 24 GB of RAM.  I
> think the clock speed is a little lower than recommended at 2.7 GHz, but
> I thought that I could distribute the workload across the various cores
> to account for that.  Also, I have followed the instructions to set up
> dual 10 GigE interfaces for the X310 here:
> https://kb.ettus.com/Using_Dual_10_Gigabit_Ethernet_on_the_USRP_X300/X310
>
>
>
> Any help is appreciated.
>
>
>
> I think you're hitting this:
>
>
>
>   https://github.com/EttusResearch/uhd/issues/203
>
>
>
> Which is the same thing that I hit.  I tracked it down to something
> happening in the FPGA with the DMA FIFO.
>
>
>
> I rebuilt my FPGA and UHD off the following commits, which switch over to
> byte based flow control:
>
>
>
>   UHD commit 98057752006b5c567ed331c5b14e3b8a281b83b9
>
>   FPGA commit c7015a9a57a77c0e312f0c56e461ac479cf7f1e9
>
>
>
> And the problem disappeared, for the time being.  The infinite loop
> still exists as a potential issue, but whatever was causing the lockup
> in the DMA FIFO seems to have gone away, or at least could no longer be
> reproduced.
>
>
>
> Give that a shot and see if it works for you, or whether you can still
> reproduce it.  We never got to the root cause of the problem.
>
>
>
> Brian
> _______________________________________________
> USRP-users mailing list
> USRP-users@lists.ettus.com
> http://lists.ettus.com/mailman/listinfo/usrp-users_lists.ettus.com
>