On Tue, Aug 28, 2018 at 4:02 PM Alan Conrad via USRP-users <
usrp-users@lists.ettus.com> wrote:

> Hi All,
>
>
>
> I’ve been working on an application that requires two receive streams and
> two transmit streams, written using the C++ API.  I have run into a problem
> when transmitting packets and I am hoping that someone has seen something
> similar and/or may be able to shed some light on this.
>
>
>
> My application is streaming two receive and two transmit channels, each at
> 100 Msps over dual 10GigE interfaces (NIC is Intel X520-DA2).  I have two
> receive threads, each calling recv() on separate receive streams, and two
> transmit threads each calling send(), also on separate transmit streams.
> Each receive thread copies samples into a large circular buffer.  Each
> transmit thread reads samples from the buffer to be sent in the send()
> call.  So, each receive thread is paired with a transmit thread through a
> shared circular buffer, with some mutex locking to prevent simultaneous
> access to the shared memory.
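>
> To make the structure concrete, here is a minimal sketch of one
> receive/transmit pair (not my exact code: the names, the deque of blocks
> standing in for my circular buffer, and the sample format are illustrative
> assumptions):
>
>   #include <uhd/stream.hpp>
>   #include <uhd/types/metadata.hpp>
>   #include <atomic>
>   #include <complex>
>   #include <condition_variable>
>   #include <deque>
>   #include <mutex>
>   #include <vector>
>
>   static std::atomic<bool> running{true};
>   static std::mutex mtx;
>   static std::condition_variable cv;
>   static std::deque<std::vector<std::complex<short>>> ring;  // blocks of samples
>
>   // Producer: copy each recv() buffer into the shared ring.
>   void rx_worker(uhd::rx_streamer::sptr rx) {
>       uhd::rx_metadata_t md;
>       std::vector<std::complex<short>> buf(rx->get_max_num_samps());
>       while (running) {
>           const size_t n = rx->recv(&buf.front(), buf.size(), md, 0.1);
>           // (error handling on md.error_code omitted for brevity)
>           std::lock_guard<std::mutex> lk(mtx);
>           ring.emplace_back(buf.begin(), buf.begin() + n);
>           cv.notify_one();
>       }
>   }
>
>   // Consumer: pop a block from the ring and hand it to send().
>   void tx_worker(uhd::tx_streamer::sptr tx) {
>       uhd::tx_metadata_t md;
>       while (running) {
>           std::unique_lock<std::mutex> lk(mtx);
>           cv.wait(lk, [] { return !ring.empty() || !running; });
>           if (ring.empty()) break;
>           std::vector<std::complex<short>> blk = std::move(ring.front());
>           ring.pop_front();
>           lk.unlock();
>           tx->send(&blk.front(), blk.size(), md, 0.1);
>       }
>   }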
>
>
>
> I did read in the UHD manual that recv() is not thread safe.  I assumed
> that this meant that recv() is not thread safe when called on the same
> rx_streamer from two different threads but would be ok when called on
> different rx_streamers.  If this is not the case, please let me know.
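>
> For reference, the per-channel streamers are created along these lines (one
> rx_streamer and one tx_streamer per channel; the "sc16" formats are
> placeholders, not necessarily what I actually use):
>
>   #include <uhd/usrp/multi_usrp.hpp>
>
>   auto usrp = uhd::usrp::multi_usrp::make(
>       "addr=192.168.30.2,second_addr=192.168.40.2");
>   uhd::stream_args_t sargs("sc16", "sc16");
>   sargs.channels = {0};
>   uhd::rx_streamer::sptr rx0 = usrp->get_rx_stream(sargs);
>   uhd::tx_streamer::sptr tx0 = usrp->get_tx_stream(sargs);
>   sargs.channels = {1};
>   uhd::rx_streamer::sptr rx1 = usrp->get_rx_stream(sargs);
>   uhd::tx_streamer::sptr tx1 = usrp->get_tx_stream(sargs);
>   // Each thread then only ever calls recv()/send() on its own streamer.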
>
>
>
> On to my problem…
>
>
>
> After running for several minutes, one of the transmit threads will get
> stuck in the send() call.  Using strace to monitor the system calls, it
> appears that the thread is in a loop, continuously calling poll() and
> recvfrom() from within UHD.  Here is the output of strace attached to one
> of the transmit threads after this has occurred; these are the only two
> system calls that get logged for that thread once the problem occurs.
>
>
>
> 11:19:04.564078 poll([{fd=62, events=POLLIN}], 1, 100) = 0 (Timeout)
>
> 11:19:04.664276 recvfrom(62, 0x5619724e90c0, 1472, MSG_DONTWAIT, NULL,
> NULL) = -1 EAGAIN (Resource temporarily unavailable)
>
> 11:19:04.664381 poll([{fd=62, events=POLLIN}], 1, 100) = 0 (Timeout)
>
> 11:19:04.764600 recvfrom(62, 0x5619724e90c0, 1472, MSG_DONTWAIT, NULL,
> NULL) = -1 EAGAIN (Resource temporarily unavailable)
>
> 11:19:04.764699 poll([{fd=62, events=POLLIN}], 1, 100) = 0 (Timeout)
>
> 11:19:04.864906 recvfrom(62, 0x5619724e90c0, 1472, MSG_DONTWAIT, NULL,
> NULL) = -1 EAGAIN (Resource temporarily unavailable)
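>
> (A trace like the above can be captured by attaching strace to the stuck
> thread, e.g.
>
>   strace -tt -p <TID> -e trace=poll,recvfrom
>
> where <TID> is the Linux thread ID of the transmit thread.)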
>
>
>
> This partial stack trace shows that the transmit thread is stuck in the
> while loop in the tx_flow_ctrl() function.  I think this is happening due
> to missed or missing TX flow control packets.
>
>
>
> #0  0x00007fdb8fe4fbf9 in __GI___poll (fds=fds@entry=0x7fdb167fb510,
> nfds=nfds@entry=1, timeout=timeout@entry=100) at
> ../sysdeps/unix/sysv/linux/poll.c:29
>
> #1  0x00007fdb9186de45 in poll (__timeout=100, __nfds=1,
> __fds=0x7fdb167fb510) at /usr/include/x86_64-linux-gnu/bits/poll2.h:46
>
> #2  uhd::transport::wait_for_recv_ready (timeout=0.10000000000000001,
> sock_fd=<optimized out>) at
> /home/aconrad/rfnoc/src/uhd/host/lib/transport/udp_common.hpp:59
>
> #3  udp_zero_copy_asio_mrb::get_new (index=@0x55726266f6e8: 28,
> timeout=<optimized out>, this=<optimized out>)
>
>     at /home/aconrad/rfnoc/src/uhd/host/lib/transport/udp_zero_copy.cpp:79
>
> #4  udp_zero_copy_asio_impl::get_recv_buff (this=0x55726266f670,
> timeout=<optimized out>) at
> /home/aconrad/rfnoc/src/uhd/host/lib/transport/udp_zero_copy.cpp:226
>
> #5  0x00007fdb915d48cc in tx_flow_ctrl (fc_cache=..., async_xport=...,
> endian_conv=0x7fdb915df600 <uhd::ntohx<unsigned int>(unsigned int)>,
>
>     unpack=0x7fdb918b1090
> <uhd::transport::vrt::chdr::if_hdr_unpack_be(unsigned int const*,
> uhd::transport::vrt::if_packet_info_t&)>)
>
>     at
> /home/aconrad/rfnoc/src/uhd/host/lib/usrp/device3/device3_io_impl.cpp:345
>
>
>
> The poll() and recvfrom() calls are in the
> udp_zero_copy_asio_mrb::get_new() function in udp_zero_copy.cpp.
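>
> (The backtrace can be reproduced by attaching gdb to the running process
> and dumping all threads, i.e. "gdb -p <PID>" followed by "thread apply all
> bt".)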
>
>
>
> Has anyone seen this problem before, or have any suggestions on what else
> to look at to debug it further?  I have not yet used Wireshark to see
> what's happening on the wire, but I'm planning to do that.  Also note that
> if I run a single transmit/receive pair (instead of two), I don't see this
> problem and everything works as I expect.
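>
> For the Wireshark step, the idea is a capture on each 10GigE interface
> filtered to the corresponding device address, roughly (interface names are
> placeholders):
>
>   tshark -i <iface0> -f "host 192.168.30.2" -w chan0.pcapng
>   tshark -i <iface1> -f "host 192.168.40.2" -w chan1.pcapng
>
> At these rates a snaplen limit (e.g. "-s 128") will probably be needed to
> keep the capture files manageable.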
>
>
>
> My hardware is an X310 with the XG firmware and dual SBX-120
> daughterboards.  Here are the software versions I’m using, as displayed by
> the UHD API when the application starts.
>
>
>
> [00:00:00.000049] Creating the usrp device with:
> addr=192.168.30.2,second_addr=192.168.40.2...
>
> [INFO] [UHD] linux; GNU C++ version 7.3.0; Boost_106501;
> UHD_4.0.0.rfnoc-devel-788-g1f8463cc
>
>
>
> The host is a Dell PowerEdge R420 with 24 CPU cores and 24 GB of RAM.  I
> think the clock speed, at 2.7 GHz, is a little lower than recommended, but
> I thought I could distribute the workload across the cores to account for
> that.  Also, I have followed the instructions for setting up dual 10 GigE
> interfaces with the X310 here:
> https://kb.ettus.com/Using_Dual_10_Gigabit_Ethernet_on_the_USRP_X300/X310.
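>
> As an aside on host tuning: UHD's usual recommendation for 10GigE hosts is
> to raise the kernel socket buffer limits (UHD warns at startup if they are
> too small); illustrative values:
>
>   sudo sysctl -w net.core.rmem_max=33554432
>   sudo sysctl -w net.core.wmem_max=33554432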
>
>
>
> Any help is appreciated.
>

I think you're hitting this:

  https://github.com/EttusResearch/uhd/issues/203

That's the same thing I hit.  I tracked it down to something happening in
the FPGA with the DMA FIFO.

I rebuilt my FPGA and UHD off the following commits, which switch over to
byte-based flow control:

  UHD commit 98057752006b5c567ed331c5b14e3b8a281b83b9
  FPGA commit c7015a9a57a77c0e312f0c56e461ac479cf7f1e9

And the problem disappeared, for the time being.  The infinite loop still
exists as a potential issue, but whatever was causing the lockup in the DMA
FIFO seemed to go away, or at least could no longer be reproduced.
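
If it helps, the UHD side is just a checkout of that commit plus the usual
out-of-tree build; something along these lines (install prefix and job
count are up to you):

  git checkout 98057752006b5c567ed331c5b14e3b8a281b83b9
  cd host && mkdir build && cd build
  cmake .. && make -j8 && sudo make install

The FPGA commit has to be built with Vivado from the FPGA sources and then
loaded onto the X310 with uhd_image_loader.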

Give that a shot and see if it works for you, or whether you can still
reproduce it.  We never got to the root cause of the problem.

Brian
_______________________________________________
USRP-users mailing list
USRP-users@lists.ettus.com
http://lists.ettus.com/mailman/listinfo/usrp-users_lists.ettus.com
