[USRP-users] Transmit Thread Stuck Receiving Tx Flow Control Packets

Alan Conrad via USRP-users Tue, 28 Aug 2018 13:03:01 -0700

Hi All,

I've been working on an application that requires two receive streams and two 
transmit streams, written using the C++ API.  I have run into a problem when 
transmitting packets and I am hoping that someone has seen something similar 
and/or may be able to shed some light on this.


My application is streaming two receive and two transmit channels, each at 100 
Msps over dual 10GigE interfaces (NIC is Intel X520-DA2).  I have two receive 
threads, each calling recv() on separate receive streams, and two transmit 
threads each calling send(), also on separate transmit streams.  Each receive 
thread copies samples into a large circular buffer.  Each transmit thread reads 
samples from the buffer to be sent in the send() call.  So, each receive thread 
is paired with a transmit thread through a shared circular buffer with some 
mutex locking to prevent simultaneous access to shared circular buffer memory.
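
For illustration, a mutex-guarded circular buffer of the kind described above could look roughly like the sketch below. It is not the application's actual code: the class name sample_ring, the fc32-style sample type, and the policy of storing only what fits when the buffer is full are all assumptions made for the example.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical sample type; assumes an fc32-style CPU format.
using sample_t = std::complex<float>;

// Fixed-capacity circular buffer shared by one writer (receive thread)
// and one reader (transmit thread). One mutex guards all state.
class sample_ring
{
public:
    explicit sample_ring(std::size_t capacity) : _buf(capacity)
    {
        assert(capacity > 0); // the index arithmetic below needs a non-empty buffer
    }

    // Copy up to n samples in. Returns how many were stored; fewer than n
    // means the buffer was full and the remainder was not written.
    std::size_t write(const sample_t* src, std::size_t n)
    {
        std::lock_guard<std::mutex> lock(_mtx);
        const std::size_t room = _buf.size() - _count;
        if (n > room) n = room;
        for (std::size_t i = 0; i < n; ++i) {
            _buf[(_head + i) % _buf.size()] = src[i];
        }
        _head = (_head + n) % _buf.size();
        _count += n;
        return n;
    }

    // Copy up to n samples out. Returns how many were actually read.
    std::size_t read(sample_t* dst, std::size_t n)
    {
        std::lock_guard<std::mutex> lock(_mtx);
        if (n > _count) n = _count;
        for (std::size_t i = 0; i < n; ++i) {
            dst[i] = _buf[(_tail + i) % _buf.size()];
        }
        _tail = (_tail + n) % _buf.size();
        _count -= n;
        return n;
    }

private:
    std::mutex _mtx;
    std::vector<sample_t> _buf;
    std::size_t _head = 0;  // next slot to write
    std::size_t _tail = 0;  // next slot to read
    std::size_t _count = 0; // samples currently stored
};
```

In this sketch neither call blocks: the lock is held only for the duration of the copy, so a slow send() on the reader side cannot stall the writer beyond one copy.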

I did read in the UHD manual that recv() is not thread safe.  I assumed that 
this meant that recv() is not thread safe when called on the same rx_streamer 
from two different threads but would be ok when called on different 
rx_streamers.  If this is not the case, please let me know.

On to my problem...

After running for several minutes, one of the transmit threads gets stuck in 
the send() call.  strace shows that the thread is in a loop, continuously 
calling the poll() and recvfrom() system calls from within the UHD API.  
Here's the output of strace attached to one of the transmit threads after this 
has occurred.  These are the only two system calls that get logged for the 
transmit thread once the problem occurs.

11:19:04.564078 poll([{fd=62, events=POLLIN}], 1, 100) = 0 (Timeout)
11:19:04.664276 recvfrom(62, 0x5619724e90c0, 1472, MSG_DONTWAIT, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
11:19:04.664381 poll([{fd=62, events=POLLIN}], 1, 100) = 0 (Timeout)
11:19:04.764600 recvfrom(62, 0x5619724e90c0, 1472, MSG_DONTWAIT, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
11:19:04.764699 poll([{fd=62, events=POLLIN}], 1, 100) = 0 (Timeout)
11:19:04.864906 recvfrom(62, 0x5619724e90c0, 1472, MSG_DONTWAIT, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)

This partial stack trace shows that the transmit thread is stuck in the while 
loop in the tx_flow_ctrl() function.  I think this is happening due to missed 
or missing TX flow control packets.

#0  0x00007fdb8fe4fbf9 in __GI___poll (fds=fds@entry=0x7fdb167fb510, nfds=nfds@entry=1, timeout=timeout@entry=100)
    at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007fdb9186de45 in poll (__timeout=100, __nfds=1, __fds=0x7fdb167fb510)
    at /usr/include/x86_64-linux-gnu/bits/poll2.h:46
#2  uhd::transport::wait_for_recv_ready (timeout=0.10000000000000001, sock_fd=<optimized out>)
    at /home/aconrad/rfnoc/src/uhd/host/lib/transport/udp_common.hpp:59
#3  udp_zero_copy_asio_mrb::get_new (index=@0x55726266f6e8: 28, timeout=<optimized out>, this=<optimized out>)
    at /home/aconrad/rfnoc/src/uhd/host/lib/transport/udp_zero_copy.cpp:79
#4  udp_zero_copy_asio_impl::get_recv_buff (this=0x55726266f670, timeout=<optimized out>)
    at /home/aconrad/rfnoc/src/uhd/host/lib/transport/udp_zero_copy.cpp:226
#5  0x00007fdb915d48cc in tx_flow_ctrl (fc_cache=..., async_xport=..., endian_conv=0x7fdb915df600 <uhd::ntohx<unsigned int>(unsigned int)>, unpack=0x7fdb918b1090 <uhd::transport::vrt::chdr::if_hdr_unpack_be(unsigned int const*, uhd::transport::vrt::if_packet_info_t&)>)
    at /home/aconrad/rfnoc/src/uhd/host/lib/usrp/device3/device3_io_impl.cpp:345

The poll() and recvfrom() calls are in the udp_zero_copy_asio_mrb::get_new() 
function in udp_zero_copy.cpp.
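
To make the system-call pattern concrete, here is a minimal POSIX sketch that produces the same alternating recvfrom()/poll() sequence when no packet ever arrives. It is not UHD's code: the function names open_idle_udp_socket and wait_for_packet, the loopback socket, and the bounded attempt count are assumptions made for the example. The loop in the stack trace above appears to keep retrying, whereas this sketch gives up after max_attempts.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: open a UDP socket bound to an ephemeral loopback
// port. Returns the descriptor, or -1 on failure.
int open_idle_udp_socket()
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) return -1;
    sockaddr_in addr{};
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port        = 0; // let the kernel pick a free port
    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

// Try a non-blocking receive; if nothing is queued (EAGAIN), wait up to
// timeout_ms in poll() and try again. Returns the datagram size, or -1 if
// max_attempts receives all came back empty or a real error occurred.
ssize_t wait_for_packet(int fd, char* buf, std::size_t len, int timeout_ms, int max_attempts)
{
    assert(fd >= 0);
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        ssize_t n = recvfrom(fd, buf, len, MSG_DONTWAIT, nullptr, nullptr);
        if (n >= 0) return n; // got a datagram
        if (errno != EAGAIN && errno != EWOULDBLOCK) return -1; // real error
        pollfd pfd{};
        pfd.fd     = fd;
        pfd.events = POLLIN;
        poll(&pfd, 1, timeout_ms); // returns 0 on timeout, as in the strace output
    }
    return -1;
}
```

A caller that loops on wait_for_packet() until it succeeds would show only these two system calls under strace for as long as the expected packet is missing.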

Has anyone seen this problem before, or does anyone have suggestions on what 
else to look at to debug it further?  I have not yet used Wireshark to see 
what's happening on the wire, but I'm planning to do that.  Also note that if 
I run a single transmit/receive pair (instead of two), I don't see this problem 
and everything works as I expect.

My hardware is an X310 with the XG firmware and dual SBX-120 daughterboards.  
Here are the software versions I'm using, as displayed by the UHD API when the 
application starts.

[00:00:00.000049] Creating the usrp device with: addr=192.168.30.2,second_addr=192.168.40.2...
[INFO] [UHD] linux; GNU C++ version 7.3.0; Boost_106501; UHD_4.0.0.rfnoc-devel-788-g1f8463cc

The host is a Dell PowerEdge R420 with 24 CPU cores and 24 GB of RAM.  At 
2.7 GHz, I think the clock speed is a little lower than recommended, but I 
thought that I could distribute the workload across the cores to compensate.  
Also, I have followed the instructions for setting up dual 10 GigE interfaces 
for the X310 at 
https://kb.ettus.com/Using_Dual_10_Gigabit_Ethernet_on_the_USRP_X300/X310.

Any help is appreciated.

Thanks,

Al

_______________________________________________
USRP-users mailing list
USRP-users@lists.ettus.com
http://lists.ettus.com/mailman/listinfo/usrp-users_lists.ettus.com
