Re: [USRP-users] X300 recovery after LATE_COMMAND or OVERFLOW_ERROR

Michael West via USRP-users Tue, 25 Jul 2017 13:15:06 -0700

Hi Martin,

Without seeing some source code, it is difficult to say exactly what is
going on.  ERROR_CODE_LATE_COMMAND means that the radio block in the FPGA
received a timed command after the time it was scheduled for had already
passed.  An ERROR_CODE_OVERFLOW with the out_of_sequence flag set is
alarming and means that packets are getting dropped somewhere.  Make sure
you have the number of RX buffers set to the maximum (e.g.
sudo ethtool -G <interface> rx 4096; ethtool -g <interface> shows the
maximum your NIC supports).  An ERROR_CODE_OVERFLOW without the
out_of_sequence flag set most likely means there is too much of a delay
before a recv() call somewhere in your code.  The "failure to align" error
is the UHD code attempting and failing to re-align the data streams.  When
an overflow occurs, UHD will stop the streams and re-issue stream commands
to try to realign the data.  Data is discarded until packets with the same
timestamp on all streams are seen.
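
To illustrate the "discard until timestamps match" behaviour described
above, here is a minimal sketch in Python.  It is not UHD's actual packet
handler; align_streams and the (timestamp, payload) tuples are hypothetical
names chosen for the example, and it assumes each channel's packets arrive
in increasing timestamp order.

```python
from collections import deque


def align_streams(queues):
    """Drop leading packets until every queue's head has the same timestamp.

    queues: list of deques of (timestamp, payload) tuples, one per channel.
    Returns the common timestamp, or None if a queue runs empty first
    (roughly the "failed to time-align packets" situation).
    """
    while all(queues):  # stop as soon as any channel has no packets left
        newest = max(q[0][0] for q in queues)
        if all(q[0][0] == newest for q in queues):
            return newest
        # Packets older than the newest head can never match: discard them.
        for q in queues:
            while q and q[0][0] < newest:
                q.popleft()
    return None


chan_a = deque([(0, "a0"), (10, "a1"), (20, "a2")])
chan_b = deque([(10, "b1"), (20, "b2")])
print(align_streams([chan_a, chan_b]))
```

If packets are dropped so that the channels never share a timestamp, the
loop runs out of data and gives up, which is consistent with the alignment
error appearing together with out_of_sequence overflows.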


It sounds to me like occasionally the application is falling behind and the
recovery attempts are having issues due to dropped packets.
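
The distinctions drawn above can be summarized as a small lookup, which may
help when logging errors per burst.  This is only a sketch: likely_cause is
a hypothetical helper, the error codes are passed as plain strings rather
than UHD enum values, and the returned texts paraphrase this message rather
than any UHD documentation.

```python
def likely_cause(error_code, out_of_sequence=False):
    """Map an RX error (as a string) to the likely cause given in this reply."""
    if error_code == "ERROR_CODE_LATE_COMMAND":
        return "radio block received the command late"
    if error_code == "ERROR_CODE_OVERFLOW":
        if out_of_sequence:
            return "packets are being dropped somewhere"
        return "too much delay before a recv() call"
    # Anything else (e.g. ERROR_CODE_TIMEOUT) is not discussed in this reply.
    return "not covered here"


# The pattern reported in the original question, assuming no flags were set.
for code in ["ERROR_CODE_LATE_COMMAND", "ERROR_CODE_LATE_COMMAND",
             "ERROR_CODE_OVERFLOW", "ERROR_CODE_TIMEOUT"]:
    print(code, "->", likely_cause(code))
```

Logging the out_of_sequence flag alongside each overflow should make it
easier to tell a slow application apart from a lossy network path.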

We did some testing with the X710 and found that it is possible to overrun
the socket buffer because the driver for the X710 seems to use up the full
MTU size in the socket buffer regardless of the actual packet size.  This
potentially leads to dropped packets.  You can try increasing the socket
buffer size using the recv_buff_size parameter (e.g. set to 50 MB) in the
device arguments, but that will not make a difference if there is a delay
before the recv() call somewhere in the application that allows the socket
buffer to fill up.
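
As a sketch of how the recv_buff_size suggestion could look in practice,
the snippet below builds a device argument string.  make_device_args is a
hypothetical helper and the address is a placeholder; only the
recv_buff_size key comes from the text above, and 50 MB is taken to mean
50e6 bytes.

```python
def make_device_args(addr, recv_buff_size_bytes):
    """Build a comma-separated key=value device argument string."""
    return "addr={},recv_buff_size={}".format(addr, int(recv_buff_size_bytes))


# Placeholder address; substitute the address of your own X300.
args = make_device_args("192.168.40.2", 50e6)
print(args)
```

The resulting string would be passed wherever the application already
supplies its device arguments.  On Linux the kernel's socket buffer limit
(net.core.rmem_max) presumably needs to be at least this large for the
request to take full effect.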

Regards,
Michael

On Tue, Jul 25, 2017 at 12:22 PM, Martin Guski via USRP-users <
usrp-users@lists.ettus.com> wrote:

> Hi!
>
> We are using 8 X300s which are each connected to a computer via dedicated
> 10 GigE ports. Our application is transmitting and receiving for bursts of
> 1 second (@10 Msps) on both slots of the USRP. After that we process the
> data, transmit/receive again, and so on.
>
> After running without errors for some time (a few hours), the rx_streamer
> returns errors for the next four bursts in the following pattern:
>
> 1) ERROR_CODE_LATE_COMMAND
> 2) ERROR_CODE_LATE_COMMAND
> 3) ERROR_CODE_OVERFLOW
> 4) ERROR_CODE_TIMEOUT
>
> And after that it usually continues running without problems for about 20
> - 60 min before it happens again. All 8 USRPs report the errors at the same
> time. (Each USRP is controlled by a separate multi_usrp running in a
> dedicated process.)
> I took a closer look at when I start the streaming: The first LATE_COMMAND
> really is too late (the time needed to prepare the stream varies). The
> following command is definitely not too late, but nevertheless the
> LATE_COMMAND error is raised.
>
> Sometimes one USRP doesn't recover after returning the ERROR_CODE_TIMEOUT
> error and reports LATE_COMMAND and OVERFLOW_ERRORS (with and without
> out_of_sequence flag) for all following transmissions. After restarting the
> usrp program everything works again.
>
> And from time to time I also get this error:
>
>> UHD Error:
>>     The receive packet handler failed to time-align packets.
>>     1002 received packets were processed by the handler.
>>     However, a timestamp match could not be determined.
>
>
> So my question is: Is there a way to recover the USRP or the streamer after
> I detect an extended sequence of LATE_COMMAND and OVERFLOW_ERRORS?
>
>
> Thanks
> Martin
>
> *Maybe some interesting further information:*
> - The error only occurs when using both slots of the USRP, for one side
> everything works.
> - Reducing the sampling rate to 5 Msps doesn't help
> - The ERROR_CODE_TIMEOUT is new for the new UHD3.9.7 release and it looks
> like this resets the USRP sometimes.
> - For the older versions of UHD 3.9 it never recovered from the first
> occurring error.
> - For UHD3.10 (maint branch), after (I guess) the first error occurred, the
> driver process had a CPU usage of 200 % until the process was killed. Also
> there were underflows for nearly each transmission (when using both
> sides/channels and 10 Msps).
>
>
> *More information about our setup:*
>
> Ubuntu 16.04.1 LTS
> UHD_003.009.007 release
>
> X300
> - Hardware Versions 5, 6
> Frontends: 2x LFRX and 2x LFTX
>
> Intel i7 (i7-5930K @3.50GHz, 6 Cores / 12 threads), 12 GB Ram
> - disabled CPU power management
>
> Network: Intel X710 for 10GbE SFP+ (quad-port)
> Each X300 (port1) connected directly to NIC port, all have separate
> netmasks
> - MTU set to 9000
> - increase the maximum size of the socket buffers
> - Flow Control disabled for rx and tx
> - ifconfig shows 0 errors, dropped packets or overruns
>
>
> _______________________________________________
> USRP-users mailing list
> USRP-users@lists.ettus.com
> http://lists.ettus.com/mailman/listinfo/usrp-users_lists.ettus.com
>
>
