I've already tested network throughput with the 'udpblaster' tool and was
pleasantly surprised by NuttX's ability to flood the full Ethernet
bandwidth in both directions simultaneously. That was one of the reasons
we chose NuttX for our project. "If it's so strong at flooding, it should
have low latency too," we thought... but no, the real latency turns out
to be not so great.
In a throughput test like udpblaster, the latency that you are concerned with is hidden by the overlap in sending queued data packets:  the next packet is always in place and ready to go when the previous packet has been sent.
I suppose the delay hides somewhere in the Ethernet driver, because
packets from RAW sockets suffer as well as UDP packets. But I need some
clue to help me choose the right direction for further digging..

There are two threads involved:

1. The user thread that calls the UDP sendto() interface:

 * Lock the network.
 * Call netdev_txnotify_dev() to inform the driver that TX data is
   available.  The driver should schedule the TX poll on LP work queue.
 * If CONFIG_NET_UDP_WRITE_BUFFERS is enabled, the UDP sendto() will
   copy the UDP packet into a write buffer, unlock the network, and
   return to the caller immediately.
 * If CONFIG_NET_UDP_WRITE_BUFFERS is NOT enabled, the UDP sendto()
   will unlock the network and wait for the driver TX poll.

The other thread is the LP work queue thread.  Work was scheduled here when netdev_txnotify_dev() was called.

 * Lock the network (perhaps waiting for the user thread to unlock it).
 * Perform the TX poll.
 * If CONFIG_NET_UDP_WRITE_BUFFERS is enabled, it will copy the
   buffered UDP packet into the driver packet buffer.
 * If CONFIG_NET_UDP_WRITE_BUFFERS is NOT enabled, it will copy the
   user data directly into the driver packet buffer.
 * When the packet buffer is filled, the Ethernet driver will send (or
   schedule sending of) the packet.

For single-packet transfers, I would expect the latency to be a little lower with CONFIG_NET_UDP_WRITE_BUFFERS disabled.  That saves one packet copy, with the side effect of making the user application wait until the data is accepted by the driver.

The LP worker thread's priority could also have some effect in certain situations.

