On Wed, Apr 18, 2018 at 7:17 AM, Paolo Abeni <pab...@redhat.com> wrote:
> On Tue, 2018-04-17 at 16:00 -0400, Willem de Bruijn wrote:
>> From: Willem de Bruijn <will...@google.com>
>>
>> Segmentation offload reduces cycles/byte for large packets by
>> amortizing the cost of protocol stack traversal.
>>
>> This patchset implements GSO for UDP. A process can concatenate and
>> submit multiple datagrams to the same destination in one send call
>> by setting socket option SOL_UDP/UDP_SEGMENT with the segment size,
>> or passing an analogous cmsg at send time.
>>
>> The stack will send the entire large (up to network layer max size)
>> datagram through the protocol layer. At the GSO layer, it is broken
>> up in individual segments. All receive the same network layer header
>> and UDP src and dst port. All but the last segment have the same UDP
>> header, but the last may differ in length and checksum.
>
> This is interesting, thanks for sharing!
>
> I have some local patches somewhere implementing UDP GRO, but I never
> tried to upstream them, since I lacked the associated GSO and I thought
> that the use-case was not too relevant.
>
> Given that your use-case is a connected socket - no per packet route
> lookup - how does GSO perform compared to plain sendmmsg()? Have you
> considered using and/or improving the latter?
>
> When testing with Spectre/Meltdown mitigations in place, I expect that
> the most relevant part of the gain is due to the single syscall per
> burst.

Somewhat to my surprise, the main benefit is not route lookup
avoidance. The benchmark can be run in both connected and unconnected
('-u') mode. Both modes saturate the CPU, so I am only showing
throughput:

[connected]   udp tx:    825 MB/s   588336 calls/s  14008 msg/s
[unconnected] udp tx:    711 MB/s   506646 calls/s  12063 msg/s

The difference of about 15% corresponds to results previously seen
with other applications.

When looking at a perf report, there is no clear hot spot, which
indicates that the savings accrue across the protocol stack traversal.
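
To put a number on what one traversal covers, here is a minimal
sketch of the split described in the cover letter: every segment but
the last carries the full segment size, and the last carries the
remainder. The names gso_layout and gso_split are hypothetical and do
not appear in the patchset.

```c
#include <stddef.h>

/* Hypothetical helper, not from the patchset: describes how one large
 * UDP payload is cut into segments of gso_size bytes at the GSO layer. */
struct gso_layout {
    size_t segments;   /* number of datagrams put on the wire */
    size_t last_len;   /* payload length of the final datagram */
};

/* All segments but the last carry exactly gso_size bytes; the last one
 * carries the remainder, or a full gso_size if the split is even.
 * Returns a zeroed layout for an empty payload or a zero segment size. */
static struct gso_layout gso_split(size_t payload_len, size_t gso_size)
{
    struct gso_layout l = { 0, 0 };

    if (payload_len == 0 || gso_size == 0)
        return l;

    l.segments = (payload_len + gso_size - 1) / gso_size;  /* ceiling */
    l.last_len = payload_len - (l.segments - 1) * gso_size;
    return l;
}
```

With illustrative sizes, gso_split(4000, 1472) describes 3 segments
with a 1056 byte tail: one send call and one pass through the protocol
layer instead of three.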

I just hacked up a sendmmsg extension to the benchmark to verify.
Indeed that does not have nearly the same benefit as GSO:

udp tx:    976 MB/s   695394 calls/s  16557 msg/s

This matches the numbers seen from TCP without TSO and GSO. That
also makes few system calls, but still incurs a stack traversal per
MTU-sized packet.
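
For comparison with a sendmmsg() loop, the per-socket form of the
interface from the cover letter could be used roughly as follows. This
is a sketch under assumptions: udp_send_gso is a hypothetical helper,
the option value is passed as an int, and the fallback define of
UDP_SEGMENT as 103 is only there for libc headers that do not yet
carry the constant.

```c
#include <netinet/in.h>
#include <netinet/udp.h>
#include <sys/socket.h>
#include <sys/types.h>

#ifndef SOL_UDP
#define SOL_UDP 17          /* same value as IPPROTO_UDP */
#endif
#ifndef UDP_SEGMENT
#define UDP_SEGMENT 103     /* assumed value; older libc headers lack it */
#endif

/* Hypothetical helper: send len bytes on a connected UDP socket as a
 * train of datagrams of gso_size payload bytes each (the last may be
 * shorter), using a single send call.
 * Returns the number of bytes queued, or -1 with errno set, for
 * example ENOPROTOOPT on a kernel without UDP GSO. */
static ssize_t udp_send_gso(int fd, const void *buf, size_t len, int gso_size)
{
    /* Per-socket setting: it stays in effect for later sends on fd,
     * so a real sender would set it once rather than per call. */
    if (setsockopt(fd, SOL_UDP, UDP_SEGMENT, &gso_size, sizeof(gso_size)) < 0)
        return -1;

    /* One system call and one pass through the protocol layer; the
     * split into datagrams happens later, at the GSO layer. */
    return send(fd, buf, len, 0);
}
```

The sendmmsg() variant also needs only one system call per burst, but
each message in the vector still goes through the stack on its own.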

I pushed the branch to my github at

  https://github.com/wdebruij/linux/tree/udpgso-20180418

and also the version I sent for RFC yesterday at

  https://github.com/wdebruij/linux/tree/udpgso-rfc-v1
