On 07/12/16 07:57, Paolo Abeni wrote:
> We have some experimental patches to implement GRO for plain UDP
> connected sockets, using frag_list to preserve the individual skb len,
> and deliver the packet to user space individually. With that I got
> ~3mpps with a single queue/user space sink - before the recent udp
> improvements.

You might want to benchmark these against my batched receive patches
from a while ago[1]; both seem to have broadly the same objective.

In my benchmarking (obviously with different hardware) I was using
multiple sink processes, but all (processes and irqs) on a single core;
the unpatched kernel was getting ~5Mpps. Then with my patches I was
getting ~6.4Mpps. (Limitations of my test scripts meant that having a
single sink process meant also having a single source process, in which
case I was TX limited to ~3Mpps, and using about 60% CPU on the RX
side.)

Let me know if you're interested in doing this comparison; if so I'll
post updated patches against net-next. My own attempts to benchmark
them more have been held up by lack of time and not really knowing what
constitutes a realistic netfilter setup.

Of course if you're using a device other than sfc you'll need to add
your own equivalent of patch #2 to call the netif_receive_skb_list()
entry point from the driver.

-Ed

[1] https://www.spinics.net/lists/netdev/msg373769.html