On Fri, 4 Mar 2016 08:36:44 -0800 Alexei Starovoitov <alexei.starovoi...@gmail.com> wrote:
> On Fri, Mar 04, 2016 at 02:01:14PM +0100, Jesper Dangaard Brouer wrote:
> > This patchset uses the bulk ALLOC side of the kmem_cache bulk APIs for
> > SKB allocations. The bulk free side got enabled in merge commit
> > 3134b9f019f2 ("net: mitigating kmem_cache free slowpath").
> >
> > The first two patches are a follow-up on the free side, which enables
> > bulk-free in the drivers mlx4 and mlx5 (dev_kfree_skb -> napi_consume_skb).
> >
> > The rest of the patchset is focused on the bulk alloc side. We start with a
> > conservative bulk alloc of 8 SKBs, which all drivers using the
> > napi_alloc_skb() call will benefit from. Then the API is extended to
> > allow driver hinting on needed SKBs (only some drivers know this
> > size), and the mlx5 driver is the first user of hinting.
>
> patches 1-5 look very good to me. Should help all cases afaik.
> As far as 6-7 about hints, I have a question. Does this hint
> actually make the difference? The fixed bulk alloc of 8 is probably
> easier for the main slub, but when mlx5 starts passing 'work_done' as
> a hint there will be more 'random' bulking going on.
> Was wondering whether you have the perf numbers to back up 6/7.

Yes, it makes a difference. I did some performance measurements, dropping
packets in the mlx5 driver, plus the RX loop cache-miss avoidance. With
all my other optimizations I reached 12 Mpps; with this hint optimization
added I could reach 13 Mpps. That sounds nice percentage-wise (8.3%), but
in nanoseconds this optimization "only" corresponds to 6.4 ns per packet
(1/12 Mpps = 83.3 ns vs. 1/13 Mpps = 76.9 ns).

For real workloads we might see a larger improvement in nanoseconds, as
this invokes kmem_cache_alloc_bulk() fewer times, resulting in fewer
icache misses.

So, yes, it makes a difference.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer