On Mon, Apr 04, 2016 at 08:22:03AM -0700, Eric Dumazet wrote:
> On Mon, 2016-04-04 at 16:57 +0200, Jesper Dangaard Brouer wrote:
> > On Fri, 1 Apr 2016 19:47:12 -0700 Alexei Starovoitov
> > <alexei.starovoi...@gmail.com> wrote:
> >
> > > My guess we're hitting 14.5Mpps limit for empty bpf program
> > > and for program that actually looks into the packet because we're
> > > hitting 10G phy limit of 40G nic. Since physically 40G nic
> > > consists of four 10G phys. There will be the same problem
> > > with 100G and 50G nics. Both will be hitting 25G phy limit.
> > > We need to vary packets somehow. Hopefully Or can explain that
> > > bit of hw design.
> > > Jesper's experiments with mlx4 showed the same 14.5Mpps limit
> > > when sender blasting the same packet over and over again.
> >
> > That is an interesting observation Alexei, and could explain the pps limit
> > I hit on 40G, with single flow testing. AFAIK 40G is 4x 10G PHYs, and
> > 100G is 4x 25G PHYs.
> >
> > I have a pktgen script that tries to avoid this pitfall, by creating a
> > new flow per pktgen kthread. I call it
> > "pktgen_sample05_flow_per_thread.sh"[1]
> >
> > [1]
> > https://github.com/netoptimizer/network-testing/blob/master/pktgen/pktgen_sample05_flow_per_thread.sh
>
> A single flow is able to use 40Gbit on those 40Gbit NIC, so there is not
> a single 10GB trunk used for a given flow.
>
> This 14Mpps thing seems to be a queue limitation on mlx4.
Yeah, could be queueing related. Multiple CPUs can send ~30Mpps of the same 64-byte packet, but mlx4 can only receive 14.5Mpps. Odd. Or (and the other Mellanox folks), what is really going on inside the 40G NIC?
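For anyone who wants to reproduce the "one flow per kthread" trick Jesper mentions above, here is a minimal sketch using pktgen's /proc/net/pktgen interface. It is not Jesper's actual sample05 script, just the idea: give each pktgen kthread its own UDP source port so the receiver sees distinct flows and can spread them across RX queues. DEV, DST_IP, DST_MAC, THREADS and the port base are placeholders you'd adjust for your setup.

#!/bin/bash
# Sketch: one distinct flow per pktgen kthread via a per-thread UDP src port.
# Assumptions: pktgen module loaded, DEV/DST_IP/DST_MAC/THREADS are examples.
DEV=eth0
DST_IP=198.18.0.2
DST_MAC=00:11:22:33:44:55
THREADS=4

pgset() {
    # Write one pktgen command to a /proc/net/pktgen file.
    # (The real sample scripts also check for "Result: OK"; simplified here.)
    local file="$1" cmd="$2"
    echo "$cmd" > "/proc/net/pktgen/$file" || echo "pgset failed: $cmd" >&2
}

for ((t = 0; t < THREADS; t++)); do
    # Bind the device to this kthread.
    pgset "kpktgend_$t" "rem_device_all"
    pgset "kpktgend_$t" "add_device $DEV@$t"

    dev="$DEV@$t"
    pgset "$dev" "count 0"          # 0 = run until stopped
    pgset "$dev" "pkt_size 60"      # 64-byte frames on the wire (excl. CRC)
    pgset "$dev" "dst $DST_IP"
    pgset "$dev" "dst_mac $DST_MAC"
    # Distinct UDP source port per thread -> distinct 4-tuple -> distinct flow,
    # so the receiving NIC's RSS hash does not land everything on one queue.
    pgset "$dev" "udp_src_min $((9000 + t))"
    pgset "$dev" "udp_src_max $((9000 + t))"
done

echo "start" > /proc/net/pktgen/pgctrl   # blocks while the threads transmit

With the single-packet blast, all traffic hashes to one RX queue on the receiver; varying the 4-tuple per thread is the simplest way to tell whether the 14.5Mpps ceiling is a per-queue limit or something deeper in the NIC.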