2018-03-26 23:58 GMT+02:00 William Tu <u9012...@gmail.com>:
> Hi Jesper,
>
> Thanks a lot for your prompt reply.
>
>>> Hi,
>>> I also did an evaluation of AF_XDP, however the performance isn't as
>>> good as above.
>>> I'd like to share the result and see if there are some tuning suggestions.
>>>
>>> System:
>>> 16 core, Intel(R) Xeon(R) CPU E5-2440 v2 @ 1.90GHz
>>> Intel 10G X540-AT2 ---> so I can only run XDP_SKB mode
>>
>> Hmmm, why is X540-AT2 not able to use XDP natively?
>
> Because I'm only able to use the ixgbe driver for this NIC,
> and the AF_XDP patch set only has i40e support?
>

It's only i40e that supports zero copy. As for native XDP support, only
XDP_REDIRECT support is required, and ixgbe does support XDP_REDIRECT --
unfortunately, ixgbe still needs a patch to work properly, which is in
net-next: ed93a3987128 ("ixgbe: tweak page counting for XDP_REDIRECT").
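
With that patch, your X540 should be able to run the sample in native
driver mode instead of generic SKB mode. If I remember the sample's flags
correctly, that is just a matter of swapping -S for -N (--xdp-native),
i.e. something like:

  ./xdpsock -i enp10s0f0 -r -N --queue=1

which should already give noticeably better numbers than the XDP_SKB
results below.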
>>
>>> AF_XDP performance:
>>> Benchmark   XDP_SKB
>>> rxdrop      1.27 Mpps
>>> txpush      0.99 Mpps
>>> l2fwd       0.85 Mpps
>>
>> Definitely too low...
>>
> I did another run, the rxdrop seems better.
> Benchmark   XDP_SKB
> rxdrop      2.3 Mpps
> txpush      1.05 Mpps
> l2fwd       0.90 Mpps
>
>> What is the performance if you drop packets via iptables?
>>
>> Command:
>>  $ iptables -t raw -I PREROUTING -p udp --dport 9 --j DROP
>>
> I did
> # iptables -t raw -I PREROUTING -p udp -i enp10s0f0 -j DROP
> # iptables -nvL -t raw; sleep 10; iptables -nvL -t raw
>
> and I got 2.9 Mpps.
>
>>> NIC configuration:
>>> the command
>>> "ethtool -N p3p2 flow-type udp4 src-port 4242 dst-port 4242 action 16"
>>> doesn't work on my ixgbe driver, so I use ntuple:
>>>
>>> ethtool -K enp10s0f0 ntuple on
>>> ethtool -U enp10s0f0 flow-type udp4 src-ip 10.1.1.100 action 1
>>> then
>>> echo 1 > /proc/sys/net/core/bpf_jit_enable
>>> ./xdpsock -i enp10s0f0 -r -S --queue=1
>>>
>>> I also took a look at the perf result:
>>> For rxdrop:
>>>  86.56%  xdpsock  xdpsock           [.] main
>>>   9.22%  xdpsock  [kernel.vmlinux]  [k] nmi
>>>   4.23%  xdpsock  xdpsock           [.] xq_enq
>>
>> It looks very strange that you see non-maskable interrupts (NMI) being
>> this high...
>>
> yes, that's weird. Looking at the perf annotate of nmi,
> it shows 100% spent on the nop instruction.
>
>>
>>> For l2fwd:
>>>  20.81%  xdpsock  xdpsock           [.] main
>>>  10.64%  xdpsock  [kernel.vmlinux]  [k] clflush_cache_range
>>
>> Oh, clflush_cache_range is being called!
>
> I thought clflush_cache_range is high because we have many smp_rmb, smp_wmb
> in the xdpsock queue/ring management userspace code.
> (perf shows that 75% of this 10.64% is spent on the mfence instruction.)
>
>> Does your system use an IOMMU?
>>
> Yes.
> With CONFIG_INTEL_IOMMU=y
> and I saw some related functions called (ex: intel_alloc_iova).
>
>>>   8.46%  xdpsock  [kernel.vmlinux]  [k] xsk_sendmsg
>>>   6.72%  xdpsock  [kernel.vmlinux]  [k] skb_set_owner_w
>>>   5.89%  xdpsock  [kernel.vmlinux]  [k] __domain_mapping
>>>   5.74%  xdpsock  [kernel.vmlinux]  [k] alloc_skb_with_frags
>>>   4.62%  xdpsock  [kernel.vmlinux]  [k] netif_skb_features
>>>   3.96%  xdpsock  [kernel.vmlinux]  [k] ___slab_alloc
>>>   3.18%  xdpsock  [kernel.vmlinux]  [k] nmi
>>
>> Again high count for NMI ?!?
>>
>> Maybe you just forgot to tell perf that you want it to decode the
>> bpf_prog correctly?
>>
>> https://prototype-kernel.readthedocs.io/en/latest/bpf/troubleshooting.html#perf-tool-symbols
>>
>> Enable via:
>>  $ sysctl net/core/bpf_jit_kallsyms=1
>>
>> And use perf report (while BPF is STILL LOADED):
>>
>>  $ perf report --kallsyms=/proc/kallsyms
>>
>> E.g. for emailing this you can use this command:
>>
>>  $ perf report --sort cpu,comm,dso,symbol --kallsyms=/proc/kallsyms
>>    --no-children --stdio -g none | head -n 40
>>
>
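
On the nmi samples: besides the bpf_jit_kallsyms decoding above, it might
be worth ruling out the NMI watchdog (the hard lockup detector), which
periodically fires perf NMIs on every core. It can be switched off for a
test run with the standard sysctl (nothing AF_XDP specific):

  # sysctl kernel.nmi_watchdog=0

If the nmi entries then shrink or disappear from the profile, that points
at the watchdog rather than anything in the XDP/AF_XDP path.
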
> Thanks, I followed the steps, the result of l2fwd
> # Total Lost Samples: 119
> #
> # Samples: 2K of event 'cycles:ppp'
> # Event count (approx.): 25675705627
> #
> # Overhead  CPU  Command  Shared Object     Symbol
> # ........  ...  .......  ................  ..................................
> #
>     10.48%  013  xdpsock  xdpsock           [.] main
>      9.77%  013  xdpsock  [kernel.vmlinux]  [k] clflush_cache_range
>      8.45%  013  xdpsock  [kernel.vmlinux]  [k] nmi
>      8.07%  013  xdpsock  [kernel.vmlinux]  [k] xsk_sendmsg
>      7.81%  013  xdpsock  [kernel.vmlinux]  [k] __domain_mapping
>      4.95%  013  xdpsock  [kernel.vmlinux]  [k] ixgbe_xmit_frame_ring
>      4.66%  013  xdpsock  [kernel.vmlinux]  [k] skb_store_bits
>      4.39%  013  xdpsock  [kernel.vmlinux]  [k] syscall_return_via_sysret
>      3.93%  013  xdpsock  [kernel.vmlinux]  [k] pfn_to_dma_pte
>      2.62%  013  xdpsock  [kernel.vmlinux]  [k] __intel_map_single
>      2.53%  013  xdpsock  [kernel.vmlinux]  [k] __alloc_skb
>      2.36%  013  xdpsock  [kernel.vmlinux]  [k] iommu_no_mapping
>      2.21%  013  xdpsock  [kernel.vmlinux]  [k] alloc_skb_with_frags
>      2.07%  013  xdpsock  [kernel.vmlinux]  [k] skb_set_owner_w
>      1.98%  013  xdpsock  [kernel.vmlinux]  [k] __kmalloc_node_track_caller
>      1.94%  013  xdpsock  [kernel.vmlinux]  [k] ksize
>      1.84%  013  xdpsock  [kernel.vmlinux]  [k] validate_xmit_skb_list
>      1.62%  013  xdpsock  [kernel.vmlinux]  [k] kmem_cache_alloc_node
>      1.48%  013  xdpsock  [kernel.vmlinux]  [k] __kmalloc_reserve.isra.37
>      1.21%  013  xdpsock  xdpsock           [.] xq_enq
>      1.08%  013  xdpsock  [kernel.vmlinux]  [k] intel_alloc_iova
>
> And l2fwd under "perf stat" looks OK to me. There are few context
> switches, the cpu is fully utilized, and 1.17 insn per cycle seems ok.
>
>  Performance counter stats for 'CPU(s) 6':
>
>     10000.787420      cpu-clock (msec)          #    1.000 CPUs utilized
>               24      context-switches          #    0.002 K/sec
>                0      cpu-migrations            #    0.000 K/sec
>                0      page-faults               #    0.000 K/sec
>   22,361,333,647      cycles                    #    2.236 GHz
>   13,458,442,838      stalled-cycles-frontend   #   60.19% frontend cycles idle
>   26,251,003,067      instructions              #    1.17  insn per cycle
>                                                 #    0.51  stalled cycles per insn
>    4,938,921,868      branches                  #  493.853 M/sec
>        7,591,739      branch-misses             #    0.15% of all branches
>
>     10.000835769 seconds time elapsed
>
> Will continue investigating...
> Thanks
> William
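
One more observation on the l2fwd profile above: a fair chunk of it is the
Intel IOMMU doing DMA-mapping work in the transmit path (__domain_mapping,
pfn_to_dma_pte, __intel_map_single, intel_alloc_iova), and since
clflush_cache_range is a kernel symbol it most likely belongs to that same
IOMMU path rather than to the user-space ring barriers. To get a feel for
how much this costs, one option is a test boot with the IOMMU in
passthrough mode, or disabled, using the standard kernel command-line
parameters:

  iommu=pt
or
  intel_iommu=off

and then re-running the l2fwd benchmark for comparison.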