On 05.04.2019 07:48, Rafał Miłecki wrote:
On 05.04.2019 06:26, Toshiaki Makita wrote:
My test results:
Receiving packets from eth0.10, forwarding them to eth0.20 and applying
MASQUERADE on eth0.20, using i40e 25G NIC on kernel 4.20.13.
Disabled rxvlan by ethtool -K to exercise vlan_gro_receive().
Measured TCP throughput by netperf.
GRO on : 17 Gbps
GRO off: 5 Gbps
So I failed to reproduce your problem.
:( Thanks for trying & checking that!
Could you check the CPU usage with "mpstat -P ALL" or similar (like "sar
-u ALL -P ALL") to see whether the traffic is able to consume 100% CPU on
your machine?
1) ethtool -K eth0 gro on + iperf running (577 Mb/s)
root@OpenWrt:/# mpstat -P ALL 10 3
Linux 5.1.0-rc3+ (OpenWrt) 03/27/19 _armv7l_ (2 CPU)
16:33:40     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
16:33:50     all    0.00    0.00    0.00    0.00    0.00   58.79    0.00    0.00   41.21
16:33:50       0    0.00    0.00    0.00    0.00    0.00  100.00    0.00    0.00    0.00
16:33:50       1    0.00    0.00    0.00    0.00    0.00   17.58    0.00    0.00   82.42

16:33:50     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
16:34:00     all    0.00    0.00    0.05    0.00    0.00   59.44    0.00    0.00   40.51
16:34:00       0    0.00    0.00    0.10    0.00    0.00   99.90    0.00    0.00    0.00
16:34:00       1    0.00    0.00    0.00    0.00    0.00   18.98    0.00    0.00   81.02

16:34:00     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
16:34:10     all    0.00    0.00    0.00    0.00    0.00   59.59    0.00    0.00   40.41
16:34:10       0    0.00    0.00    0.00    0.00    0.00  100.00    0.00    0.00    0.00
16:34:10       1    0.00    0.00    0.00    0.00    0.00   19.18    0.00    0.00   80.82

Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
Average:     all    0.00    0.00    0.02    0.00    0.00   59.27    0.00    0.00   40.71
Average:       0    0.00    0.00    0.03    0.00    0.00   99.97    0.00    0.00    0.00
Average:       1    0.00    0.00    0.00    0.00    0.00   18.58    0.00    0.00   81.42
2) ethtool -K eth0 gro off + iperf running (941 Mb/s)
root@OpenWrt:/# mpstat -P ALL 10 3
Linux 5.1.0-rc3+ (OpenWrt) 03/27/19 _armv7l_ (2 CPU)
16:34:39     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
16:34:49     all    0.00    0.00    0.05    0.00    0.00   86.91    0.00    0.00   13.04
16:34:49       0    0.00    0.00    0.10    0.00    0.00   78.22    0.00    0.00   21.68
16:34:49       1    0.00    0.00    0.00    0.00    0.00   95.60    0.00    0.00    4.40

16:34:49     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
16:34:59     all    0.00    0.00    0.10    0.00    0.00   87.06    0.00    0.00   12.84
16:34:59       0    0.00    0.00    0.20    0.00    0.00   79.72    0.00    0.00   20.08
16:34:59       1    0.00    0.00    0.00    0.00    0.00   94.41    0.00    0.00    5.59

16:34:59     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
16:35:09     all    0.00    0.00    0.05    0.00    0.00   85.71    0.00    0.00   14.24
16:35:09       0    0.00    0.00    0.10    0.00    0.00   79.42    0.00    0.00   20.48
16:35:09       1    0.00    0.00    0.00    0.00    0.00   92.01    0.00    0.00    7.99

Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
Average:     all    0.00    0.00    0.07    0.00    0.00   86.56    0.00    0.00   13.37
Average:       0    0.00    0.00    0.13    0.00    0.00   79.12    0.00    0.00   20.75
Average:       1    0.00    0.00    0.00    0.00    0.00   94.01    0.00    0.00    5.99
3) System idle (no iperf)
root@OpenWrt:/# mpstat -P ALL 10 1
Linux 5.1.0-rc3+ (OpenWrt) 03/27/19 _armv7l_ (2 CPU)
16:35:31     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
16:35:41     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
16:35:41       0    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
16:35:41       1    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
Average:     all    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:       0    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:       1    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
If the CPU is at 100%, perf may help us analyze your problem. If it's
available, try running the following while testing:
# perf record -a -g -- sleep 5
And then run this after testing:
# perf report --no-child
I can see my CPU 0 is fully loaded when using "gro on". I'll try perf now.
I guess it's GRO + csum_partial() that is to blame for this performance drop.
Maybe csum_partial() is very fast on your powerful machine and a few extra calls
don't make a difference? I can imagine it affecting a much slower home router
with ARM cores.
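For context, csum_partial() accumulates a 32-bit ones'-complement partial sum over a buffer. This is the Internet checksum arithmetic described in RFC 1071. Every payload byte is read once, so each extra call costs time in proportion to the data it covers. That cost matters far more on a slow ARM core than on a large x86 box.

The portable C sketch below illustrates that arithmetic only. It is not the kernel's implementation: on ARM, csum_partial is hand-written assembly. The helper names `ones_complement_sum` and `fold_csum` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: ones'-complement partial sum (RFC 1071 arithmetic),
 * reading buf as big-endian 16-bit words. Cost is linear in len. */
uint32_t ones_complement_sum(const uint8_t *buf, size_t len, uint32_t sum)
{
    uint64_t acc = sum;   /* wide accumulator: carries are folded at the end */

    while (len > 1) {
        acc += ((uint32_t)buf[0] << 8) | buf[1];
        buf += 2;
        len -= 2;
    }
    if (len == 1)         /* odd trailing byte is padded with a zero byte */
        acc += (uint32_t)buf[0] << 8;

    /* Fold 64 -> 32 bits; 2^32 == 1 mod 0xffff, so the sum is preserved. */
    while (acc >> 32)
        acc = (acc & 0xffffffffu) + (acc >> 32);
    return (uint32_t)acc;
}

/* Hypothetical helper: fold a partial sum to 16 bits and complement it. */
uint16_t fold_csum(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

For example, the 8-byte sample from RFC 1071 (00 01 f2 03 f4 f5 f6 f7) folds to 0xddf2, which gives a checksum of 0x220d.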
1) ethtool -K eth0 gro on
Samples: 34K of event 'cycles', Event count (approx.): 10041345370
Overhead Command Shared Object Symbol
+ 25,46% ksoftirqd/0 [kernel.kallsyms] [k] csum_partial
+ 8,82% ksoftirqd/0 [kernel.kallsyms] [k] v7_dma_inv_range
+ 6,03% swapper [kernel.kallsyms] [k] arch_cpu_idle
+ 4,08% ksoftirqd/0 [kernel.kallsyms] [k] v7_dma_clean_range
+ 3,82% ksoftirqd/0 [kernel.kallsyms] [k] l2c210_inv_range
+ 3,14% swapper [kernel.kallsyms] [k] rcu_idle_exit
+ 3,00% ksoftirqd/0 [kernel.kallsyms] [k] l2c210_clean_range
+ 2,43% ksoftirqd/0 [kernel.kallsyms] [k] bgmac_start_xmit
+ 1,24% swapper [kernel.kallsyms] [k] csum_partial
+ 1,20% swapper [kernel.kallsyms] [k] do_idle
+ 1,19% swapper [kernel.kallsyms] [k] skb_segment
+ 1,19% ksoftirqd/0 [kernel.kallsyms] [k] arm_dma_unmap_page
+ 1,00% ksoftirqd/0 [kernel.kallsyms] [k] bgmac_poll
+ 0,95% ksoftirqd/0 [kernel.kallsyms] [k] __slab_free.constprop.3
+ 0,80% ksoftirqd/0 [kernel.kallsyms] [k] skb_release_data
+ 0,77% swapper [kernel.kallsyms] [k] __dev_queue_xmit
+ 0,73% ksoftirqd/0 [kernel.kallsyms] [k] build_skb
+ 0,68% ksoftirqd/0 [kernel.kallsyms] [k] skb_segment
+ 0,66% ksoftirqd/0 [kernel.kallsyms] [k] mmiocpy
+ 0,66% ksoftirqd/0 [kernel.kallsyms] [k] skb_checksum_help
+ 0,65% ksoftirqd/0 [kernel.kallsyms] [k] dev_gro_receive
+ 0,64% ksoftirqd/0 [kernel.kallsyms] [k] page_address
+ 0,62% ksoftirqd/0 [kernel.kallsyms] [k] __qdisc_run
+ 0,62% ksoftirqd/0 [kernel.kallsyms] [k] dma_cache_maint_page
+ 0,59% swapper [kernel.kallsyms] [k] __kmalloc_track_caller
+ 0,59% swapper [kernel.kallsyms] [k] mmiocpy
+ 0,58% ksoftirqd/0 [kernel.kallsyms] [k] sch_direct_xmit
+ 0,55% ksoftirqd/0 [kernel.kallsyms] [k] mmioset
+ 0,52% ksoftirqd/0 [kernel.kallsyms] [k] inet_gro_receive
0,49% ksoftirqd/0 [kernel.kallsyms] [k] netdev_alloc_frag
0,47% swapper [kernel.kallsyms] [k] __netif_receive_skb_core
0,45% swapper [kernel.kallsyms] [k] kmem_cache_alloc
0,45% ksoftirqd/0 [kernel.kallsyms] [k] __skb_checksum
0,43% swapper [kernel.kallsyms] [k] v7_dma_clean_range
0,39% ksoftirqd/0 [kernel.kallsyms] [k] kmem_cache_alloc
0,36% ksoftirqd/0 [kernel.kallsyms] [k] qdisc_dequeue_head
0,36% ksoftirqd/0 [kernel.kallsyms] [k] arm_dma_map_page
0,35% swapper [kernel.kallsyms] [k] mmioset
0,34% ksoftirqd/0 [kernel.kallsyms] [k] tcp_gro_receive
0,33% swapper [kernel.kallsyms] [k] __copy_skb_header
0,33% ksoftirqd/0 [kernel.kallsyms] [k] kmem_cache_free
0,32% ksoftirqd/0 [kernel.kallsyms] [k] netif_skb_features
0,30% swapper [kernel.kallsyms] [k] netif_skb_features
0,30% ksoftirqd/0 [kernel.kallsyms] [k] __skb_flow_dissect
2) ethtool -K eth0 gro off
Samples: 39K of event 'cycles', Event count (approx.): 13065826851
Overhead Command Shared Object Symbol
+ 11,09% swapper [kernel.kallsyms] [k] v7_dma_inv_range
+ 5,86% ksoftirqd/1 [kernel.kallsyms] [k] v7_dma_clean_range
+ 5,77% swapper [kernel.kallsyms] [k] l2c210_inv_range
+ 5,38% swapper [kernel.kallsyms] [k] __irqentry_text_end
+ 4,44% swapper [kernel.kallsyms] [k] bcma_host_soc_read32
+ 3,28% ksoftirqd/1 [kernel.kallsyms] [k] __netif_receive_skb_core
+ 3,25% ksoftirqd/1 [kernel.kallsyms] [k] l2c210_clean_range
+ 2,70% swapper [kernel.kallsyms] [k] arch_cpu_idle
+ 2,25% swapper [kernel.kallsyms] [k] bgmac_poll
+ 2,14% ksoftirqd/1 [kernel.kallsyms] [k] bgmac_start_xmit
+ 1,79% ksoftirqd/1 [kernel.kallsyms] [k] __dev_queue_xmit
+ 1,36% ksoftirqd/1 [kernel.kallsyms] [k] skb_vlan_untag
+ 1,11% swapper [kernel.kallsyms] [k] __skb_flow_dissect
+ 1,07% ksoftirqd/1 [kernel.kallsyms] [k] netif_skb_features
+ 0,98% ksoftirqd/1 [kernel.kallsyms] [k] ip_rcv_core.constprop.3
+ 0,92% ksoftirqd/1 [kernel.kallsyms] [k] sch_direct_xmit
+ 0,90% ksoftirqd/1 [kernel.kallsyms] [k] __local_bh_enable_ip
+ 0,86% ksoftirqd/1 [kernel.kallsyms] [k] nf_hook_slow
+ 0,82% swapper [kernel.kallsyms] [k] net_rx_action
+ 0,80% ksoftirqd/1 [kernel.kallsyms] [k] validate_xmit_skb.constprop.30
+ 0,75% swapper [kernel.kallsyms] [k] build_skb
+ 0,72% ksoftirqd/1 [kernel.kallsyms] [k] ip_forward
+ 0,71% ksoftirqd/1 [kernel.kallsyms] [k] br_handle_frame_finish
+ 0,71% ksoftirqd/1 [kernel.kallsyms] [k] skb_pull_rcsum
+ 0,65% swapper [kernel.kallsyms] [k] arm_dma_unmap_page
+ 0,59% ksoftirqd/1 [kernel.kallsyms] [k] ip_finish_output2
+ 0,59% swapper [kernel.kallsyms] [k] __skb_get_hash
+ 0,58% swapper [kernel.kallsyms] [k] dma_cache_maint_page
+ 0,55% ksoftirqd/1 [kernel.kallsyms] [k] fdb_find_rcu
+ 0,54% swapper [kernel.kallsyms] [k] bcma_host_soc_write32
+ 0,53% ksoftirqd/1 [kernel.kallsyms] [k] vlan_do_receive
+ 0,52% ksoftirqd/1 [kernel.kallsyms] [k] memmove
+ 0,52% swapper [kernel.kallsyms] [k] rcu_idle_exit
+ 0,51% ksoftirqd/1 [kernel.kallsyms] [k] ip_rcv
+ 0,51% ksoftirqd/1 [kernel.kallsyms] [k] dev_hard_start_xmit
0,49% ksoftirqd/1 [kernel.kallsyms] [k] ip_output
0,46% ksoftirqd/1 [kernel.kallsyms] [k] vlan_dev_hard_start_xmit
0,45% swapper [kernel.kallsyms] [k] enqueue_to_backlog
0,42% swapper [kernel.kallsyms] [k] netdev_alloc_frag
0,42% swapper [kernel.kallsyms] [k] skb_release_data
0,41% ksoftirqd/1 [kernel.kallsyms] [k] ip_forward_finish
0,40% ksoftirqd/1 [kernel.kallsyms] [k] br_handle_frame
0,37% ksoftirqd/1 [kernel.kallsyms] [k] mmiocpy
0,37% ksoftirqd/1 [kernel.kallsyms] [k] page_address
0,36% ksoftirqd/0 [kernel.kallsyms] [k] v7_dma_inv_range
0,36% ksoftirqd/1 [kernel.kallsyms] [k] memcmp
0,36% ksoftirqd/1 [kernel.kallsyms] [k] netif_receive_skb_internal
0,34% swapper [kernel.kallsyms] [k] page_address
0,34% swapper [kernel.kallsyms] [k] mmioset
0,33% ksoftirqd/1 [kernel.kallsyms] [k] br_pass_frame_up
0,33% ksoftirqd/1 [kernel.kallsyms] [k] neigh_connected_output
0,33% swapper [kernel.kallsyms] [k] kmem_cache_alloc
0,31% ksoftirqd/1 [kernel.kallsyms] [k] mmioset
0,30% ksoftirqd/1 [kernel.kallsyms] [k] ip_finish_output
0,30% ksoftirqd/1 [kernel.kallsyms] [k] bcma_bgmac_write