On 04.04.2019 14:57, Rafał Miłecki wrote:
> Long story short, starting with the commit 66e5133f19e9 ("vlan: Add
> GRO support for non hardware accelerated vlan") - which first hit
> kernel 4.2 - NAT performance of my router dropped by 30% - 40%.

I'll try to provide a summary of this issue. I'll focus on TCP traffic as that's what I happened to test.

Basically all slowdowns are related to csum_partial(). Calculating a checksum has a significant impact on NAT performance on devices with less powerful CPUs.

********** GRO disabled

Without GRO, csum_partial() is used only when validating TCP packets in nf_conntrack_tcp_packet() (known as tcp_packet() in kernels older than 5.1).

Simplified forward trace for that case:
nf_conntrack_in
    nf_conntrack_tcp_packet
        tcp_error
            if (state->net->ct.sysctl_checksum)
                nf_checksum
                    nf_ip_checksum
                        __skb_checksum_complete

That validation can be disabled using the nf_conntrack_checksum sysctl, which bumps NAT speed for me from 666 Mb/s to 940 Mb/s (+41%).

********** GRO enabled

First of all, GRO also includes TCP validation that requires calculating a checksum.

Simplified forward trace for that case:
vlan_gro_receive
    call_gro_receive
        inet_gro_receive
            indirect_call_gro_receive
                tcp4_gro_receive
                    skb_gro_checksum_validate
                    tcp_gro_receive

*If* we had a way to disable that validation it *would* result in bumping NAT speed for me from 577 Mb/s to 825 Mb/s (+43%).

Secondly, using GRO means we need to calculate a checksum before transmitting packets (this applies to devices without HW checksum offloading). I think it's related to packet merging in skb_gro_receive() and then setting CHECKSUM_PARTIAL:

vlan_gro_complete
    inet_gro_complete
        tcp4_gro_complete
            tcp_gro_complete
                skb->ip_summed = CHECKSUM_PARTIAL;

That results in bgmac calculating a checksum from scratch. Take a look at bgmac_dma_tx_add(), which does:

    if (skb->ip_summed == CHECKSUM_PARTIAL)
        skb_checksum_help(skb);

Performing that whole checksum calculation will always result in GRO slowing down NAT for me when using the BCM47094 SoC with its not-so-powerful ARM CPUs.
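
To illustrate why a software checksum is expensive on a weak CPU, here is a minimal user-space sketch of the Internet checksum (the RFC 1071 one's-complement sum that TCP uses). It is not the kernel's csum_partial(), which is architecture-optimized. The names inet_csum_add() and inet_csum() are hypothetical helpers made up for this example. The point it shows is that every byte of the buffer has to be read once, so the cost grows linearly with packet size and is paid again each time the checksum is validated or recomputed.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): add a buffer to a running
 * one's-complement sum, treating it as big-endian 16-bit words.
 * Each byte is touched exactly once, so the cost is O(len).
 */
uint64_t inet_csum_add(uint64_t sum, const uint8_t *buf, size_t len)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];

	/* An odd trailing byte is padded with a zero byte on the right. */
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;

	return sum;
}

/*
 * Hypothetical helper (not a kernel API): fold the carries back into
 * 16 bits and return the one's complement, as described in RFC 1071.
 */
uint16_t inet_csum(const uint8_t *buf, size_t len)
{
	uint64_t sum = inet_csum_add(0, buf, len);

	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}
```

Validation works the same way as generation: running inet_csum() over data that already carries a correct checksum yields 0. That is why both the receive-side validation and the skb_checksum_help() fallback on transmit cost a full pass over the packet.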