On Tue, Mar 22, 2016 at 3:09 PM, Tom Herbert <t...@herbertland.com> wrote: > On Tue, Mar 22, 2016 at 2:45 PM, Alexander Duyck > <alexander.du...@gmail.com> wrote: >> On Tue, Mar 22, 2016 at 2:10 PM, Tom Herbert <t...@herbertland.com> wrote: >>> On Tue, Mar 22, 2016 at 1:20 PM, Alexander Duyck >>> <alexander.du...@gmail.com> wrote: >>>> On Tue, Mar 22, 2016 at 12:19 PM, Tom Herbert <t...@herbertland.com> wrote: >>>>> In skb_segment the check of whether or not to perform the checksum on >>>>> host was changed to not consider rather remote checksum offload is >>>>> in use. In the case that can_checksum_protocol fails the checksum >>>>> is computed regardless. __skb_udp_tunnel_segment was modified in a >>>>> related patch to add NETIF_F_HW_CSUM into features when grabbing >>>>> the enc_features and remote checksum offload is being done. The >>>>> problem is that this bit can be cleared in lower GSO layers that >>>>> are also doing tunneling (e.g. ipip, GRE when used with GUE), >>>>> so when we get to skb_segment that intent has been lost and >>>>> can_checksum_protocol fails. >>>> >>>> So what you are describing sounds like a tunnel in tunnel scenario. >>>> It might work better to just skip masking the features if >>>> skb->remcsum_offload is set rather than trying to change how we >>>> perform the offload. >>>> >>> To be clear, my patch is restoring the old behavior not implementing a new >>> one. >> >> Yes, but the old behavior could lead to kernel panics under certain >> circumstances. >> >>>> I'm pretty sure this will cause data corruption and maybe a kernel >>>> panic if Tx checksum offload is disabled. >>>> >>> Nope, working fine for me. >> >> What are the options you used to create the tunnel? Did you enable >> both remcsum and udpcsum? >> > modprobe fou > ./ip fou add port 6080 gue > ./ip link add name tun1 type ipip remote 10.1.1.2 local 10.1.1.2 ttl > 225 encap gue encap-sport auto encap-dport 6080 encap-csum > encap-remcsum > ifconfig tun1 192.168.1.1 > ip route add 192.168.1.0/24 dev tun1 > Here are the command I use for VXLAN for reference:
./ip link add vxlan0 type vxlan id 10 group 224.10.10.10 ttl 10 dev eth0 udpcsum remcsumtx remcsumrx ifconfig vxlan0 192.168.111.1 ip route add 192.168.111.0/24 dev vxlan0 When remcsum offload is working properly (enabled and outer checksum is offloaded) then csum_partial gets very few cycles as shown by perf. Tom > Thanks, > Tom > >>> The outer checksum would be computed in >>>>> This patch: >>>>> >>>>> - Restores the check in skb_segment to look at remote checksum offload. >>>>> - Removes the code in __skb_udp_tunnel_segment to force the >>>>> NETIF_F_HW_CSUM feature since this is no longer useful with above >>>>> change. >>>>> - Removes check for remote checksum offload in gso_reset_checksum. >>>>> A special case should not be needed here. >>>>> >>>>> Tested: Single netperf STREAM over GUE-ipip >>>>> >>>>> Before fix: >>>>> 5625 Mbps >>>>> After fix: >>>>> 6410 Mbps >>>>> >>>>> Fixes: 76443456227097179c1482 ("net: Move GSO csum into SKB_GSO_CB") >>>>> Signed-off-by: Tom Herbert <t...@herbertland.com> >>>>> --- >>>>> include/linux/skbuff.h | 4 ---- >>>>> net/core/skbuff.c | 5 ++--- >>>>> net/ipv4/udp_offload.c | 10 ---------- >>>>> 3 files changed, 2 insertions(+), 17 deletions(-) >>>>> >>>>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h >>>>> index 15d0df9..f6fe8a5 100644 >>>>> --- a/include/linux/skbuff.h >>>>> +++ b/include/linux/skbuff.h >>>>> @@ -3615,10 +3615,6 @@ static inline int gso_pskb_expand_head(struct >>>>> sk_buff *skb, int extra) >>>>> >>>>> static inline void gso_reset_checksum(struct sk_buff *skb, __wsum res) >>>>> { >>>>> - /* Do not update partial checksums if remote checksum is enabled. >>>>> */ >>>>> - if (skb->remcsum_offload) >>>>> - return; >>>>> - >>>>> SKB_GSO_CB(skb)->csum = res; >>>>> SKB_GSO_CB(skb)->csum_start = skb_checksum_start(skb) - skb->head; >>>>> } >>>> >>>> I'm pretty sure this part here will break things when you don't have >>>> an outer offload enabled. What NIC did you test this on? Did you try >>>> disabling the Tx checksum support in the hardware to see what would >>>> happen? >>>> >>> Mellanox. >>> >>> When TX checksum is disabled the outer checksum is computed in >>> __skb_udp_tunnel_segment. >> >> Yes, but that uses gso_make_checksum which requires the csum and >> csum_start fields be initialized in the SKB_GSO_CB(). >> >>>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c >>>>> index f044f97..e4eb78d 100644 >>>>> --- a/net/core/skbuff.c >>>>> +++ b/net/core/skbuff.c >>>>> @@ -3259,14 +3259,13 @@ skip_fraglist: >>>>> nskb->truesize += nskb->data_len; >>>>> >>>>> perform_csum_check: >>>>> - if (!csum) { >>>>> + if (!csum && !nskb->remcsum_offload) { >>>>> if (skb_has_shared_frag(nskb)) { >>>>> err = __skb_linearize(nskb); >>>>> if (err) >>>>> goto err; >>>>> } >>>>> - if (!nskb->remcsum_offload) >>>>> - nskb->ip_summed = CHECKSUM_NONE; >>>>> + nskb->ip_summed = CHECKSUM_NONE; >>>>> SKB_GSO_CB(nskb)->csum = >>>>> skb_checksum(nskb, doffset, >>>>> nskb->len - doffset, 0); >>>> >>>> I'm pretty sure this is going to cause a huge mess if you are >>>> requesting remote checksum but cannot perform an outer checksum. One >>>> of the reasons I merged these features together the way I did was >>>> because we needed to perform the initial checksum to avoid causing a >>>> kernel panic later on. Otherwise we don't have the SKB_GSO_CB()->csum >>>> and SKB_GSO_CB()->csum_start fields populated. >>>> >>> Nope, if we can't perform outer checksum it is done in the host. I'm >>> not seeing any problem. >> >> Are you running remcsum with udpcsum set or is udpcsum not set? If >> you don't set udpcsum it doesn't actually compute an outer checksum >> value. >> >>>>> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c >>>>> index 8a3405a..f86f1e1 100644 >>>>> --- a/net/ipv4/udp_offload.c >>>>> +++ b/net/ipv4/udp_offload.c >>>>> @@ -78,16 +78,6 @@ static struct sk_buff *__skb_udp_tunnel_segment(struct >>>>> sk_buff *skb, >>>>> >>>>> features &= skb->dev->hw_enc_features; >>>>> >>>>> - /* The only checksum offload we care about from here on out is the >>>>> - * outer one so strip the existing checksum feature flags and >>>>> - * instead set the flag based on our outer checksum offload value. >>>>> - */ >>>>> - if (remcsum || ufo) { >>>>> - features &= ~NETIF_F_CSUM_MASK; >>>>> - if (!need_csum || offload_csum) >>>>> - features |= NETIF_F_HW_CSUM; >>>>> - } >>>>> - >>>>> /* segment inner packet. */ >>>>> segs = gso_inner_segment(skb, features); >>>>> if (IS_ERR_OR_NULL(segs)) { >>>> >>>> This part breaks UDP fragmentation I am pretty sure. We need this bit >>>> for UFO to avoid having to perform a checksum over the data twice if >>>> we are offloading the outer UDP checksum. >>> >>> The maybe leave this for ufo if it is a problem (please test your >>> suppositions!), but this is not good for remcsum offload. >> >> I suspect we are supposed to be setting encap_level to 0 when we set >> skb->encapsulation to 0 before processing inner headers in the case of >> UDP and GRE tunnel offloads. I'll try to test and submit something in >> the next hour or two. >> >> - Alex