Adding a few virtio people. Please see the virtio spec/flag question below.
On Tue, 12 Nov 2024, Chia-Yu Chang (Nokia) wrote: > >From: Ilpo Järvinen <i...@kernel.org> > >Sent: Thursday, November 7, 2024 8:28 PM > >To: Eric Dumazet <eduma...@google.com> > >Cc: Chia-Yu Chang (Nokia) <chia-yu.ch...@nokia-bell-labs.com>; > >netdev@vger.kernel.org; dsah...@gmail.com; da...@davemloft.net; > >dsah...@kernel.org; pab...@redhat.com; joel.grana...@kernel.org; > >k...@kernel.org; andrew+net...@lunn.ch; ho...@kernel.org; > >pa...@netfilter.org; kad...@netfilter.org; netfilter-de...@vger.kernel.org; > >coret...@netfilter.org; ncardw...@google.com; Koen De Schepper (Nokia) > ><koen.de_schep...@nokia-bell-labs.com>; g.wh...@cablelabs.com; > >ingemar.s.johans...@ericsson.com; mirja.kuehlew...@ericsson.com; > >chesh...@apple.com; rs.i...@gmx.at; jason_living...@comcast.com; > >vidhi_g...@apple.com > >Subject: Re: [PATCH v5 net-next 09/13] gro: prevent ACE field corruption & > >better AccECN handling > > > > > >CAUTION: This is an external email. Please be very careful when clicking > >links or opening attachments. See the URL nok.it/ext for additional > >information. > > > > > > > >On Thu, 7 Nov 2024, Eric Dumazet wrote: > > > >>On Tue, Nov 5, 2024 at 11:07 AM <chia-yu.ch...@nokia-bell-labs.com> wrote: > >>> > >>> From: Ilpo Järvinen <i...@kernel.org> > >>> > >>> There are important differences in how the CWR field behaves in > >>> RFC3168 and AccECN. With AccECN, CWR flag is part of the ACE counter > >>> and its changes are important so adjust the flags changed mask > >>> accordingly. > >>> > >>> Also, if CWR is there, set the Accurate ECN GSO flag to avoid > >>> corrupting CWR flag somewhere. > >>> > >>> Signed-off-by: Ilpo Järvinen <i...@kernel.org> > >>> Signed-off-by: Chia-Yu Chang <chia-yu.ch...@nokia-bell-labs.com> > >>> --- > >>> net/ipv4/tcp_offload.c | 4 ++-- > >>> 1 file changed, 2 insertions(+), 2 deletions(-) > >>> > >>> diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c index > >>> 0b05f30e9e5f..f59762d88c38 100644 > >>> --- a/net/ipv4/tcp_offload.c > >>> +++ b/net/ipv4/tcp_offload.c > >>> @@ -329,7 +329,7 @@ struct sk_buff *tcp_gro_receive(struct list_head > >>> *head, struct sk_buff *skb, > >>> th2 = tcp_hdr(p); > >>> flush = (__force int)(flags & TCP_FLAG_CWR); > >>> flush |= (__force int)((flags ^ tcp_flag_word(th2)) & > >>> - ~(TCP_FLAG_CWR | TCP_FLAG_FIN | TCP_FLAG_PSH)); > >>> + ~(TCP_FLAG_FIN | TCP_FLAG_PSH)); > >>> flush |= (__force int)(th->ack_seq ^ th2->ack_seq); > >>> for (i = sizeof(*th); i < thlen; i += 4) > >>> flush |= *(u32 *)((u8 *)th + i) ^ @@ -405,7 +405,7 > >>> @@ void tcp_gro_complete(struct sk_buff *skb) > >>> shinfo->gso_segs = NAPI_GRO_CB(skb)->count; > >>> > >>> if (th->cwr) > >>> - shinfo->gso_type |= SKB_GSO_TCP_ECN; > >>> + shinfo->gso_type |= SKB_GSO_TCP_ACCECN; > >>> } > >>> EXPORT_SYMBOL(tcp_gro_complete); > >>> > >> > >> I do not really understand this patch. How a GRO engine can know which > >> ECN variant the peers are using ? > > > >Hi Eric, > > > >Thanks for taking a look. > > > >GRO doesn't know. Setting SKB_GSO_TCP_ECN in case of not knowing can result > >in header change that corrupts ACE field. Thus, GRO has to assume the > >RFC3168 CWR offloading trick cannot be used anymore (unless it tracks the > >connection and knows the bits are used for RFC3168 which is something nobody > >is going to do). > > > >The main point of SKB_GSO_TCP_ACCECN is to prevent SKB_GSO_TCP_ECN or > >NETIF_F_TSO_ECN offloading to be used for the skb as it would corrupt ACE > >field value. > > Hi Eric and Ilpo, > > From my understanding of another email thread (patch 08/13), it seems the > conclusions that SKB_GSO_TCP_ACCECN will still be needed. > > > > >SKB_GSO_TCP_ACCECN doesn't allow CWR bits change within a super-skb but the > >same CWR flag should be repeated for all segments. In a sense, it's simpler > >than RFC3168 offloading. > > > >> SKB_GSO_TCP_ECN is also used from other points, what is your plan ? > >> > >> git grep -n SKB_GSO_TCP_ECN > >> drivers/net/ethernet/hisilicon/hns3/hns3_enet.c:3888: > >> skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN; > >> drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:1291: > >> skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN; > >> drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:1312: > >> skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN; > > > >These looked like they should be just changed to set SKB_GSO_TCP_ACCECN > >instead. > > I agree with these changes and will apply them in the next version. > > > > >I don't anymore recall why I didn't change those when I made this patch long > >time ago, perhaps it was just an oversight or things have changed somehow > >since then. > > > >> include/linux/netdevice.h:5061: BUILD_BUG_ON(SKB_GSO_TCP_ECN != > >> (NETIF_F_TSO_ECN >> NETIF_F_GSO_SHIFT)); > >> include/linux/skbuff.h:664: SKB_GSO_TCP_ECN = 1 << 2, > > > >Not relevant. > > > >> include/linux/virtio_net.h:88: gso_type |= > >> SKB_GSO_TCP_ECN; > >> include/linux/virtio_net.h:161: switch (gso_type & > >> ~SKB_GSO_TCP_ECN) { > >> include/linux/virtio_net.h:226: if (sinfo->gso_type & > >> SKB_GSO_TCP_ECN) > > > >These need to be looked further what's going on as UAPI is also > > involved here. > > I had a look at these parts, and only the 1st and 3rd ones are relevant. > Related to the 1st one, I propose to change > from > > if (hdr->gso_type & VIRTIO_NET_HDR_GSO_ECN) > gso_type |= SKB_GSO_TCP_ECN; > > to > > if (hdr->gso_type & VIRTIO_NET_HDR_GSO_ECN) > gso_type |= SKB_GSO_TCP_ACCECN; > > The reason behind this proposed change is similar as the above changes > in en_rx.c. But en_rx.c is based one CWR flag, there's no similar check here. > For the 3rd one, I suggest to change from > > if (sinfo->gso_type & SKB_GSO_TCP_ECN) > hdr->gso_type |= VIRTIO_NET_HDR_GSO_ECN; > > to > > if (sinfo->gso_type & (SKB_GSO_TCP_ECN | SKB_GSO_TCP_ACCECN)) > hdr->gso_type |= VIRTIO_NET_HDR_GSO_ECN; > > This proposed change is because VIRTIO_NET_HDR_GSO_ECN must be set to > allow TCP packets requiring segmentation offload which have ECN bit set. > So, no matter whether skb gso_type have GSO_TCP_ECN flag or > GSO_TCP_ACCECN flag, the corresponding hdr gso_type shall be set. > > But, I wonder what would the driver do when with VIRTIO_NET_HDR_GSO_ECN > flag. Will it corrupts CWR or not? I'm starting to heavily question what is the meaning of that VIRTIO_NET_HDR_GSO_ECN flag and if it's even consistent... https://github.com/qemu/qemu/blob/master/net/eth.c That sets VIRTIO_NET_HDR_GSO_ECN based on CE?? (Whereas kernel associates the related SKB_GSO_TCP_ECN to CWR offloading.) The virtio spec is way too vague to be useful so it would not be surprising if there are diverging interpretations from implementers: "If the driver negotiated the VIRTIO_NET_F_HOST_ECN feature, the VIRTIO_NET_HDR_GSO_ECN bit in gso_type indicates that the TCP packet has the ECN bit set." What is "the ECN bit" in terms used by TCP (or IP)? Could some virtio folks please explain? -- i.