>From: Jason Wang <jasow...@redhat.com> >Sent: Wednesday, November 13, 2024 2:43 AM >To: Ilpo Järvinen <i...@kernel.org> >Cc: Chia-Yu Chang (Nokia) <chia-yu.ch...@nokia-bell-labs.com>; Michael S. >Tsirkin <m...@redhat.com>; virtualizat...@lists.linux.dev; Eric Dumazet ><eduma...@google.com>; netdev@vger.kernel.org; dsah...@gmail.com; >da...@davemloft.net; dsah...@kernel.org; pab...@redhat.com; >joel.grana...@kernel.org; k...@kernel.org; andrew+net...@lunn.ch; >ho...@kernel.org; pa...@netfilter.org; kad...@netfilter.org; >netfilter-de...@vger.kernel.org; coret...@netfilter.org; ncardw...@google.com; >Koen De Schepper (Nokia) <koen.de_schep...@nokia-bell-labs.com>; >g.wh...@cablelabs.com; ingemar.s.johans...@ericsson.com; >mirja.kuehlew...@ericsson.com; chesh...@apple.com; rs.i...@gmx.at; >jason_living...@comcast.com; vidhi_g...@apple.com >Subject: Re: [PATCH v5 net-next 09/13] gro: prevent ACE field corruption & >better AccECN handling > >[You don't often get email from jasow...@redhat.com. Learn why this is >important at https://aka.ms/LearnAboutSenderIdentification ] > >CAUTION: This is an external email. Please be very careful when clicking links >or opening attachments. See the URL nok.it/ext for additional information. > > > >On Wed, Nov 13, 2024 at 5:19 AM Ilpo Järvinen <i...@kernel.org> wrote: >> >> Adding a few virtio people. Please see the virtio spec/flag question >> below. >> >> On Tue, 12 Nov 2024, Chia-Yu Chang (Nokia) wrote: >> >> > >From: Ilpo Järvinen <i...@kernel.org> >> > >Sent: Thursday, November 7, 2024 8:28 PM >> > >To: Eric Dumazet <eduma...@google.com> >> > >Cc: Chia-Yu Chang (Nokia) <chia-yu.ch...@nokia-bell-labs.com>; >> > >netdev@vger.kernel.org; dsah...@gmail.com; da...@davemloft.net; >> > >dsah...@kernel.org; pab...@redhat.com; joel.grana...@kernel.org; >> > >k...@kernel.org; andrew+net...@lunn.ch; ho...@kernel.org; >> > >pa...@netfilter.org; kad...@netfilter.org; >> > >netfilter-de...@vger.kernel.org; coret...@netfilter.org; >> > >ncardw...@google.com; Koen De Schepper (Nokia) >> > ><koen.de_schep...@nokia-bell-labs.com>; g.wh...@cablelabs.com; >> > >ingemar.s.johans...@ericsson.com; mirja.kuehlew...@ericsson.com; >> > >chesh...@apple.com; rs.i...@gmx.at; jason_living...@comcast.com; >> > >vidhi_g...@apple.com >> > >Subject: Re: [PATCH v5 net-next 09/13] gro: prevent ACE field >> > >corruption & better AccECN handling >> > > >> > > >> > >CAUTION: This is an external email. Please be very careful when clicking >> > >links or opening attachments. See the URL nok.it/ext for additional >> > >information. >> > > >> > > >> > > >> > >On Thu, 7 Nov 2024, Eric Dumazet wrote: >> > > >> > >>On Tue, Nov 5, 2024 at 11:07 AM <chia-yu.ch...@nokia-bell-labs.com> >> > >>wrote: >> > >>> >> > >>> From: Ilpo Järvinen <i...@kernel.org> >> > >>> >> > >>> There are important differences in how the CWR field behaves in >> > >>> RFC3168 and AccECN. With AccECN, CWR flag is part of the ACE >> > >>> counter and its changes are important so adjust the flags >> > >>> changed mask accordingly. >> > >>> >> > >>> Also, if CWR is there, set the Accurate ECN GSO flag to avoid >> > >>> corrupting CWR flag somewhere. >> > >>> >> > >>> Signed-off-by: Ilpo Järvinen <i...@kernel.org> >> > >>> Signed-off-by: Chia-Yu Chang <chia-yu.ch...@nokia-bell-labs.com> >> > >>> --- >> > >>> net/ipv4/tcp_offload.c | 4 ++-- >> > >>> 1 file changed, 2 insertions(+), 2 deletions(-) >> > >>> >> > >>> diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c >> > >>> index >> > >>> 0b05f30e9e5f..f59762d88c38 100644 >> > >>> --- a/net/ipv4/tcp_offload.c >> > >>> +++ b/net/ipv4/tcp_offload.c >> > >>> @@ -329,7 +329,7 @@ struct sk_buff *tcp_gro_receive(struct list_head >> > >>> *head, struct sk_buff *skb, >> > >>> th2 = tcp_hdr(p); >> > >>> flush = (__force int)(flags & TCP_FLAG_CWR); >> > >>> flush |= (__force int)((flags ^ tcp_flag_word(th2)) & >> > >>> - ~(TCP_FLAG_CWR | TCP_FLAG_FIN | TCP_FLAG_PSH)); >> > >>> + ~(TCP_FLAG_FIN | TCP_FLAG_PSH)); >> > >>> flush |= (__force int)(th->ack_seq ^ th2->ack_seq); >> > >>> for (i = sizeof(*th); i < thlen; i += 4) >> > >>> flush |= *(u32 *)((u8 *)th + i) ^ @@ -405,7 >> > >>> +405,7 @@ void tcp_gro_complete(struct sk_buff *skb) >> > >>> shinfo->gso_segs = NAPI_GRO_CB(skb)->count; >> > >>> >> > >>> if (th->cwr) >> > >>> - shinfo->gso_type |= SKB_GSO_TCP_ECN; >> > >>> + shinfo->gso_type |= SKB_GSO_TCP_ACCECN; >> > >>> } >> > >>> EXPORT_SYMBOL(tcp_gro_complete); >> > >>> >> > >> >> > >> I do not really understand this patch. How a GRO engine can know >> > >> which ECN variant the peers are using ? >> > > >> > >Hi Eric, >> > > >> > >Thanks for taking a look. >> > > >> > >GRO doesn't know. Setting SKB_GSO_TCP_ECN in case of not knowing can >> > >result in header change that corrupts ACE field. Thus, GRO has to assume >> > >the RFC3168 CWR offloading trick cannot be used anymore (unless it tracks >> > >the connection and knows the bits are used for RFC3168 which is something >> > >nobody is going to do). >> > > >> > >The main point of SKB_GSO_TCP_ACCECN is to prevent SKB_GSO_TCP_ECN or >> > >NETIF_F_TSO_ECN offloading to be used for the skb as it would corrupt ACE >> > >field value. >> > >> > Hi Eric and Ilpo, >> > >> > From my understanding of another email thread (patch 08/13), it seems the >> > conclusions that SKB_GSO_TCP_ACCECN will still be needed. >> > >> > > >> > >SKB_GSO_TCP_ACCECN doesn't allow CWR bits change within a super-skb but >> > >the same CWR flag should be repeated for all segments. In a sense, it's >> > >simpler than RFC3168 offloading. >> > > >> > >> SKB_GSO_TCP_ECN is also used from other points, what is your plan ? >> > >> >> > >> git grep -n SKB_GSO_TCP_ECN >> > >> drivers/net/ethernet/hisilicon/hns3/hns3_enet.c:3888: >> > >> skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN; >> > >> drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:1291: >> > >> skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN; >> > >> drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:1312: >> > >> skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN; >> > > >> > >These looked like they should be just changed to set SKB_GSO_TCP_ACCECN >> > >instead. >> > >> > I agree with these changes and will apply them in the next version. >> > >> > > >> > >I don't anymore recall why I didn't change those when I made this patch >> > >long time ago, perhaps it was just an oversight or things have changed >> > >somehow since then. >> > > >> > >> include/linux/netdevice.h:5061: BUILD_BUG_ON(SKB_GSO_TCP_ECN != >> > >> (NETIF_F_TSO_ECN >> NETIF_F_GSO_SHIFT)); >> > >> include/linux/skbuff.h:664: SKB_GSO_TCP_ECN = 1 << 2, >> > > >> > >Not relevant. >> > > >> > >> include/linux/virtio_net.h:88: gso_type |= >> > >> SKB_GSO_TCP_ECN; >> > >> include/linux/virtio_net.h:161: switch (gso_type & >> > >> ~SKB_GSO_TCP_ECN) { >> > >> include/linux/virtio_net.h:226: if (sinfo->gso_type & >> > >> SKB_GSO_TCP_ECN) >> > > >> > >These need to be looked further what's going on as UAPI is also >> > >involved here. >> > >> > I had a look at these parts, and only the 1st and 3rd ones are relevant. >> > Related to the 1st one, I propose to change from >> > >> > if (hdr->gso_type & VIRTIO_NET_HDR_GSO_ECN) >> > gso_type |= SKB_GSO_TCP_ECN; >> > >> > to >> > >> > if (hdr->gso_type & VIRTIO_NET_HDR_GSO_ECN) >> > gso_type |= SKB_GSO_TCP_ACCECN; >> > >> > The reason behind this proposed change is similar as the above >> > changes in en_rx.c. >> >> But en_rx.c is based one CWR flag, there's no similar check here. >> >> > For the 3rd one, I suggest to change from >> > >> > if (sinfo->gso_type & SKB_GSO_TCP_ECN) >> > hdr->gso_type |= VIRTIO_NET_HDR_GSO_ECN; >> > >> > to >> > >> > if (sinfo->gso_type & (SKB_GSO_TCP_ECN | >> > SKB_GSO_TCP_ACCECN)) >> > hdr->gso_type |= VIRTIO_NET_HDR_GSO_ECN; >> > >> > This proposed change is because VIRTIO_NET_HDR_GSO_ECN must be set >> > to allow TCP packets requiring segmentation offload which have ECN bit set. >> > So, no matter whether skb gso_type have GSO_TCP_ECN flag or >> > GSO_TCP_ACCECN flag, the corresponding hdr gso_type shall be set. >> > >> > But, I wonder what would the driver do when with >> > VIRTIO_NET_HDR_GSO_ECN flag. Will it corrupts CWR or not? >> >> I'm starting to heavily question what is the meaning of that >> VIRTIO_NET_HDR_GSO_ECN flag and if it's even consistent... >> >> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgith >> ub.com%2Fqemu%2Fqemu%2Fblob%2Fmaster%2Fnet%2Feth.c&data=05%7C02%7Cchia >> -yu.chang%40nokia-bell-labs.com%7C64e73bacd5eb4647905208dd03848651%7C5 >> d4717519675428d917b70f44f9630b0%7C0%7C0%7C638670589949336838%7CUnknown >> %7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4 >> zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=iEeGdA4mcq1WC >> unKWvxNZ6yt213xKWKOpjyYdBzLzTk%3D&reserved=0 >> >> That sets VIRTIO_NET_HDR_GSO_ECN based on CE?? (Whereas kernel >> associates the related SKB_GSO_TCP_ECN to CWR offloading.) >> >> The virtio spec is way too vague to be useful so it would not be >> surprising if there are diverging interpretations from implementers: >> >> "If the driver negotiated the VIRTIO_NET_F_HOST_ECN feature, the >> VIRTIO_NET_HDR_GSO_ECN bit in gso_type indicates that the TCP packet >> has the ECN bit set." >> >> What is "the ECN bit" in terms used by TCP (or IP)? Could some virtio >> folks please explain? > >According to the current Linux implementation in virtio_net_hdr_to_skb(): > > if (hdr->gso_type & VIRTIO_NET_HDR_GSO_ECN) > gso_type |= SKB_GSO_TCP_ECN; > >It mapps to SKB_GSO_TCP_ECN which is: > > /* This indicates the tcp segment has CWR set. */ > SKB_GSO_TCP_ECN = 1 << 2, > >Thanks
This patch series is related to adding new SKB_GSO_TCP_ACCECN. The reason behind is because how the CWR field handling is different between RFC3168 and AccECN. You can to find the original patches in below: https://lore.kernel.org/netdev/20241105100647.117346-9-chia-yu.ch...@nokia-bell-labs.com/ https://lore.kernel.org/netdev/20241105100647.117346-10-chia-yu.ch...@nokia-bell-labs.com/ When reviewing all occurrence of SKB_GSO_TCP_ECN, we see it's related to VIRTIO_NET_HDR_GSO_ECN. But after checking the virtio spec, it's quite unclear about VIRTIO_NET_HDR_GSO_ECN: "If the driver negotiated the VIRTIO_NET_F_HOST_ECN feature, the VIRTIO_NET_HDR_GSO_ECN bit in gso_type indicates that the TCP packet has the ECN bit set." In virtnet_probe(), it looks like VIRTIO_NET_F_HOST_ECN is used to enable NETIF_F_TSO_ECN support: if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_ECN)) dev->hw_features |= NETIF_F_TSO_ECN; So the question is that is its TSO engine can be set to not touch CWR flag even with NETIF_F_TSO_ECN?