On Wed, Nov 13, 2024 at 5:19 AM Ilpo Järvinen <i...@kernel.org> wrote: > > Adding a few virtio people. Please see the virtio spec/flag question > below. > > On Tue, 12 Nov 2024, Chia-Yu Chang (Nokia) wrote: > > > >From: Ilpo Järvinen <i...@kernel.org> > > >Sent: Thursday, November 7, 2024 8:28 PM > > >To: Eric Dumazet <eduma...@google.com> > > >Cc: Chia-Yu Chang (Nokia) <chia-yu.ch...@nokia-bell-labs.com>; > > >netdev@vger.kernel.org; dsah...@gmail.com; da...@davemloft.net; > > >dsah...@kernel.org; pab...@redhat.com; joel.grana...@kernel.org; > > >k...@kernel.org; andrew+net...@lunn.ch; ho...@kernel.org; > > >pa...@netfilter.org; kad...@netfilter.org; > > >netfilter-de...@vger.kernel.org; coret...@netfilter.org; > > >ncardw...@google.com; Koen De Schepper (Nokia) > > ><koen.de_schep...@nokia-bell-labs.com>; g.wh...@cablelabs.com; > > >ingemar.s.johans...@ericsson.com; mirja.kuehlew...@ericsson.com; > > >chesh...@apple.com; rs.i...@gmx.at; jason_living...@comcast.com; > > >vidhi_g...@apple.com > > >Subject: Re: [PATCH v5 net-next 09/13] gro: prevent ACE field corruption & > > >better AccECN handling > > > > > > > > >CAUTION: This is an external email. Please be very careful when clicking > > >links or opening attachments. See the URL nok.it/ext for additional > > >information. > > > > > > > > > > > >On Thu, 7 Nov 2024, Eric Dumazet wrote: > > > > > >>On Tue, Nov 5, 2024 at 11:07 AM <chia-yu.ch...@nokia-bell-labs.com> wrote: > > >>> > > >>> From: Ilpo Järvinen <i...@kernel.org> > > >>> > > >>> There are important differences in how the CWR field behaves in > > >>> RFC3168 and AccECN. With AccECN, CWR flag is part of the ACE counter > > >>> and its changes are important so adjust the flags changed mask > > >>> accordingly. > > >>> > > >>> Also, if CWR is there, set the Accurate ECN GSO flag to avoid > > >>> corrupting CWR flag somewhere. > > >>> > > >>> Signed-off-by: Ilpo Järvinen <i...@kernel.org> > > >>> Signed-off-by: Chia-Yu Chang <chia-yu.ch...@nokia-bell-labs.com> > > >>> --- > > >>> net/ipv4/tcp_offload.c | 4 ++-- > > >>> 1 file changed, 2 insertions(+), 2 deletions(-) > > >>> > > >>> diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c index > > >>> 0b05f30e9e5f..f59762d88c38 100644 > > >>> --- a/net/ipv4/tcp_offload.c > > >>> +++ b/net/ipv4/tcp_offload.c > > >>> @@ -329,7 +329,7 @@ struct sk_buff *tcp_gro_receive(struct list_head > > >>> *head, struct sk_buff *skb, > > >>> th2 = tcp_hdr(p); > > >>> flush = (__force int)(flags & TCP_FLAG_CWR); > > >>> flush |= (__force int)((flags ^ tcp_flag_word(th2)) & > > >>> - ~(TCP_FLAG_CWR | TCP_FLAG_FIN | TCP_FLAG_PSH)); > > >>> + ~(TCP_FLAG_FIN | TCP_FLAG_PSH)); > > >>> flush |= (__force int)(th->ack_seq ^ th2->ack_seq); > > >>> for (i = sizeof(*th); i < thlen; i += 4) > > >>> flush |= *(u32 *)((u8 *)th + i) ^ @@ -405,7 +405,7 > > >>> @@ void tcp_gro_complete(struct sk_buff *skb) > > >>> shinfo->gso_segs = NAPI_GRO_CB(skb)->count; > > >>> > > >>> if (th->cwr) > > >>> - shinfo->gso_type |= SKB_GSO_TCP_ECN; > > >>> + shinfo->gso_type |= SKB_GSO_TCP_ACCECN; > > >>> } > > >>> EXPORT_SYMBOL(tcp_gro_complete); > > >>> > > >> > > >> I do not really understand this patch. How a GRO engine can know which > > >> ECN variant the peers are using ? > > > > > >Hi Eric, > > > > > >Thanks for taking a look. > > > > > >GRO doesn't know. Setting SKB_GSO_TCP_ECN in case of not knowing can > > >result in header change that corrupts ACE field. Thus, GRO has to assume > > >the RFC3168 CWR offloading trick cannot be used anymore (unless it tracks > > >the connection and knows the bits are used for RFC3168 which is something > > >nobody is going to do). > > > > > >The main point of SKB_GSO_TCP_ACCECN is to prevent SKB_GSO_TCP_ECN or > > >NETIF_F_TSO_ECN offloading to be used for the skb as it would corrupt ACE > > >field value. > > > > Hi Eric and Ilpo, > > > > From my understanding of another email thread (patch 08/13), it seems the > > conclusions that SKB_GSO_TCP_ACCECN will still be needed. > > > > > > > >SKB_GSO_TCP_ACCECN doesn't allow CWR bits change within a super-skb but > > >the same CWR flag should be repeated for all segments. In a sense, it's > > >simpler than RFC3168 offloading. > > > > > >> SKB_GSO_TCP_ECN is also used from other points, what is your plan ? > > >> > > >> git grep -n SKB_GSO_TCP_ECN > > >> drivers/net/ethernet/hisilicon/hns3/hns3_enet.c:3888: > > >> skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN; > > >> drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:1291: > > >> skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN; > > >> drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:1312: > > >> skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN; > > > > > >These looked like they should be just changed to set SKB_GSO_TCP_ACCECN > > >instead. > > > > I agree with these changes and will apply them in the next version. > > > > > > > >I don't anymore recall why I didn't change those when I made this patch > > >long time ago, perhaps it was just an oversight or things have changed > > >somehow since then. > > > > > >> include/linux/netdevice.h:5061: BUILD_BUG_ON(SKB_GSO_TCP_ECN != > > >> (NETIF_F_TSO_ECN >> NETIF_F_GSO_SHIFT)); > > >> include/linux/skbuff.h:664: SKB_GSO_TCP_ECN = 1 << 2, > > > > > >Not relevant. > > > > > >> include/linux/virtio_net.h:88: gso_type |= > > >> SKB_GSO_TCP_ECN; > > >> include/linux/virtio_net.h:161: switch (gso_type & > > >> ~SKB_GSO_TCP_ECN) { > > >> include/linux/virtio_net.h:226: if (sinfo->gso_type & > > >> SKB_GSO_TCP_ECN) > > > > > >These need to be looked further what's going on as UAPI is also > > > involved here. > > > > I had a look at these parts, and only the 1st and 3rd ones are relevant. > > Related to the 1st one, I propose to change > > from > > > > if (hdr->gso_type & VIRTIO_NET_HDR_GSO_ECN) > > gso_type |= SKB_GSO_TCP_ECN; > > > > to > > > > if (hdr->gso_type & VIRTIO_NET_HDR_GSO_ECN) > > gso_type |= SKB_GSO_TCP_ACCECN; > > > > The reason behind this proposed change is similar as the above changes > > in en_rx.c. > > But en_rx.c is based one CWR flag, there's no similar check here. > > > For the 3rd one, I suggest to change from > > > > if (sinfo->gso_type & SKB_GSO_TCP_ECN) > > hdr->gso_type |= VIRTIO_NET_HDR_GSO_ECN; > > > > to > > > > if (sinfo->gso_type & (SKB_GSO_TCP_ECN | > > SKB_GSO_TCP_ACCECN)) > > hdr->gso_type |= VIRTIO_NET_HDR_GSO_ECN; > > > > This proposed change is because VIRTIO_NET_HDR_GSO_ECN must be set to > > allow TCP packets requiring segmentation offload which have ECN bit set. > > So, no matter whether skb gso_type have GSO_TCP_ECN flag or > > GSO_TCP_ACCECN flag, the corresponding hdr gso_type shall be set. > > > > But, I wonder what would the driver do when with VIRTIO_NET_HDR_GSO_ECN > > flag. Will it corrupts CWR or not? > > I'm starting to heavily question what is the meaning of that > VIRTIO_NET_HDR_GSO_ECN flag and if it's even consistent... > > https://github.com/qemu/qemu/blob/master/net/eth.c > > That sets VIRTIO_NET_HDR_GSO_ECN based on CE?? (Whereas kernel associates > the related SKB_GSO_TCP_ECN to CWR offloading.) > > The virtio spec is way too vague to be useful so it would not be > surprising if there are diverging interpretations from implementers: > > "If the driver negotiated the VIRTIO_NET_F_HOST_ECN feature, the > VIRTIO_NET_HDR_GSO_ECN bit in gso_type indicates that the TCP packet has > the ECN bit set." > > What is "the ECN bit" in terms used by TCP (or IP)? Could some virtio > folks please explain?
According to the current Linux implementation in virtio_net_hdr_to_skb(): if (hdr->gso_type & VIRTIO_NET_HDR_GSO_ECN) gso_type |= SKB_GSO_TCP_ECN; It mapps to SKB_GSO_TCP_ECN which is: /* This indicates the tcp segment has CWR set. */ SKB_GSO_TCP_ECN = 1 << 2, Thanks > > > -- > i.