> -----Original Message-----
> From: Paolo Abeni <[email protected]>
> Sent: Tuesday, June 17, 2025 10:03 AM
> To: Chia-Yu Chang (Nokia) <[email protected]>;
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; Koen De Schepper (Nokia)
> <[email protected]>; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]
> Cc: Olivier Tilmans (Nokia) <[email protected]>
> Subject: Re: [PATCH v8 net-next 04/15] tcp: AccECN core
>
>
> CAUTION: This is an external email. Please be very careful when clicking
> links or opening attachments. See the URL nok.it/ext for additional
> information.
>
>
>
> On 6/10/25 2:53 PM, [email protected] wrote:
> > From: Ilpo Järvinen <[email protected]>
> >
> > This change implements Accurate ECN without negotiation and AccECN
> > Option (that will be added by later changes). Based on AccECN
> > specifications:
> > https://tools.ietf.org/id/draft-ietf-tcpm-accurate-ecn-28.txt
> >
> > Accurate ECN allows feeding back the number of CE (congestion
> > experienced) marks accurately to the sender in contrast to
> > RFC3168 ECN that can only signal one marks-seen-yes/no per RTT.
> > Congestion control algorithms can take advantage of the accurate ECN
> > information to fine-tune their congestion response to avoid drastic
> > rate reduction when only mild congestion is encountered.
> >
> > With Accurate ECN, tp->received_ce (r.cep in AccECN spec) keeps track
> > of how many segments have arrived with a CE mark. Accurate ECN uses
> > ACE field (ECE, CWR, AE) to communicate the value back to the sender
> > which updates tp->delivered_ce (s.cep) based on the feedback. This
> > signalling channel is lossy when ACE field overflow occurs.
> >
> > Conservative strategy is selected here to deal with the ACE overflow,
> > however, some strategies using the AccECN option later in the overall
> > patchset mitigate against false overflows detected.
> >
> > The ACE field values on the wire are offset by
> > TCP_ACCECN_CEP_INIT_OFFSET. Delivered_ce/received_ce count the real CE
> > marks rather than forcing all downstream users to adapt to the wire
> > offset.
> >
> > This patch uses the first 1-byte hole and the last 4-byte hole of the
> > tcp_sock_write_txrx for 'received_ce_pending' and 'received_ce'.
> > Also, the group size of tcp_sock_write_txrx is increased from
> > 91 + 4 to 95 + 4 due to the new u32 received_ce member. Below are the
> > trimmed pahole outcomes before and after this patch.
>
> AFAICS 'received_ce' fills the existing 4 bytes hole, so tcp_sock_write_txrx
> size should be now (95 + 0), am I missreading something?
Hi Paolo,
First, thanks for the feedback.
Regarding such "+ 4" is due to the byte alignemnt needed for ARM arch.
You can see below tha pahole outputs BEFORE and AFTER this patch, in which a
4-byte hole is added before "tcp_clock_cache" to make it aligns from the
multiple of 8.
And the comment in net/ipv4/tcp.c also explain it: "32bit arches with 8byte
alignment on u64 fields might need padding before tcp_clock_cache."
Does it make sense to you?
[BEFORE PATCH]
__u8 __cacheline_group_begin__tcp_sock_write_txrx[0]; /*
1869 0 */
u8 nonagle:4; /* 1869: 0 1 */
u8 rate_app_limited:1; /* 1869: 4 1 */
/* XXX 3 bits hole, try to pack */
/* XXX 2 bytes hole, try to pack */
__be32 pred_flags; /* 1872 4 */
/* XXX 4 bytes hole, try to pack */
u64 tcp_clock_cache; /* 1880 8 */
u64 tcp_mstamp; /* 1888 8 */
u32 rcv_nxt; /* 1896 4 */
u32 snd_nxt; /* 1900 4 */
u32 snd_una; /* 1904 4 */
u32 window_clamp; /* 1908 4 */
u32 srtt_us; /* 1912 4 */
u32 packets_out; /* 1916 4 */
/* --- cacheline 30 boundary (1920 bytes) --- */
u32 snd_up; /* 1920 4 */
u32 delivered; /* 1924 4 */
u32 delivered_ce; /* 1928 4 */
u32 app_limited; /* 1932 4 */
u32 rcv_wnd; /* 1936 4 */
struct tcp_options_received rx_opt; /* 1940 24 */
__u8 __cacheline_group_end__tcp_sock_write_txrx[0]; /*
1964 0 */
[AFTER PATCH]
__u8 __cacheline_group_begin__tcp_sock_write_txrx[0]; /*
1869 0 */
u8 nonagle:4; /* 1869: 0 1 */
u8 rate_app_limited:1; /* 1869: 4 1 */
/* XXX 3 bits hole, try to pack */
/* Force alignment to the next boundary: */
u8 :0;
u8 received_ce_pending:4; /* 1870: 0 1 */
u8 unused2:4; /* 1870: 4 1 */
/* XXX 1 byte hole, try to pack */
__be32 pred_flags; /* 1872 4 */
/* XXX 4 bytes hole, try to pack */
u64 tcp_clock_cache; /* 1880 8 */
u64 tcp_mstamp; /* 1888 8 */
u32 rcv_nxt; /* 1896 4 */
u32 snd_nxt; /* 1900 4 */
u32 snd_una; /* 1904 4 */
u32 window_clamp; /* 1908 4 */
u32 srtt_us; /* 1912 4 */
u32 packets_out; /* 1916 4 */
/* --- cacheline 30 boundary (1920 bytes) --- */
u32 snd_up; /* 1920 4 */
u32 delivered; /* 1924 4 */
u32 delivered_ce; /* 1928 4 */
u32 received_ce; /* 1932 4 */
u32 app_limited; /* 1936 4 */
u32 rcv_wnd; /* 1940 4 */
struct tcp_options_received rx_opt; /* 1944 24 */
__u8 __cacheline_group_end__tcp_sock_write_txrx[0]; /*
1968 0 */
>
> > @@ -384,17 +387,16 @@ static void tcp_data_ecn_check(struct sock *sk, const
> > struct sk_buff *skb)
> > if (tcp_ca_needs_ecn(sk))
> > tcp_ca_event(sk, CA_EVENT_ECN_IS_CE);
> >
> > - if (!(tp->ecn_flags & TCP_ECN_DEMAND_CWR)) {
> > + if (!(tp->ecn_flags & TCP_ECN_DEMAND_CWR) &&
> > + tcp_ecn_mode_rfc3168(tp)) {
> > /* Better not delay acks, sender can have a very low
> > cwnd */
> > tcp_enter_quickack_mode(sk, 2);
> > tp->ecn_flags |= TCP_ECN_DEMAND_CWR;
> > }
> > - tp->ecn_flags |= TCP_ECN_SEEN;
>
> It's not clear why you need to move this statement earlier in the code path
> even for ecn_mode_rfc3168(). Either a comment or
>
> if (!tcp_ecn_mode_rfc3168(tp))
> break;
>
> a few lines aboved could help.
>
> > break;
> > default:
> > if (tcp_ca_needs_ecn(sk))
> > tcp_ca_event(sk, CA_EVENT_ECN_NO_CE);
> > - tp->ecn_flags |= TCP_ECN_SEEN;
>
> Same here.
>
> Thanks,
>
> Paolo
OK, will apply the above feedback in the next version and thanks.
BRs,
Chia-Yu