This is an AI-generated review of your patch. The human sending this email has considered the AI review valid, or at least plausible.
For local reproduction steps see https://netdev-ai.bots.linux.dev/ai-local.html

---

mptcp: better mptcp-level RTT estimator

The current MPTCP-level RTT estimator has several issues.

On high speed links, the MPTCP-level receive buffer auto-tuning happens with a frequency well above the TCP-level one. That in turn can cause excessive/unneeded receive buffer increases.

On such links, the initial rtt_us value is considerably higher than the actual delay, and the current mptcp_rcv_space_adjust() updates msk->rcvq_space.rtt_us with a period equal to that field's previous value. If the initial rtt_us is 40ms, its first update will happen after 40ms, even if the subflows see actual RTTs orders of magnitude lower.

Additionally:

- setting the msk RTT to the maximum among all the subflows' RTTs makes DRS constantly overshoot the rcvbuf size when a subflow has considerably higher latency than the other(s).

- during unidirectional bulk transfers with multiple active subflows, the TCP-level RTT estimator occasionally sees considerably higher values than the real link delay, i.e. when the packet scheduler reacts to an incoming ACK on a given subflow by pushing data on a different subflow.

- currently inactive but still open subflows (i.e. switched to backup mode) are always considered when computing the msk-level RTT.

Address all the issues above with a more accurate RTT estimation strategy: the MPTCP-level RTT is set to the minimum of all the subflows actually feeding data into the MPTCP receive buffer, using a small sliding window.

> While at it, also use EWMA to compute the msk-level scaling_ratio, to
> that MPTCP can avoid traversing the subflow list is
> mptcp_rcv_space_adjust().

This isn't a bug, but there are two typos here that make the sentence hard to parse. Should this be 'so that MPTCP can avoid traversing the subflow list in mptcp_rcv_space_adjust()'?

Use some care to avoid updating msk and ssk level fields too often.
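For context, the "minimum over a small sliding window" estimator the commit message describes can be sketched in plain userspace C. This is an illustration, not the kernel code: the names, the window size of 5, and the U32_MAX "no sample yet" sentinel mirror the patch, but the locking and READ_ONCE()/WRITE_ONCE() annotations the real implementation needs are omitted here.

```c
#include <stdint.h>

#define RTT_SAMPLES 5		/* mirrors MPTCP_RTT_SAMPLES in the patch */

struct rtt_est {
	uint32_t samples[RTT_SAMPLES];
	unsigned int next_sample;
};

/* No samples collected yet: the estimate is "unknown" (maximum). */
static void rtt_est_init(struct rtt_est *est)
{
	for (int i = 0; i < RTT_SAMPLES; i++)
		est->samples[i] = UINT32_MAX;
	est->next_sample = 0;
}

/* Record one subflow RTT sample into the circular window, overwriting
 * the oldest entry.
 */
static void rtt_est_update(struct rtt_est *est, uint32_t rtt_us)
{
	est->samples[est->next_sample] = rtt_us;
	est->next_sample = (est->next_sample + 1) % RTT_SAMPLES;
}

/* The estimate is the minimum over the window, so one subflow with
 * inflated RTT samples cannot drag the msk-level estimate up.
 */
static uint32_t rtt_est_get(const struct rtt_est *est)
{
	uint32_t rtt_us = est->samples[0];

	for (int i = 1; i < RTT_SAMPLES; i++)
		if (est->samples[i] < rtt_us)
			rtt_us = est->samples[i];
	return rtt_us;
}
```

This also shows why the commit message's 40ms example resolves itself quickly: a single low sample from any active subflow immediately pulls the minimum down.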
Fixes: a6b118febbab ("mptcp: add receive buffer auto-tuning")

[ ... ]

> diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
> index 0bd1ee860316..6ec65c0ae655 100644
> --- a/net/mptcp/protocol.h
> +++ b/net/mptcp/protocol.h

[ ... ]

> @@ -422,6 +435,27 @@ static inline struct mptcp_data_frag *mptcp_send_head(const struct sock *sk)
>  	return msk->first_pending;
>  }
>
> +static inline void mptcp_init_rtt_est(struct mptcp_sock *msk)
> +{
> +	int i;
> +
> +	for (i = 0; i < MPTCP_RTT_SAMPLES; ++i)
> +		msk->rcv_rtt_est.samples[i] = U32_MAX;
> +	msk->rcv_rtt_est.next_sample = 0;
> +	msk->scaling_ratio = TCP_DEFAULT_SCALING_RATIO;
> +}
> +
> +static inline u32 mptcp_rtt_us_est(const struct mptcp_sock *msk)
> +{
> +	u32 rtt_us = msk->rcv_rtt_est.samples[0];
         ^^^^

Is samples[0] missing a READ_ONCE()? The function comment says 'Lockless access of collected samples' and the writer mptcp_rcv_rtt_update() uses WRITE_ONCE() for all sample indices (including index 0), with a comment 'Pairs with READ_ONCE() in mptcp_rtt_us_est()':

net/mptcp/protocol.c:mptcp_rcv_rtt_update()
{
	...
	/* Pairs with READ_ONCE() in mptcp_rtt_us_est(). */
	id = msk->rcv_rtt_est.next_sample;
	WRITE_ONCE(msk->rcv_rtt_est.samples[id], rtt_us);
	...
}

However, samples[0] is read with a plain load while samples[1..4] correctly use READ_ONCE():

> +	int i;
> +
> +	/* Lockless access of collected samples. */
> +	for (i = 1; i < MPTCP_RTT_SAMPLES; ++i)
> +		rtt_us = min(rtt_us, READ_ONCE(msk->rcv_rtt_est.samples[i]));
> +	return rtt_us;
> +}

The writer runs under mptcp_data_lock (spin_lock_bh) while the reader runs under the socket lock (msk_owned_by_me) -- these are different locks, so WRITE_ONCE()/READ_ONCE() pairing is required. KCSAN would flag this as a data race.

> +
> static inline struct mptcp_data_frag *mptcp_send_next(struct sock *sk)
> {
> 	struct mptcp_sock *msk = mptcp_sk(sk);

[ ... ]
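One way to picture the suggested fix is to give samples[0] the same marked access as the rest, e.g. by starting the min loop at i = 0. Below is a userspace model of the writer/reader pairing, with C11 relaxed atomics standing in for the kernel's READ_ONCE()/WRITE_ONCE() (illustrative only: the kernel macros are volatile accesses, not C11 atomics, but relaxed atomic loads and stores give the same "no torn or invented accesses" guarantee the pairing comment asks for):

```c
#include <stdatomic.h>
#include <stdint.h>

#define MPTCP_RTT_SAMPLES 5

struct rcv_rtt_est {
	_Atomic uint32_t samples[MPTCP_RTT_SAMPLES];
	unsigned int next_sample;
};

/* Mark all slots "empty"; in the kernel this runs before the socket
 * is shared, so plain stores would also do there.
 */
static void rtt_est_reset(struct rcv_rtt_est *est)
{
	for (int i = 0; i < MPTCP_RTT_SAMPLES; i++)
		atomic_store_explicit(&est->samples[i], UINT32_MAX,
				      memory_order_relaxed);
	est->next_sample = 0;
}

/* Writer side (kernel: under mptcp_data_lock): publish one sample.
 * The relaxed store stands in for WRITE_ONCE() and pairs with the
 * relaxed load in the reader below.
 */
static void rtt_sample_write(struct rcv_rtt_est *est, uint32_t rtt_us)
{
	unsigned int id = est->next_sample;

	atomic_store_explicit(&est->samples[id], rtt_us,
			      memory_order_relaxed);
	est->next_sample = (id + 1) % MPTCP_RTT_SAMPLES;
}

/* Reader side (kernel: under the msk socket lock): min over the
 * window. Note the loop starts at i = 0, so samples[0] gets the same
 * marked access as the other slots -- the point the review raises.
 */
static uint32_t rtt_us_est(struct rcv_rtt_est *est)
{
	uint32_t rtt_us = UINT32_MAX;

	for (int i = 0; i < MPTCP_RTT_SAMPLES; i++) {
		uint32_t s = atomic_load_explicit(&est->samples[i],
						  memory_order_relaxed);
		if (s < rtt_us)
			rtt_us = s;
	}
	return rtt_us;
}
```

Starting at i = 0 also removes the asymmetry that made the bug easy to miss: every slot is read exactly the same way, so there is no special case for the initial value.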
> @@ -523,6 +557,8 @@ struct mptcp_subflow_context {
>  	u32	map_data_len;
>  	__wsum	map_data_csum;
>  	u32	map_csum_len;
> +	u32	prev_rtt_us;
        ^^^^

Is this field used anywhere? The field 'prev_rtt_us' is added to struct mptcp_subflow_context but doesn't appear to be referenced anywhere in the codebase. Only 'prev_rtt_seq' is used in mptcp_rcv_rtt_update().

> +	u32	prev_rtt_seq;
>  	u32	request_mptcp : 1,	/* send MP_CAPABLE */
>  		request_join : 1,	/* send MP_JOIN */
>  		request_bkup : 1,

--
pw-bot: cr
