On Wed, Jun 3, 2020 at 10:08 PM Neal Cardwell <ncardw...@google.com> wrote:
>
> On Wed, Jun 3, 2020 at 9:55 AM Eric Dumazet <eduma...@google.com> wrote:
> >
> > On Wed, Jun 3, 2020 at 5:02 AM Neal Cardwell <ncardw...@google.com> wrote:
> > >
> > > On Wed, Jun 3, 2020 at 1:44 AM Eric Dumazet <eduma...@google.com> wrote:
> > > >
> > > > On Tue, Jun 2, 2020 at 10:05 PM Jason Xing <kerneljasonx...@gmail.com> wrote:
> > > > >
> > > > > Hi Eric,
> > > > >
> > > > > I'm still trying to understand what you said before. Would the
> > > > > following be better:
> > > > > 1) discard the tcp_internal_pacing() function.
> > > > > 2) remove the call to tcp_internal_pacing() in the
> > > > > __tcp_transmit_skb() function.
> > > > >
> > > > > If we do so, we could avoid "too late to give up pacing". Meanwhile,
> > > > > should we introduce the tcp_wstamp_ns socket field as commit
> > > > > 864e5c090749 does?
> > > > >
> > > >
> > > > Please do not top-post on the netdev mailing list.
> > > >
> > > > I basically suggested double-checking which points in TCP could end up
> > > > calling tcp_internal_pacing() while the timer was already armed.
> > > >
> > > > I guess this is MTU probing.
> > >
> > > Perhaps this could also happen from some of the retransmission code
> > > paths that don't use tcp_xmit_retransmit_queue()? Perhaps
> > > tcp_retransmit_timer() (RTO) and tcp_send_loss_probe() (TLP)? It seems
> > > they could indirectly cause a call to __tcp_transmit_skb(), and thus
> > > tcp_internal_pacing(), without first checking whether the pacing timer
> > > was already armed.
> >
> > I feared this (see recent commits about very low pacing rates) :/
> >
> > I am not sure we need to properly fix all these points for old
> > kernels, since the EDT model got rid of these problems.
>
> Agreed.
>
> > Maybe we can try to extend the timer.
>
> Sounds good.
>
> > Something like:
> >
> > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > index cc4ba42052c21b206850594db6751810d8fc72b4..626b9f4f500f7e5270d8d59e6eb16dbfa3efbc7c 100644
> > --- a/net/ipv4/tcp_output.c
> > +++ b/net/ipv4/tcp_output.c
> > @@ -966,6 +966,8 @@ enum hrtimer_restart tcp_pace_kick(struct hrtimer *timer)
> >
> >  static void tcp_internal_pacing(struct sock *sk, const struct sk_buff *skb)
> >  {
> > +       struct tcp_sock *tp = tcp_sk(sk);
> > +       ktime_t expire, now;
> >         u64 len_ns;
> >         u32 rate;
> >
> > @@ -977,12 +979,29 @@ static void tcp_internal_pacing(struct sock *sk, const struct sk_buff *skb)
> >
> >         len_ns = (u64)skb->len * NSEC_PER_SEC;
> >         do_div(len_ns, rate);
> > -       hrtimer_start(&tcp_sk(sk)->pacing_timer,
> > -                     ktime_add_ns(ktime_get(), len_ns),
> > +
> > +       now = ktime_get();
> > +       /* If hrtimer is already armed, then our caller has not
> > +        * used tcp_pacing_check().
> > +        */
> > +       if (unlikely(hrtimer_is_queued(&tp->pacing_timer))) {
> > +               expire = hrtimer_get_softexpires(&tp->pacing_timer);
> > +               if (ktime_after(expire, now))
> > +                       now = expire;
> > +               if (hrtimer_try_to_cancel(&tp->pacing_timer) == 1)
> > +                       __sock_put(sk);
> > +       }
> > +       hrtimer_start(&tp->pacing_timer, ktime_add_ns(now, len_ns),
> >                       HRTIMER_MODE_ABS_PINNED_SOFT);
> >         sock_hold(sk);
> >  }
> >
> > +static bool tcp_pacing_check(const struct sock *sk)
> > +{
> > +       return tcp_needs_internal_pacing(sk) &&
> > +              hrtimer_is_queued(&tcp_sk(sk)->pacing_timer);
> > +}
> > +
> >  static void tcp_update_skb_after_send(struct tcp_sock *tp, struct sk_buff *skb)
> >  {
> >         skb->skb_mstamp = tp->tcp_mstamp;
> > @@ -2117,6 +2136,9 @@ static int tcp_mtu_probe(struct sock *sk)
> >         if (!tcp_can_coalesce_send_queue_head(sk, probe_size))
> >                 return -1;
> >
> > +       if (tcp_pacing_check(sk))
> > +               return -1;
> > +
> >         /* We're allowed to probe.  Build it now. */
> >         nskb = sk_stream_alloc_skb(sk, probe_size, GFP_ATOMIC, false);
> >         if (!nskb)
> > @@ -2190,11 +2212,6 @@ static int tcp_mtu_probe(struct sock *sk)
> >                 return -1;
> >  }
> > -static bool tcp_pacing_check(const struct sock *sk)
> > -{
> > -       return tcp_needs_internal_pacing(sk) &&
> > -              hrtimer_is_queued(&tcp_sk(sk)->pacing_timer);
> > -}
> >
> >  /* TCP Small Queues :
> >  * Control number of packets in qdisc/devices to two packets / or ~1 ms.
>
> Thanks for your fix, Eric. This fix looks good to me! I agree that
> this fix is good enough for older kernels.
>
I just tested this patch and it worked well, so it also looks good to me :)
Nice work!

thanks,
Jason

> thanks,
> neal
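
For readers who want to see the effect of the rearm logic in isolation, below is a
minimal user-space sketch (plain C, not kernel code; all names are illustrative and
not part of the patch above). It mirrors what the patched tcp_internal_pacing() does
when a caller skipped tcp_pacing_check(): the new deadline is stacked on top of the
still-pending expiry instead of being recomputed from "now", so the earlier pacing
gap is not silently shortened.

/* Illustrative stand-in for the hrtimer state used by the patch:
 * "armed" plays the role of hrtimer_is_queued(), "expire_ns" the role
 * of hrtimer_get_softexpires().
 */
#include <stdint.h>
#include <stdio.h>

struct pacing_state {
        int armed;
        uint64_t expire_ns;
};

/* Mirrors the patched rearm logic; returns the new absolute expiry. */
static uint64_t pace_rearm(struct pacing_state *ps, uint64_t now_ns,
                           uint64_t len_ns)
{
        uint64_t base = now_ns;

        /* If a deadline is still pending and lies in the future,
         * extend from it rather than from "now". */
        if (ps->armed && ps->expire_ns > now_ns)
                base = ps->expire_ns;

        ps->armed = 1;
        ps->expire_ns = base + len_ns;
        return ps->expire_ns;
}

int main(void)
{
        /* A timer armed for t=1500 is still pending when a send at t=1000
         * needs a 700 ns gap: the new expiry becomes 2200, not 1700. */
        struct pacing_state ps = { .armed = 1, .expire_ns = 1500 };

        printf("new expiry: %llu\n",
               (unsigned long long)pace_rearm(&ps, 1000, 700));
        return 0;
}

The design point this illustrates is the one discussed in the thread: rather than
auditing every pre-EDT call site that might reach tcp_internal_pacing() with the
timer already armed (MTU probing, and possibly RTO/TLP retransmits), the pending
expiry is preserved and pushed further out, which keeps the aggregate pacing delay
intact on older kernels.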