> On Dec 15, 2020, at 19:06, Eric Dumazet <eduma...@google.com> wrote: > > On Tue, Dec 15, 2020 at 3:08 AM Jakub Kicinski <k...@kernel.org> wrote: >> >> On Sun, 13 Dec 2020 21:59:45 +0800 Cambda Zhu wrote: >>>> On Dec 13, 2020, at 06:32, Jakub Kicinski <k...@kernel.org> wrote: >>>> On Tue, 8 Dec 2020 17:19:10 +0800 Cambda Zhu wrote: >>>>> For each TCP zero window probe, the icsk_backoff is increased by one and >>>>> its max value is tcp_retries2. If tcp_retries2 is greater than 63, the >>>>> probe0 timeout shift may exceed its max bits. On x86_64/ARMv8/MIPS, the >>>>> shift count would be masked to range 0 to 63. And on ARMv7 the result is >>>>> zero. If the shift count is masked, only several probes will be sent >>>>> with timeout shorter than TCP_RTO_MAX. But if the timeout is zero, it >>>>> needs tcp_retries2 times probes to end this false timeout. Besides, >>>>> bitwise shift greater than or equal to the width is an undefined >>>>> behavior. >>>> >>>> If icsk_backoff can reach 64, can it not also reach 256 and wrap? >>> >>> If tcp_retries2 is set greater than 255, it can be wrapped. But for TCP >>> probe0, >>> it seems to be not a serious problem. The timeout will be icsk_rto and >>> backoff >>> again. And considering icsk_backoff is 8 bits, not only it may always be >>> lesser >>> than tcp_retries2, but also may always be lesser than tcp_orphan_retries. >>> And >>> the icsk_probes_out is 8 bits too. So if the max_probes is greater than 255, >>> the connection won’t abort even if it’s an orphan sock in some cases. >>> >>> We can change the type of icsk_backoff/icsk_probes_out to fix these >>> problems. >>> But I think maybe the retries greater than 255 have no sense indeed and the >>> RFC >>> only requires the timeout(R2) greater than 100s at least. Could it be >>> better to >>> limit the min/max ranges of their sysctls? >> >> All right, I think the patch is good as is, applied for 5.11, thank you! > > It looks like we can remove the (u64) casts then. > > Also if we _really_ care about icsk_backoff approaching 63, we also > need to change inet_csk_rto_backoff() ?
Yes, we need. But the socket can close after tcp_orphan_retries times probes even if alive is always true. And there’re something I’m not very clear yet: 1) inet_csk_rto_backoff() may be not only for TCP, and the RFC 6298 requires the max value of RTO is 60 seconds at least. So what’s the proper shift limit? 2) If max_probes is greater than 255, the icsk_probes_out cannot be greater than max_probes and the connection may not close forever. This looks more serious. > > Was your patch based on a real world use, or some fuzzer UBSAN report ? > I found this issue not on TCP. I’m developing a private protocol, and in this protocol I made something like probe0 with max RTO lesser than 120 seconds. I found similar issues on testing and found Linux TCP have same issues. So it’s not a real world use for TCP and it may be ok to ignore the issues. > diff --git a/include/net/inet_connection_sock.h > b/include/net/inet_connection_sock.h > index > 7338b3865a2a3d278dc27c0167bba1b966bbda9f..a2a145e3b062c0230935c293fc1900df095937d4 > 100644 > --- a/include/net/inet_connection_sock.h > +++ b/include/net/inet_connection_sock.h > @@ -242,9 +242,10 @@ static inline unsigned long > inet_csk_rto_backoff(const struct inet_connection_sock *icsk, > unsigned long max_when) > { > - u64 when = (u64)icsk->icsk_rto << icsk->icsk_backoff; > + u8 backoff = min_t(u8, 32U, icsk->icsk_backoff); > + u64 when = (u64)icsk->icsk_rto << backoff; > > - return (unsigned long)min_t(u64, when, max_when); > + return (unsigned long)min_t(u64, when, max_when); > } > > struct sock *inet_csk_accept(struct sock *sk, int flags, int *err, bool kern); > diff --git a/include/net/tcp.h b/include/net/tcp.h > index > 78d13c88720fda50e3f1880ac741cea1985ef3e9..fc6e4d40fd94a717d24ebd8aef7f7930a4551fe9 > 100644 > --- a/include/net/tcp.h > +++ b/include/net/tcp.h > @@ -1328,9 +1328,8 @@ static inline unsigned long > tcp_probe0_when(const struct sock *sk, > { > u8 backoff = min_t(u8, ilog2(TCP_RTO_MAX / TCP_RTO_MIN) + 1, > inet_csk(sk)->icsk_backoff); > - u64 when = (u64)tcp_probe0_base(sk) << backoff; > > - return (unsigned long)min_t(u64, when, max_when); > + return min(tcp_probe0_base(sk) << backoff, max_when); > } > > static inline void tcp_check_probe_timer(struct sock *sk)