On Fri, Jun 14, 2019 at 2:35 AM maowenan <maowe...@huawei.com> wrote: > > > > On 2019/6/14 12:28, Eric Dumazet wrote: > > > > > > On 6/13/19 9:19 PM, maowenan wrote: > >> > >> > >> @Eric, for this issue I only want to check TCP_NEW_SYN_RECV sk, is it OK > >> like below? > >> + if (!osk && sk->sk_state == TCP_NEW_SYN_RECV) > >> + reqsk = __inet_lookup_established(sock_net(sk), > >> &tcp_hashinfo, > >> + sk->sk_daddr, > >> sk->sk_dport, > >> + sk->sk_rcv_saddr, > >> sk->sk_num, > >> + > >> sk->sk_bound_dev_if, sk->sk_bound_dev_if); > >> + if (unlikely(reqsk)) { > >> > > > > Not enough. > > > > If we have many cpus here, there is a chance another cpu has inserted a > > request socket, then > > replaced it by an ESTABLISH socket for the same 4-tuple. > > I try to get more clear about the scene you mentioned. And I have do some > testing about this, it can work well > when I use multiple cpus. > > The ESTABLISH socket would be from > tcp_check_req->tcp_v4_syn_recv_sock->tcp_create_openreq_child, > and for this path, inet_ehash_nolisten pass osk(NOT NULL), my patch won't > call __inet_lookup_established in inet_ehash_insert(). > > When TCP_NEW_SYN_RECV socket try to inset to hash table, it will pass osk > with NULL, my patch will check whether reqsk existed > in hash table or not. If reqsk is existed, it just removes this reqsk and > dose not insert to hash table. Then the synack for this > reqsk can't be sent to client, and there is no chance to receive the ack from > client, so ESTABLISH socket can't be replaced in hash table. > > So I don't see the race when there are many cpus. Can you show me some clue?
This is a bit silly. You focus on some crash you got on a given system, but do not see the real bug. CPU A SYN packet lookup finds nothing. Create a NEW_SYN_RECV <long delay, like hardware interrupts calling some buggy driver or something> CPU B SYN packet -> inserts a NEW_SYN_RECV sends a SYNACK ACK packet -> replaces the NEW_SYN_RECV by ESTABLISH socket CPU A resumes. Basically a lookup (after taking the bucket spinlock) could either find : - Nothing (typical case where there was no race) - A NEW_SYN_RECV - A ESTABLISHED socket - A TIME_WAIT socket. You can not simply fix the "NEW_SYN_RECV" state case, and possibly add hard crashes (instead of current situation leading to RST packets)