Hi folks, picking this up again.. As discussed during LSFMM, I've been looking at adding something like an `skb_sk_assign()` helper to BPF so that logic similar to TPROXY can be implemented with integration into other BPF logic, however currently any attempts to do so are blocked by the skb_orphan() call in ip_rcv_core() (which will effectively ignore any socket assign decision made by the TC BPF program).
Recently I was attempting to remove the skb_orphan() call, and I've been trying different things but there seems to be some context I'm missing. Here's the core of the patch: diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c index ed97724c5e33..16aea980318a 100644 --- a/net/ipv4/ip_input.c +++ b/net/ipv4/ip_input.c @@ -500,8 +500,6 @@ static struct sk_buff *ip_rcv_core(struct sk_buff *skb, struct net *net) memset(IPCB(skb), 0, sizeof(struct inet_skb_parm)); IPCB(skb)->iif = skb->skb_iif; - /* Must drop socket now because of tproxy. */ - skb_orphan(skb); return skb; The statement that the socket must be dropped because of tproxy doesn't make sense to me, because the PRE_ROUTING hook is hit after this, which will call into the tproxy logic and eventually nf_tproxy_assign_sock() which already does the skb_orphan() itself. However, if I drop these lines then I end up causing sockets to release references too many times. Seems like if we don't orphan the skb here, then later logic assumes that we have one more reference than we actually have, and decrements the count when it shouldn't (perhaps the skb_steal_sock() call in __inet_lookup_skb() which seems to assume we always have a reference to the socket?) Splat: refcount_t hit zero at sk_stop_timer+0x2c/0x30 in cilium-agent[16359], uid/euid: 0/0 WARNING: CPU: 0 PID: 16359 at kernel/panic.c:686 refcount_error_report+0x9c/0xa1 ... ? inet_put_port+0xa6/0xd0 inet_csk_clear_xmit_timers+0x2e/0x50 tcp_done+0x8b/0xf0 tcp_reset+0x49/0xc0 tcp_validate_incoming+0x2f7/0x410 tcp_rcv_state_process+0x250/0xdb6 ? tcp_v4_connect+0x46f/0x4e0 tcp_v4_do_rcv+0xbd/0x1f0 __release_sock+0x84/0xd0 release_sock+0x30/0xa0 inet_stream_connect+0x47/0x60 (Full version: https://gist.github.com/joestringer/d5313e4bf4231e2c46405bd7a3053936 ) This seems potentially related to some of the socket referencing discussion in the peer thread "[RFC bpf-next 0/7] Programming socket lookup with BPF". During LSFMM, it seemed like no-one knew quite why the skb_orphan() is necessary in that path in the current version of the code, and that we may be able to remove it. Florian, I know you weren't in the room for that discussion, so raising it again now with a stack trace, Do you have some sense what's going on here and whether there's a path towards removing it from this path or allowing the skb->sk to be retained during ip_rcv() in some conditions?