On 6/21/19 5:51 AM, Florian Westphal wrote:
> Jakub Sitnicki <ja...@cloudflare.com> wrote:
>>> So, at least for this part I don't see a technical reason why this
>>> has to grab a reference for listener socket.
>>
>> That's helpful, thanks! We rely on TPROXY, so I would like to help with
>> that. Let me see if I can get time to work on it.
>
> AFAICS so far this would be enough:
>
> 1. remove the BUG_ON() in skb_orphan, letting it clear skb->sk instead
> 2. in nf_queue_entry_get_refs(), if skb->sk and no destructor:
> call nf_tproxy_assign_sock() so a reference gets taken.
> 3. change skb_steal_sock:
>    static inline struct sock *skb_steal_sock(struct sk_buff *skb,
>                                              bool *refcounted)
>    [..]
>            *refcounted = skb->destructor != NULL;
> 4. make tproxy sk assign elide the destructor assignment in case of
> a listening sk.
>
Okay, but how do we make sure the skb->sk association does not leak from the
RCU section?

Note that we have the noref/refcounted magic for skb_dst(); we might try to use
something similar for skb->sk.

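A rough user-space sketch of what "something similar" could look like, assuming
the same low-bit pointer tagging that skb_dst() uses (SKB_DST_NOREF in
skb->_skb_refdst). Every name below (sock_model, skb_model, SK_NOREF,
skb_set_sk_noref, skb_sk_force, skb_steal_sk) is hypothetical and exists only
for illustration; this is a model of the idea, not a proposed patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: none of these names exist in the kernel. */
#define SK_NOREF   ((uintptr_t)1)   /* low bit set: pointer is borrowed under RCU */
#define SK_PTRMASK (~SK_NOREF)

struct sock_model { int refcnt; };
struct skb_model  { uintptr_t sk_ref; };  /* tagged pointer, like _skb_refdst */

/* Attach a socket and take a reference on it. */
static void skb_set_sk_ref(struct skb_model *skb, struct sock_model *sk)
{
	assert(((uintptr_t)sk & SK_NOREF) == 0);  /* alignment keeps bit 0 free */
	sk->refcnt++;
	skb->sk_ref = (uintptr_t)sk;
}

/* Attach a socket without a reference; only valid inside the RCU section. */
static void skb_set_sk_noref(struct skb_model *skb, struct sock_model *sk)
{
	assert(((uintptr_t)sk & SK_NOREF) == 0);
	skb->sk_ref = (uintptr_t)sk | SK_NOREF;
}

/* Return the socket pointer with the tag bit masked off. */
static struct sock_model *skb_sk(const struct skb_model *skb)
{
	return (struct sock_model *)(skb->sk_ref & SK_PTRMASK);
}

/* Upgrade a borrowed pointer to a real reference before the skb can
 * outlive the RCU section (for example, before it is queued). */
static void skb_sk_force(struct skb_model *skb)
{
	if (skb->sk_ref & SK_NOREF) {
		struct sock_model *sk = skb_sk(skb);

		sk->refcnt++;
		skb->sk_ref = (uintptr_t)sk;
	}
}

/* Detach the socket and report whether the caller now owns a reference. */
static struct sock_model *skb_steal_sk(struct skb_model *skb, bool *refcounted)
{
	struct sock_model *sk = skb_sk(skb);

	*refcounted = sk != NULL && !(skb->sk_ref & SK_NOREF);
	skb->sk_ref = 0;
	return sk;
}
```

In this model the "leak" question becomes: every path that lets the skb escape
the RCU read side has to call the skb_sk_force() equivalent first, much as
skb_dst_force() is used for dst entries.
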
> This should work because TPROXY target is restricted to PRE_ROUTING, and
> __netif_receive_skb_core runs with rcu readlock already held.
>
> On a side note, it would also be interesting to see what breaks if the
> nf_tproxy_sk_is_transparent() check in the tproxy eval function is
> removed -- if we need the transparent:1 marker only for output, I think
> it would be ok to raise the bit transparently in the kernel in case
> we assign skb->sk = found_sk; i.e.
>         if (unlikely(!sk_is_transparent(sk)))
>                 make_sk_transparent(sk);
>
> I don't see a reason why we need the explicit setsockopt(IP_TRANSPARENT)
> from userspace.
>
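
For reference, the explicit userspace step being questioned above looks roughly
like this today. This is a minimal sketch: the helper name
open_transparent_socket() is made up for illustration, and the call needs
sufficient privilege (CAP_NET_ADMIN), so it is expected to fail with EPERM for
an unprivileged process:

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef IP_TRANSPARENT
#define IP_TRANSPARENT 19	/* value from linux/in.h */
#endif

/* Hypothetical helper: create a TCP socket and mark it transparent.
 * Returns the fd on success, or -1 with errno set on failure. */
static int open_transparent_socket(void)
{
	int one = 1;
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;

	/* This is the setsockopt() a TPROXY user has to issue today. */
	if (setsockopt(fd, IPPROTO_IP, IP_TRANSPARENT, &one, sizeof(one)) < 0) {
		int saved = errno;

		close(fd);
		errno = saved;	/* preserve the setsockopt() error for the caller */
		return -1;
	}
	return fd;
}
```

If the kernel raised the bit itself when assigning skb->sk, a proxy would not
need this call on the input path.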