Re: [PATCH/RFC 01/10] Implement local diversion of IPv4 skbs

Patrick McHardy Wed, 10 Jan 2007 04:21:12 -0800

KOVACS Krisztian wrote:
> On Wednesday 10 January 2007 07:46, Patrick McHardy wrote:
> 
>>>+                    if (sk) {
>>>+                            sock_hold(sk);
>>>+                            skb->sk = sk;
>>
>>This looks racy, the socket could be closed between the lookup and
>>the actual use. Why do you need the socket lookup at all, can't
>>you just divert all packets selected by iptables?
> 
> 
>   Yes, it's racy, but I think this is true for the "regular" socket lookup, too. 
> Take UDP for example: __udp4_lib_rcv() does the socket lookup, gets a 
> reference to the socket, and then calls udp_queue_rcv_skb() to queue the 
> skb. As far as I can see there's nothing there which prevents the socket 
> from being closed between these calls. sk_common_release() even documents 
> this behaviour:
> 
>       [...]
>       if (sk->sk_prot->destroy)
>               sk->sk_prot->destroy(sk);
> 
>       /*
>        * Observation: when sock_common_release is called, processes have
>        * no access to socket. But net still has.
>        * Step one, detach it from networking:
>        *
>        * A. Remove from hash tables.
>        */
> 
>       sk->sk_prot->unhash(sk);
> 
>       /*
>        * In this point socket cannot receive new packets, but it is possible
>        * that some packets are in flight because some CPU runs receiver and
>        * did hash table lookup before we unhashed socket. They will achieve
>        * receive queue and will be purged by socket destructor.
>        *
>        * Also we still have packets pending on receive queue and probably,
>        * our own packets waiting in device queues. sock_destroy will drain
>        * receive queue, but transmitted packets will delay socket destruction
>        * until the last reference will be released.
>        */
>       [...]
>
>   Of course it's true that doing early lookups and storing that reference 
> in the skb widens the window considerably, but I think this race is 
> already handled. Or is there anything I don't see?


You're right, it seems to be handled properly (except that I think
there is a race between sk_common_release calling xfrm_sk_free_policy
and, e.g., udp calling __xfrm_policy_check; I will look into that).

The socket probably shouldn't be cached in the skb anyway; with
nf_queue, for example, the window could be _really_ large.
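
To make the lifetime argument above concrete, here is a minimal
user-space sketch (not kernel code) of why a receiver that did its
lookup before the unhash is still safe: it holds its own reference, so
the socket is only destroyed, and its receive queue purged, when that
last reference is dropped. Everything named toy_* is hypothetical and
only loosely mirrors sock_hold()/sock_put() and the unhash step; the
real code uses atomic reference counts and runs concurrently, while
this model is single-threaded and plays the interleaving out
sequentially.

```c
/* User-space toy model, NOT kernel code. All toy_* names are
 * hypothetical and exist only for this illustration. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_sock {
    int  refcnt;     /* stands in for the socket reference count */
    bool hashed;     /* true while lookups can still find the socket */
    bool destroyed;  /* set once the last reference is dropped */
    int  rx_queued;  /* packets sitting on the receive queue */
};

static void toy_sock_init(struct toy_sock *sk)
{
    sk->refcnt = 1;          /* reference owned by the process side */
    sk->hashed = true;
    sk->destroyed = false;
    sk->rx_queued = 0;
}

static void toy_sock_hold(struct toy_sock *sk)
{
    assert(!sk->destroyed);
    sk->refcnt++;
}

static void toy_sock_put(struct toy_sock *sk)
{
    assert(sk->refcnt > 0);
    if (--sk->refcnt == 0) {
        sk->rx_queued = 0;   /* destructor purges the receive queue */
        sk->destroyed = true;
    }
}

/* Receive-path lookup: returns a held reference, or NULL once the
 * socket has been unhashed. */
static struct toy_sock *toy_lookup(struct toy_sock *sk)
{
    if (!sk->hashed)
        return NULL;
    toy_sock_hold(sk);
    return sk;
}

/* Close path: unhash first, then drop the process reference. */
static void toy_release(struct toy_sock *sk)
{
    sk->hashed = false;
    toy_sock_put(sk);
}

/* A receiver that already did its lookup queues the packet and then
 * drops its reference. */
static void toy_deliver(struct toy_sock *sk)
{
    sk->rx_queued++;
    toy_sock_put(sk);
}
```

Calling toy_lookup(), then toy_release(), then toy_deliver() on the
same socket reproduces the in-flight case: the socket survives the
release, later lookups fail, and the late packet is purged when the
receiver drops its reference. The longer a reference stays parked in
an skb, the longer the middle state lasts.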
