On 06/29/2018 11:49 AM, Willem de Bruijn wrote:
>>>> diff --git a/net/sched/sch_etf.c b/net/sched/sch_etf.c
>>>> +static void report_sock_error(struct sk_buff *skb, u32 err, u8 code)
>>>> +{
>>>> + struct sock_exterr_skb *serr;
>>>> + ktime_t txtime = skb->tstamp;
>>>> +
>>>> + if (!skb->sk || !(skb->sk->sk_txtime_flags &
>>>> SK_TXTIME_RECV_ERR_MASK))
>>>> + return;
>>>> +
>>>> + skb = skb_clone_sk(skb);
>>>> + if (!skb)
>>>> + return;
>>>> +
>>>> + sock_hold(skb->sk);
>>>
>>> Why take an extra reference? The skb holds a ref on the sk.
>>
>>
>> Yes, the cloned skb holds a ref on the socket, but the documentation of
>> skb_clone_sk() makes this explicit suggestion:
>>
>> (...)
>> * When passing buffers allocated with this function to sock_queue_err_skb
>> * it is necessary to wrap the call with sock_hold/sock_put in order to
>> * prevent the socket from being released prior to being enqueued on
>> * the sk_error_queue.
>> */
>>
>> which I believe is here just so we are protected against a possible race
>> after
>> skb_orphan() is called from sock_queue_err_skb(). Please let me know if I'm
>> misreading anything.
>
> Yes, indeed. Code only has to worry about that if there are no
> concurrent references
> on the socket.
>
> I may be mistaken, but I believe that this complicated logic exists
> only for cases where
> the timestamp may be queued after the original skb has been released.
> Specifically,
> when a tx timestamp is returned from a hardware device after transmission of
> the
> original skb. Then the cloned timestamp skb needs its own reference on
> the sk while
> it is waiting for the timestamp data (i.e., until the device
> completion arrives) and then
> we need a temporary extra ref to work around the skb_orphan in
> sock_queue_err_skb.
>
> Compare skb_complete_tx_timestamp with skb_tstamp_tx. The second is used in
> the regular datapath to clone an skb and queue it on the error queue
> immediately,
> while holding the original skb. This does not call skb_clone_sk and
> does not need the
> extra sock_hold. This should be good enough for this code path, too.
> As kb holds a
> ref on skb->sk, the socket cannot go away in the middle of report_sock_error.
Oh, that makes sense. Great, I will give this a try and add it to the v2.
Thanks,
Jesus