Re: [ovs-dev] Possible Regression in 'netdev: Add new "struct netdev_rx" for capturing packets from a netdev.'

Murphy McCauley Wed, 12 Jun 2013 19:34:29 -0700

On Jun 12, 2013, at 5:34 PM, Simon Horman wrote:

> On Wed, Jun 12, 2013 at 08:08:50AM -0700, Ben Pfaff wrote:
>> On Wed, Jun 12, 2013 at 07:11:50AM -0700, Murphy McCauley wrote:
>>> 
>>> On Jun 12, 2013, at 6:28 AM, Ed Maste wrote:
>>> 
>>>> On 12 June 2013 07:04, Murphy McCauley <[email protected]> wrote:
>>>>> (Sorry this isn't an actual reply and is missing context -- I wasn't on 
>>>>> the list when it was originally posted.)
>>>>> 
>>>>> Simon and I have been in touch about this, and I thought I'd share my 
>>>>> findings for what they're worth.
>>>>> 
>>>>> The problem is from the commit Simon mentioned 
>>>>> (796223f5bc3a4896e6398733c798390158479400).  Specifically, it's in 
>>>>> netdev-linux.c in netdev_linux_send().
>>>>> 
>>>>> The new version always sends using the "sender" socket made by 
>>>>> af_packet_sock() unless the interface is a tap, in which case it sends it 
>>>>> using the tap fd.  This differs from the old version which sent using 
>>>>> whatever was in the fd field of the netdev if it was available.  For tap 
>>>>> interfaces, this was the tap fd, so the result was the same as it is now. 
>>>>>  But for other interfaces, this held the socket opened for receiving if 
>>>>> the interface was listening (which was maybe never "right" in some sense 
>>>>> and isn't convenient anymore since this socket descriptor is no longer 
>>>>> stored in the non-rx netdev).
>>>>> 
>>>>> The comments indicate that the exception is made for tap interfaces since 
>>>>> writing to a tap interface with an AF_PACKET socket results in receiving 
>>>>> the packet you just wrote.  However, I don't think this behavior is 
>>>>> limited to taps.  Since the old version of the code sent and received 
>>>>> with the same socket descriptor, I think the loop was fixed by the check 
>>>>> in dev_queue_xmit_nit() in net/core/dev.c.  Since they're two different 
>>>>> socket descriptors now, this no longer works and you get the loop.
>>>> 
>>>> Ahh, it turns out Ben explained this to me when I ran into a related
>>>> issue with the FreeBSD userspace implementation.  Ben's message in the
>>>> thread is at http://openvswitch.org/pipermail/dev/2012-July/018806.html
>>>> .
>>>> 
>>>>> I fixed it (I think) by adding a BPF packet filter on the rx socket so 
>>>>> that it only receives incoming packets.  There's probably a better fix, 
>>>>> but you're welcome to the patch if you want it.
>>>> 
>>>> I think it's worth taking a look.
>>> 
>>> 
>>> I think this is more or less against master.
>>> 
>>> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
>>> index d73115b..cc47a6b 100644
>>> --- a/lib/netdev-linux.c
>>> +++ b/lib/netdev-linux.c
>>> @@ -744,6 +744,14 @@ netdev_linux_rx_open(struct netdev *netdev_, struct 
>>> netdev_rx **rxp)
>>>     } else {
>>>         struct sockaddr_ll sll;
>>>         int ifindex;
>>> +        /* Result of tcpdump -dd inbound */
>>> +        static struct sock_filter filt[] = {
>>> +            { 0x28, 0, 0, 0xfffff004 },
>>> +            { 0x15, 0, 1, 0x00000004 },
>>> +            { 0x6, 0, 0, 0x00000000 },
>>> +            { 0x6, 0, 0, 0x0000ffff }
>>> +        };
>>> +        static struct sock_fprog fprog = {ARRAY_SIZE(filt), filt};
>>> 
>>>         /* Create file descriptor. */
>>>         fd = socket(PF_PACKET, SOCK_RAW, 0);
>>> @@ -776,6 +784,16 @@ netdev_linux_rx_open(struct netdev *netdev_, struct 
>>> netdev_rx **rxp)
>>>                      netdev_get_name(netdev_), strerror(error));
>>>             goto error;
>>>         }
>>> +
>>> +        /* Filter for only incoming packets. */
>>> +        error = setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER, &fprog,
>>> +                           sizeof fprog);
>>> +        if (error) {
>>> +            error = errno;
>>> +            VLOG_ERR("%s: failed attach filter (%s)",
>>> +                     netdev_get_name(netdev_), strerror(error));
>>> +            goto error;
>>> +        }
>>>     }
>>> 
>>>     rx = xmalloc(sizeof *rx);
>> 
>> Thanks, Murphy.
>> 
>> Simon, does this solve the problem you see?
> 
> Hi Ben, Hi all,
> 
> I have lightly tested this using the scenarios that I mentioned
> in my initial post - multicast IPv6 and IPv6, broadcast ARP - and
> yes, this seems to resolve the problem that I observed.
> 
> Tested-by: Simon Horman <[email protected]>


If you actually want this, I've submitted a slightly tweaked version:
http://openvswitch.org/pipermail/dev/2013-June/028439.html

_______________________________________________
dev mailing list
[email protected]
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] Possible Regression in 'netdev: Add new "struct netdev_rx" for capturing packets from a netdev.'

Reply via email to