On Oct 25, 2013, at 3:31 PM, Jesse Gross wrote:

> On Fri, Oct 25, 2013 at 10:49 AM, Romain Lenglet <rleng...@vmware.com> wrote:
>> On Oct 24, 2013, at 5:46 PM, Jesse Gross <je...@nicira.com> wrote:
>> 
>>> On Thu, Oct 24, 2013 at 3:39 PM, Romain Lenglet <rleng...@vmware.com> wrote:
>>>> ----- Original Message -----
>>>>> From: "Jesse Gross" <je...@nicira.com>
>>>>> To: "Romain Lenglet" <rleng...@vmware.com>
>>>>> Cc: dev@openvswitch.org
>>>>> Sent: Tuesday, October 22, 2013 3:46:54 PM
>>>>> Subject: Re: [ovs-dev] [PATCH] sFlow export: include standard tunnel 
>>>>> structures (for GRE, VXLAN etc.)
>>>>> 
>>>>> On Mon, Oct 21, 2013 at 2:33 PM, Romain Lenglet <rleng...@vmware.com> 
>>>>> wrote:
>>>>>> ----- Original Message -----
>>>>>>> From: "Romain Lenglet" <rleng...@vmware.com>
>>>>>>> To: "Jesse Gross" <je...@nicira.com>
>>>>>>> Cc: dev@openvswitch.org
>>>>>>> Sent: Friday, October 18, 2013 6:46:05 PM
>>>>>>> Subject: Re: [ovs-dev] [PATCH] sFlow export: include standard tunnel
>>>>>>> structures (for GRE, VXLAN etc.)
>>>>>>> 
>>>>>>> ----- Original Message -----
>>>>>>>> From: "Jesse Gross" <je...@nicira.com>
>>>>>>>> To: "Romain Lenglet" <rleng...@vmware.com>
>>>>>>>> Cc: "Neil Mckee" <neil.mc...@inmon.com>, dev@openvswitch.org
>>>>>>>> Sent: Friday, October 18, 2013 6:23:23 PM
>>>>>>>> Subject: Re: [ovs-dev] [PATCH] sFlow export: include standard tunnel
>>>>>>>> structures (for GRE, VXLAN etc.)
>>>>>>>> 
>>>>>>>> On Fri, Oct 18, 2013 at 5:58 PM, Romain Lenglet <rleng...@vmware.com>
>>>>>>>> wrote:
>>>>>>>>> ----- Original Message -----
>>>>>>>>>> From: "Jesse Gross" <je...@nicira.com>
>>>>>>>>>> To: "Romain Lenglet" <rleng...@vmware.com>
>>>>>>>>>> Cc: "Neil Mckee" <neil.mc...@inmon.com>, dev@openvswitch.org
>>>>>>>>>> Sent: Friday, October 18, 2013 5:50:05 PM
>>>>>>>>>> Subject: Re: [ovs-dev] [PATCH] sFlow export: include standard tunnel
>>>>>>>>>> structures (for GRE, VXLAN etc.)
>>>>>>>>>> 
>>>>>>>>>> On Fri, Oct 18, 2013 at 5:43 PM, Romain Lenglet <rleng...@vmware.com>
>>>>>>>>>> wrote:
>>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>>> From: "Romain Lenglet" <rleng...@vmware.com>
>>>>>>>>>>>> To: "Neil Mckee" <neil.mc...@inmon.com>
>>>>>>>>>>>> Cc: dev@openvswitch.org
>>>>>>>>>>>> Sent: Wednesday, October 9, 2013 10:30:17 AM
>>>>>>>>>>>> Subject: Re: [ovs-dev] [PATCH] sFlow export: include standard tunnel
>>>>>>>>>>>> structures (for GRE, VXLAN etc.)
>>>>>>>>>>>> 
>>>>>>>>>>>> On Oct 8, 2013, at 10:09 PM, Neil Mckee <neil.mc...@inmon.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> +    /* Indicate 0==unknown for the src_port. It may be set to a random
>>>>>>>>>>>>> +       number on a flow-by-flow basis to increase entropy for ECMP fabrics.
>>>>>>>>>>>>> +       The assumption being made here is that it is not so important to
>>>>>>>>>>>>> +       report this.  At least not important enough to justify the effort
>>>>>>>>>>>>> +       of making it accessible here. */
>>>>>>>>>>>> 
>>>>>>>>>>>> Exporting the UDP source port is essential.
>>>>>>>>>>>> You also have to export the tunnel key: GRE key (32- or 64-bit), VNI
>>>>>>>>>>>> (24-bit), etc.
>>>>>>>>>>>> I don't see how this feature could be useful without the UDP source
>>>>>>>>>>>> port and tunnel key.
>>>>>>>>>>> 
>>>>>>>>>>> I thought more about this. Exporting the source UDP port is really
>>>>>>>>>>> important. Since the source port is calculated in the tunnel port at
>>>>>>>>>>> egress during encapsulation and is lost at ingress during
>>>>>>>>>>> decapsulation, and the sampling here is done before encapsulation or
>>>>>>>>>>> after decapsulation, the easiest way I can imagine to determine the
>>>>>>>>>>> source port is to redo the hashing here. This would require factoring
>>>>>>>>>>> the hashing code out into a function that can also be used by this
>>>>>>>>>>> userspace code.
>>>>>>>>>> 
>>>>>>>>>> I don't think that it's really viable to regenerate the hash used to
>>>>>>>>>> compute the source port. In the best case, we are the ones generating
>>>>>>>>>> it but the kernel hash function might change or the hash might come
>>>>>>>>>> from the NIC. In the worst case, when we receive a packet the hash
>>>>>>>>>> could have been generated by a non-OVS device with an unknown hash
>>>>>>>>>> algorithm.
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Yes, agreed, that's a problem.
>>>>>>>>> The only other alternative I can imagine to get the source UDP port is
>>>>>>>>> to do the sampling in the port (esp. in the tunnel port) in the datapath.
>>>>>>>>> This would be quite intrusive and complicated, as it would require the
>>>>>>>>> ports to do sampling and upcalls.
>>>>>>>>> I'd prefer to avoid that.
>>>>>>>>> Do you see any other alternative?
>>>>>>>> 
>>>>>>>> I guess it's not entirely clear to me at this point why it's important
>>>>>>>> to record the UDP source port. Can you explain?
>>>>>>> 
>>>>>>> Identifying all the flows for a tunnel in the network is useful to
>>>>>>> detect changes in the routing of tunnel flows, which can e.g. be due to
>>>>>>> network failures (e.g. a link went down, and the flows are rerouted), and
>>>>>>> might impact the tunnel as a whole. This is useful for root cause
>>>>>>> analysis. If we didn't get all the tunnel flow headers from the hosts, we
>>>>>>> would lose some of the information.
>>>>>> 
>>>>>> More importantly, we want to be able to map a logical flow to a specific
>>>>>> tunnel flow (i.e. the tunnel's IP+transport header), to determine the
>>>>>> path taken by a logical flow in the physical fabric.
>>>>>> This is possible because the tunnel header, incl. the transport source
>>>>>> port, uniquely identifies that tunnel flow in the physical network.
>>>>>> If we don't have the source port from OVS, we can't do that mapping.
>>>>>> 
>>>>>> Here's a proposal:
>>>>>> 
>>>>>> - Factor get_src_port() out of datapath/vport-lisp.c to be shared by all
>>>>>> vport types.
>>>>>> 
>>>>>> - Modify datapath/vport-vxlan.c to call get_src_port() instead of
>>>>>> vxlan_src_port(). The VXLAN RFC doesn't specify any specific hashing
>>>>>> algorithm, so it should be fine to just use the same get_src_port()
>>>>>> hashing as for LISP.
>>>>>> 
>>>>>> - Always calculate the hash in kernelspace for each packet sent in an
>>>>>> upcall, or only for some types of upcalls, e.g. sFlow / IPFIX sampling
>>>>>> upcalls, and send it in the upcall so that userspace gets the transport
>>>>>> source port independently of the input or output tunnel type.
>>>>> 
>>>>> To clarify, I think this would need to have two parts:
>>>>> - For received packets include the source port of the outer UDP
>>>>> header. This can't simply be computed because the original sender
>>>>> might have used an unknown hash algorithm.
>>>>> - Compute the hash for all packets because they might be sent to a tunnel
>>>>> port.
>>>>> 
>>>>> Is that right?
>>>> 
>>>> Correct.
>>>> 
>>>>> The second one in particular seems a little odd to me. The other thing
>>>>> that I think is important to be careful of is how this will interact
>>>>> with megaflows. In the traditional OVS case with a very wide exact
>>>>> match, it was likely (although perhaps not guaranteed) that the hash
>>>>> computed for the source port was fixed for a given flow. This is
>>>>> definitely not true any longer and while it may not matter if it is
>>>>> only needed on a per-sampled-packet basis, it affects where and how it
>>>>> is attached to a flow or upcall.
>>>> 
>>>> I would insert sample actions just before each output action, so whatever
>>>> flow is sampled is exactly the same flow that will be output. So if it's
>>>> calculated in the datapath for both the sample upcall and the output, the
>>>> hashing should be done on the exact same flow?
>>> 
>>> I think it's OK in the sampling case because, as you say, it's based
>>> on a particular packet. The part that is potentially a little odd is
>>> that we typically use the same flow format for all types of upcalls so
>>> we would either have to strip it out in other cases or find some
>>> reasonable semantics.
>>> 
>>>> Would it make any difference to sample just after an output, instead of
>>>> just before? It doesn't matter either way from a sampling viewpoint.
>>> 
>>> I guess if we can find a way to make this work at the physical layer
>>> (when the packet goes through OVS after encapsulation) that seems
>>> best. The upcall would have the entire packet and could dissect it
>>> arbitrarily deeply. I realize that this has issues with IPsec but
>>> maybe this is an edge case or we can mark the packet somehow before
>>> encryption to get the necessary information?
>> 
>> I agree that the vport-*.c modules would be the best place to do the
>> sampling, because we have all the information we need there.
>> However, that would require:
>> - Adding hooks into all vport-*.c modules for packet sampling.
>> - Defining new upcalls for sending packets sampled at ingress / egress within
>>  a port.
>> - Defining new configuration options for ports (of all types): to
>>  enable/disable sampling at ingress and/or egress, to set the sampling
>>  probability, to enable/disable sampling of tunnel headers, etc.
>> 
>> It seems like more work to me, and it's more intrusive.
>> I had hoped we could implement this with only minimal changes to the current
>> sample datapath action and upcalls.
>> But if you think it's worth instrumenting the datapath vports, I'll think
>> more about that solution.
> 
> I actually didn't mean to include sampling in the vports themselves -
> I agree that it seems very intrusive. We just spent a lot of time
> separating out the tunnel dataplanes from OVS itself, so I really
> don't want to mix them again.
> 
> A packet typically passes through OVS twice when doing tunneling -
> once before encapsulation and once after encapsulation. What I was
> talking about doing was using the existing sampling action to grab
> packets after they have been encapsulated. From a layering perspective
> this seems best and it essentially models what you would see if you
> were sampling on a transit switch so it might match up well with that.

You'd have to be careful with non-unicast packets.  Their sampling
probability must still be the same as for unicast packets.

> 
> The problem with this approach though, is that it requires being able
> to dissect the encapsulated packet. The difficulty of this ranges from
> relatively easy (GRE, although we'd need to have some parsing code) to
> moderately difficult (VXLAN, since we need to know what UDP port to
> look at) to essentially impossible (IPsec). One possible solution to
> this is to try to link the flows from before and after sampling,
> although that could be expensive from a performance perspective. It
> might be possible to get around that by choosing the packets that we
> want to sample before encapsulation, store the flow data, mark them,
> and then export all marked packets after encapsulation. It solves the
> kernel layering problem but could be complicated for userspace.

I've seen schemes like that work before (e.g. for memcached and
haproxy transaction-sampling).  It can be helpful to make the
sampling decision early and just set a bit.  Then the packet/transaction
knows to store the fields that it will need when it eventually writes
out the sample.
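
As a rough sketch of that "decide early, set a bit, export later" pattern
(Python only for brevity; every name below is made up for illustration and
none of it is OVS or sFlow agent code, and the random 1-in-N decision is an
assumption standing in for whatever sampler is actually used):

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Packet:
    """Hypothetical packet model: inner fields plus an optional outer header."""
    inner_flow: Dict[str, object]
    outer_flow: Optional[Dict[str, object]] = None  # set by encap()
    sample_mark: bool = False                       # the "bit"
    saved: Dict[str, object] = field(default_factory=dict)


def pre_encap(pkt: Packet, rate: int, rng=random) -> None:
    """Make the sampling decision once, before encapsulation.

    With probability 1/rate the packet is marked and the fields that will
    be needed later are copied, since they are hard to recover afterwards.
    """
    if rng.random() < 1.0 / rate:
        pkt.sample_mark = True
        pkt.saved = dict(pkt.inner_flow)


def encap(pkt: Packet, tunnel_key: int, udp_src: int) -> None:
    """Stand-in for the tunnel port: only here is the outer header known."""
    pkt.outer_flow = {"tunnel_key": tunnel_key, "udp_src": udp_src}


def post_encap(pkt: Packet, samples: List[dict]) -> None:
    """After encapsulation, export marked packets only (no second dice roll)."""
    if pkt.sample_mark and pkt.outer_flow is not None:
        samples.append({"inner": pkt.saved, "outer": dict(pkt.outer_flow)})
```

The point of the sketch is that the probability is applied exactly once, in
pre_encap(), and post_encap() only looks at the mark, so the exported sample
carries both the saved inner fields and the outer tunnel header.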

How do you guys feel about the patch as it stands now, though?  It
adds very useful tunnel and LAG information with minimal impact.  It
also completely removes a private hash-table that the sFlow module
was maintaining before.  It's OK for some fields to be missing, and
those missing fields can be addressed over time, giving priority to
high-value, low-code-impact extensions.

Neil


> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev
