Re: [ovs-dev] [patch net-next RFC 10/12] openvswitch: add support for datapath hardware offload

2014-08-24 Thread Thomas Graf
On 08/23/14 at 09:53pm, Jamal Hadi Salim wrote:
> On 08/22/14 18:53, Scott Feldman wrote:
> 
> Ok, Scott - now i have looked at the patches on the plane and i am
> still not convinced ;->
> 
> >The intent is to use openvswitch.ko’s struct sw_flow to program hardware via the
> >ndo_swdev_flow_* ops, but otherwise be independent of OVS.  So the upper layer of
> >the driver is struct sw_flow and any module above the driver can construct a struct
> >sw_flow and push it down via ndo_swdev_flow_*.  So your non-OVS use-case should be
> >handled.  OVS is another use-case.  struct sw_flow should not be OVS-aware, but
> >rather a generic flow match/action sufficient to offload the data plane to HW.
> 
> 
> There is a legitimate case to be made for offloading OVS but *not*
> a basis for making it the offload interface.
> My suggestion is to make all OVS stuff a separate patchset.
> This thing needs to stand alone without OVS and we don't need
> to confuse the two.

I get what you are saying but I don't see that to be the case here. I
don't see how this series proposes the OVS case as *the* interface.
It proposes *an* interface which in this case is flow-based with mask
support to accommodate the typical ntuple filter API in HW. OVS happens
to be one of the easiest to use examples as a consumer because it
already provides a flat flow representation.

That said, I already mentioned that I see a lot of value in having a
non-OVS API example ASAP and I will be glad to help out John to achieve
that.

> Having said that:
> I believe in starting simple - by solving the basic functions of
> L2/3 offload first because those are well understood and fundamental.
> There is the simplicity of those network functions and then the
> need to deal with tons of quirks that surround them.
> I think getting that right will help in understanding the issues and
> make this interface better. This is where i am going to focus my effort.

I thought this is exactly what is happening here. The flow key/mask
based API as proposed focuses on basic forwarding for L2-L4.

> Here's my view on flows in the patchset:
> What we need is ability to specify different types of classifiers.
> But leave L2 and 3 out of that - that should be part of the basic
> feature set.
>
> Your 15-tuple classifier should be one of those classifiers.
> This is because you *cannot possibly* have a universal classifier.
> The tc classifier/action API has got this part right. There is
> no ONE flow classifier but rather it has flexibility to add as many
> as you want.

Exactly and I never saw Jiri claim that swdev_flow_insert() would be
the only offload capability exposed by the API. I see no reason why
it could not also provide swdev_offset_match_insert() or
swdev_ebpf_insert() for the 2*next generation HW. I don't think it
makes sense to focus entirely on finding a single common denominator
and channel everything through a single function to represent all the
different generic and less generic offload capabilities. I believe
that doing so will raise the minimal HW requirements barrier too
much. I think we should start somewhere, learn and evolve.
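
For illustration, here is a rough sketch of how classifier-specific hooks could sit side by side rather than being funnelled through one call. Only ndo_swdev_flow_insert() exists in the posted RFC; the offset-match and eBPF entries, and their argument types, are hypothetical names taken from this discussion:

struct net_device;
struct sw_flow;
struct sw_offset_match;   /* hypothetical: raw offset/len/mask rule */
struct sw_ebpf_prog;      /* hypothetical: classifier program handle */

struct swdev_ops_sketch {
    /* From the RFC: masked flow match plus actions, as used by OVS. */
    int (*ndo_swdev_flow_insert)(struct net_device *dev,
                                 const struct sw_flow *flow);
    /* Hypothetical: u32-style offset/len/mask classifier. */
    int (*ndo_swdev_offset_match_insert)(struct net_device *dev,
                                         const struct sw_offset_match *m);
    /* Hypothetical: push a whole classifier program to the ASIC. */
    int (*ndo_swdev_ebpf_insert)(struct net_device *dev,
                                 const struct sw_ebpf_prog *prog);
};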

> IOW:
> I should be able to specify a classifier that matches the
> definition of the openflow thing you are using. But then i should also
> be able to create one based on 32 bit value/masks, one that classifies
> strings, one that classifies metadata, my own pigeon observer
> classifier etc. And be able to attach them in combinations
> to select different things within the packet and act differently.

So essentially what you are saying is that the tc interface
(in particular cls and act) could be used as an API to achieve offloads.
Yes! I thought this was very clear and a given. I don't think that it
makes sense to force every offload API consumer through the tc interface
though. This comes back to my statements in a previous email. I don't
think we should require that all the offload decision complexity *has*
to live in the kernel. Quagga, nft, or OVS should be given an API to
influence this more directly (with the hardware complexity properly
abstracted). In-kernel users such as bridge, l3 (especially rules),
and tc itself could be handled through a cls/act derived API internally.

> Let's pick an example of the u32 classifier (or i could pick nftables).
> Using your scheme i have to incur penalties translating u32 to your
> classifier and only achieve basic functionality; and now in addition
> i can't do 90% of my u32 features. And u32 is very implementable
> in hardware.

I don't fully understand the last claim. Given the specific ntuple
capabilities of a lot of hardware out there (let's assume a typical
5-tuple capability with N capacity for exact matches and M capacity for
wildcard matches) supporting a generic u32 offset-len-mask is not exactly
trivial at all and I don't see how you can get around converting the
generic offset into an ntuple filter *at some point* to verify if the HW
can fulfill the request.
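
To make that translation step concrete, here is a small self-contained sketch of the kind of capability check meant above: deciding whether a raw offset/len match falls inside the fields a typical 5-tuple filter can express. The struct names and the assumption of plain IPv4 without options are purely illustrative:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct raw_match {
    uint16_t offset;    /* byte offset from the start of the IPv4 header */
    uint16_t len;       /* number of bytes to match */
};

struct ntuple_field {
    uint16_t offset;
    uint16_t len;
};

/* Fields a typical 5-tuple filter can match: proto, addresses, ports. */
static const struct ntuple_field ntuple_fields[] = {
    {  9, 1 },  /* IPv4 protocol */
    { 12, 4 },  /* IPv4 source address */
    { 16, 4 },  /* IPv4 destination address */
    { 20, 2 },  /* L4 source port (assumes no IP options) */
    { 22, 2 },  /* L4 destination port */
};

/* Return true if the raw match fits entirely inside one 5-tuple field,
 * i.e. the HW could be asked to do it; otherwise it must stay in SW. */
static bool raw_match_fits_ntuple(const struct raw_match *m)
{
    size_t i;

    for (i = 0; i < sizeof ntuple_fields / sizeof ntuple_fields[0]; i++) {
        const struct ntuple_field *f = &ntuple_fields[i];

        if (m->offset >= f->offset && m->offset + m->len <= f->offset + f->len) {
            return true;
        }
    }
    return false;
}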

Re: [ovs-dev] [patch net-next RFC 10/12] openvswitch: add support for datapath hardware offload

2014-08-24 Thread Thomas Graf
On 08/23/14 at 10:09am, John Fastabend wrote:
> Right. I think this is basically what Jiri and I discussed when he
> originally posted the series. For my use cases this is one of the
> more interesting pieces. If no one else is looking at it I can try
> it on some of the already existing open source drivers that have some
> very simple support for ingress flow tables, read: flow director.

Awesome. I'm definitely very interested in helping out on this part
as well.

> Thanks. This is exactly what I was trying to hint at and why the
> optimization can not be done in the driver. The driver shouldn't
> have to know about the cost models of SW vs HW rules or how to
> break up rules into sets of complementary hw/sw rules.

That's an excellent summary of what I wanted to say.

> the other thing I've been thinking about is how to handle hardware
> with multiple flow tables. We could let the driver handle this
> but if I ever want to employ a new optimization strategy then I
> need to rewrite the driver. To me this looks a lot like policy
> which should not be driven by the kernel. We can probably ignore
> this case for the moment until we get some of the other things
> addressed.

Agreed, this sounds like something to handle a bit later.
It is potentially very interesting as it would allow offloading at
least partial pipelines but it obviously adds a new dimension to
the API. I strongly feel that the API as proposed could be extended
in this direction though. It will require a notion of tables for
swdev_flow_insert() and we'll likely need an API to set default
table policies, although that is probably needed even for single table
support. We might also have to introduce a concept of bundles at
some point to provide atomic updates across multiple tables for
consistency.
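
As a sketch of where that could go, a bundle-style API for atomic multi-table updates might look roughly like the following; the names and signatures are invented here and nothing like this exists in the posted series:

#include <linux/types.h>

struct net_device;
struct sw_flow;

/* Stage inserts into specific HW tables, then apply them all or nothing. */
int swdev_bundle_begin(struct net_device *dev, u32 *bundle_id);
int swdev_bundle_flow_insert(struct net_device *dev, u32 bundle_id,
                             u32 table_id, const struct sw_flow *flow);
int swdev_bundle_commit(struct net_device *dev, u32 bundle_id);
int swdev_bundle_abort(struct net_device *dev, u32 bundle_id);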

> IMO I think extending the API is the easiest route but the best
> way to resolve this is to try and write the code. I'll take a
> stab at it next week.

I'm absolutely interested in writing code for this as well. If we
can find consensus on merging at least the core API bits in some
form then that would allow for more people to get involved. Maybe
we can skip the OVS bits in the first merge and continue that work
in a separate git tree. I'm also definitely very interested in hearing
Pravin's and Jesse's thoughts on the overall API ;-)

John's flow director API replacement idea can definitely serve as an
excellent first in-tree consumer as it looks even simpler.

> by the way Jiri I think the patches are a great start.

+1


Re: [ovs-dev] [patch net-next RFC 03/12] net: introduce generic switch devices support

2014-08-24 Thread Thomas Graf
On 08/21/14 at 06:18pm, Jiri Pirko wrote:
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 39294b9..8b5d14c 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -49,6 +49,8 @@
>  
>  #include 
>  #include 
> +#include 
> +
>  #include 
>  
>  struct netpoll_info;
> @@ -997,6 +999,24 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
> + * int (*ndo_swdev_flow_insert)(struct net_device *dev,
> + *   const struct sw_flow *flow);
> + *   Called to insert a flow into switch device. If driver does
> + *   not implement this, it is assumed that the hw does not have
> + *   a capability to work with flows.

I assume you are planning to add an additional expandable struct
parameter to handle insertion parameters when the first one is introduced,
to avoid having to touch every driver every time.
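
Presumably something along these lines is meant; a sketch only, with invented field names, of an extensible parameter struct so that later options do not force a new ndo signature and a tree-wide driver sweep:

#include <linux/types.h>

struct net_device;
struct sw_flow;

struct swdev_flow_insert_params {
    u32 flags;      /* e.g. "keep a software fallback copy" */
    u32 table_id;   /* target HW table, once multi-table support lands */
    u64 cookie;     /* opaque tag echoed back in notifications */
    /* New members get appended here; drivers simply ignore fields they
     * do not understand, so existing drivers need no changes. */
};

int ndo_swdev_flow_insert_sketch(struct net_device *dev,
                                 const struct sw_flow *flow,
                                 const struct swdev_flow_insert_params *p);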

> +/**
> + *   swdev_flow_insert - Insert a flow into switch
> + *   @dev: port device
> + *   @flow: flow descriptor
> + *
> + *   Insert a flow into switch this port is part of.
> + */
> +int swdev_flow_insert(struct net_device *dev, const struct sw_flow *flow)
> +{
> +        const struct net_device_ops *ops = dev->netdev_ops;
> +
> +        print_flow(flow, dev, "insert");
> +        if (!ops->ndo_swdev_flow_insert)
> +                return -EOPNOTSUPP;
> +        WARN_ON(!ops->ndo_swdev_get_id);
> +        BUG_ON(!flow->actions);
> +        return ops->ndo_swdev_flow_insert(dev, flow);
> +}
> +EXPORT_SYMBOL(swdev_flow_insert);

Splitting the flow specific API into a separate file (maybe
swdev_flow.c?) might help resolve some of the concerns around the
focus on flows. It would make it clear that it's one of multiple
models to be supported.


Re: [ovs-dev] [patch net-next RFC 10/12] openvswitch: add support for datapath hardware offload

2014-08-24 Thread Jamal Hadi Salim

On 08/24/14 07:12, Thomas Graf wrote:

> On 08/23/14 at 09:53pm, Jamal Hadi Salim wrote:
>
> I get what you are saying but I don't see that to be the case here. I
> don't see how this series proposes the OVS case as *the* interface.


The focus of the patches is on offloading flows (uses the
ovs or shall i say the broadcom OF-DPA API, which is one
vendor's view of the world).

Yes, people are going to deploy more hardware which knows how to do
a lot of flows (but today that is in the tiny tiny minority)

I would have liked to see more focus on L2/3 as a first step because
they are more predominantly deployed than anything with flows. And
they are well understood from a functional perspective.
Then that would bring to the front API issues since you have
a large sample space of deployments and we can refactor as needed.
i.e.
The hard part is dealing with 10 different chips which have a slightly
different meaning of (for example) how to do L3 in their implementation.
I don't see such a focus in these patches because they start with a
premise "the world is about flows".


> It proposes *an* interface which in this case is flow-based with mask
> support to accommodate the typical ntuple filter API in HW. OVS happens
> to be one of the easiest to use examples as a consumer because it
> already provides a flat flow representation.



In other words, there is a direct 1-1 map between this approach and OVS.
That is a contentious point.


> I thought this is exactly what is happening here. The flow key/mask
> based API as proposed focuses on basic forwarding for L2-L4.



Not at all.
I gave an example earlier with u32, but let's pick the other extreme
of well understood functions, say L3 (I could pick L2 as well).
This openflow api tries to describe different header
fields in the packet. That is not the challenge for such an
API. The challenge is dealing with the quirks.
Some chips implement FIB and NH conjoined; others implement
them separately.
I don't see how this is even being remotely touched on.




> Exactly and I never saw Jiri claim that swdev_flow_insert() would be
> the only offload capability exposed by the API. I see no reason why
> it could not also provide swdev_offset_match_insert() or
> swdev_ebpf_insert() for the 2*next generation HW. I don't think it
> makes sense to focus entirely on finding a single common denominator
> and channel everything through a single function to represent all the
> different generic and less generic offload capabilities. I believe
> that doing so will raise the minimal HW requirements barrier too
> much. I think we should start somewhere, learn and evolve.



You are asking me to go and add a new ndo() every time i have a new
network function? That is not scalable. I have no problem with
the approach that was posted - I have a problem that it is
focused on flows (and is lacking the ability to specify different
classifiers). It should not be called xxx_flow_xxx.



> So essentially what you are saying is that the tc interface
> (in particular cls and act) could be used as an API to achieve offloads.


I am pointing to it as an example of something that is *done right* in
terms of not picking a universal classifier. Something the current
OVS posted/used api lacks (and to be frank OF never cared about because
it had a niche use case; let's not make that niche use case the centre
of gravity).


> Yes! I thought this was very clear and a given. I don't think that it
> makes sense to force every offload API consumer through the tc interface
> though.


If you looked at all my presentations I have never made such a
claim, but i have always said I want everything described in
iproute2 to work. I don't think anyone disagreed.
I don't expect tc to be used as *the interface*; but by the same
token i don't expect OVS to be used as *the interface*.
Let's start with hardware abstraction. Let's map to existing Linux APIs
and then see where some massaging may be needed.


> This comes back to my statements in a previous email. I don't
> think we should require that all the offload decision complexity *has*
> to live in the kernel.


Agreed. Move policy decisions out of the kernel, for one, but also
any complex acrobatics that are use case specific.


> Quagga, nft, or OVS should be given an API to
> influence this more directly (with the hardware complexity properly
> abstracted). In-kernel users such as bridge, l3 (especially rules),
> and tc itself could be handled through a cls/act derived API internally.



This abstraction gives OVS a 1-1 mapping, which is something i object to.
You want to penalize me for the sake of getting the OVS api in place?
Beginning with flows and claiming that one would be able to
cover everything is a non-starter.


>> Let's pick an example of the u32 classifier (or i could pick nftables).
>> Using your scheme i have to incur penalties translating u32 to your
>> classifier and only achieve basic functionality; and now in addition
>> i can't do 90% of my u32 features. And u32 is very implementable
>> in hardware.


> I don't fully understand the last claim.

Re: [ovs-dev] [PATCH] datapath: remove flow member from struct ovs_skb_cb

2014-08-24 Thread Jesse Gross
On Thu, Aug 21, 2014 at 12:25 PM, Lorand Jakab  wrote:
> struct ovs_skb_cb is full on kernels < 3.11 due to compatibility code.
> This patch removes the 'flow' member in order to make room for data
> needed by layer 3 flow/port support that will be added in an upcoming
> patch.  The 'flow' member was chosen for removal because it's only used
> in ovs_execute_actions().
>
> Signed-off-by: Lorand Jakab 

I'll let Pravin review this since he has been working more actively in
this area.

Pravin, there was some discussion of this in the L3 thread.


Re: [ovs-dev] [PATCH v4 3/3] datapath: add layer 3 flow/port support

2014-08-24 Thread Jesse Gross
On Thu, Aug 21, 2014 at 12:24 PM, Lori Jakab  wrote:
> On 8/6/14 4:37 AM, Jesse Gross wrote:
>> Besides the fact that it would be nice to unify things, I'm not sure
>> that it is actually correct to pull off VLAN, MPLS, etc. tags that we
>> don't understand. Presumably, these are being used to create some kind
>> of isolation and dropping all the tags that we don't understand would
>> just blindly merge things together. Also, from an implementation
>> perspective, we don't really parse beyond what we understand (this is
>> not quite true for MPLS but it is for VLANs and any unknown protocols)
>> so we wouldn't necessarily actually get to an L3 header. On the other
>> hand, if we only deal with tags that we do understand then can't we
>> have more atomic instructions that pull them off one-by-one?
>
>
> Atomic instructions make a lot of sense, and we definitely should support
> them.  However, if a packet is sent to an L3 port, it should not have any
> Ethernet, VLAN or MPLS header. Note that for now these actions are not user
> visible, and cannot be added with OpenFlow, they are added automatically by
> the user space code, when a packet is sent from an L2->L3 port or L3->L2
> port.
>
> I agreed with Ben that this behavior will be changed in the future, based on
> discussions in the ONF's EXT-112 proposal, where the consensus was to have
> explicit push_eth and pop_eth OpenFlow actions.  However, the current
> behavior of the LISP code is to add/remove L2 headers automatically, so we
> want to keep that in the first phase, until I implement OpenFlow support.
> Once automatic header popping is not used, we can remove the
> "extract_l3_packet()" action.
>
> From the implementation perspective, this action would rely on the
> skb->network_header being set correctly.

I'm not sure that this really addresses the concerns that I have from
both policy and implementation perspectives.

From the policy perspective: Will a Cisco implementation of LISP
accept a packet on an unknown VLAN, throw away the tag when it does L2
processing, and then continue on to do L3 processing and LISP? Somehow
I doubt it.

From an implementation perspective: There is no mechanism to rely on
skb->network_header being set in the general case. It will only be set
in situations where OVS can parse the packet. OVS doesn't know about
QinQ for example so the action that you have defined won't
actually give you an L3 packet.

Finally, from a Linux perspective, removing kernel interfaces isn't
something that is OK as a matter of policy.

The fact that the OVS kernel module doesn't know how to inherently
find the L3 header is not as bad as it seems. Conveniently, it means that we
can pop off exactly the same headers as we know how to parse so there
is no loss of functionality to doing this individually. As a bonus,
since userspace needs to generate these actions, the policy can be
enforced there as well.

>>> --- a/datapath/actions.c
>>> +++ b/datapath/actions.c
>>> @@ -258,6 +258,7 @@ static void push_eth(struct sk_buff *skb, const struct ovs_action_push_eth *ethh
>>>        skb_push(skb, ETH_HLEN);
>>>        skb_reset_mac_header(skb);
>>>        skb_reset_mac_len(skb);
>>> +        vlan_set_tci(skb, 0);
>>
>> I don't think it's right to throw away the vlan tag. My assumption is
>> that it should be placed into the packet before we push on the
>> Ethernet header (this actually should never happen since we disallow
>> pushing multiple Ethernet headers but it still seems like the right
>> thing to do).
>
>
> That's not what I would expect from a simple push_eth() action.  If someone
> wanted to push a VLAN tag, that should be an explicit action.  So I would
> just throw away whatever there is in skb->vlan_tci.

>>>> The contents of skb->vlan_tci were already pushed explicitly (look at
>>>> the implementation of push_vlan()). From a user's perspective, there
>>>> is no difference - it is just an internal optimization to pull the tag
>>>> out of the packet, so I can't see any justification for throwing away
>>>> that tag.
>>>
>>>
>>> OK.  I will remove that line.
>>
>> But just to be clear, I think we still need to explicitly push the
>> vlan tag on the packet. Otherwise, it will migrate from the inner
>> Ethernet header to the outer.
>
>
> Still not sure what I need to do here.  If skb->vlan_tci was set by a
> push_vlan action, then the vlan tag should already have been pushed on the
> packet, right?

Yes, the user has requested that a vlan tag be applied. skb->vlan_tci
is a form of offloading that should be transparent and will no longer
work in this case, so it needs to be de-offloaded. Please take a look
at the OVS vlan operations and other vlan code in the kernel if this
isn't clear.
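
For readers not familiar with the accelerated-tag path, a rough sketch of the de-offload step being described, using helper names from kernels of roughly that era (vlan_tx_tag_present() and __vlan_put_tag(), since renamed); this is an illustration, not the eventual OVS change:

#include <linux/if_vlan.h>
#include <linux/skbuff.h>

/* If the VLAN tag lives in skb->vlan_tci (the offloaded form), write it
 * back into the packet body before a new outer Ethernet header is pushed,
 * so the tag stays with the inner frame instead of migrating outward. */
static struct sk_buff *deoffload_vlan_tag(struct sk_buff *skb)
{
    if (vlan_tx_tag_present(skb)) {
        skb = __vlan_put_tag(skb, htons(ETH_P_8021Q), vlan_tx_tag_get(skb));
        if (skb) {
            skb->vlan_tci = 0;  /* the tag is now in the payload */
        }
    }
    return skb;  /* NULL if the tag could not be inserted */
}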

> BTW, for a packet with multiple Ethernet headers, is there already support
> for matching on anything after the outer header?  Does t

Re: [ovs-dev] [patch net-next RFC 10/12] openvswitch: add support for datapath hardware offload

2014-08-24 Thread Scott Feldman

On Aug 24, 2014, at 8:15 AM, Jamal Hadi Salim  wrote:

> On 08/24/14 07:12, Thomas Graf wrote:
>> On 08/23/14 at 09:53pm, Jamal Hadi Salim wrote:
> 
>> 
>> I get what you are saying but I don't see that to be the case here. I
>> don't see how this series proposes the OVS case as *the* interface.
> 
> The focus of the patches is on offloading flows (uses the
> ovs or shall i say the broadcom OF-DPA API, which is one
> vendor's view of the world).
> 
> Yes, people are going to deploy more hardware which knows how to do
> a lot of flows (but today that is in the tiny tiny minority)
> 
> I would have liked to see more focus on L2/3 as a first step because
> they are more predominantly deployed than anything with flows. And
> they are well understood from a functional perspective.
> Then that would bring to the front API issues since you have
> a large sample space of deployments and we can refactor as needed.

With respect to focus on L2/L3, I have a pretty *good* hunch someone could 
write a kernel module that gleans from the L2/L3 netlink echoes already flying 
around and translates to sw_flows and in turn into ndo_swdev_flow_* calls.  So 
the existing linux bonds and bridges and vlans and ROUTEs and NEIGHs and LINKs 
and ADDRs work as normal, unchanged, with iproute2 still originating the 
netlink msgs from the user.  The new kernel module (let’s call it “dagger” 
after the white spy from spy vs. spy MAD comic) can figure out what forwarding 
gets offloaded to HW just from the netlink echoes.  If someone wrote a dagger 
module in parallel with the other efforts being discussed here, I think we’d 
have a pretty good idea what the API needs to look like, at least to cover 
the existing L2/L3 world we're all familiar with.  Gleaning netlink msgs isn’t 
ideal for several reasons (and probably making more than a few in the audience 
squeamish), but it would be a quick way to get us closer to the answer we’re 
seeking which is the swdev driver model.
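
As a very rough sketch of what a minimal "dagger" might look like, assuming it hooks the existing in-kernel notifiers rather than literally sniffing netlink, and with the actual sw_flow translation left as a placeholder (routes would need yet another hook, which is part of why gleaning is awkward):

#include <linux/module.h>
#include <linux/netdevice.h>
#include <net/netevent.h>
#include <net/neighbour.h>

static int dagger_netevent(struct notifier_block *nb, unsigned long event,
                           void *ptr)
{
    struct neighbour *n = ptr;

    if (event == NETEVENT_NEIGH_UPDATE) {
        /* Placeholder: build a struct sw_flow that matches this
         * neighbour and point it at the right port, then push it
         * down with swdev_flow_insert(). */
        pr_debug("dagger: neigh update on %s\n", n->dev->name);
    }
    return NOTIFY_DONE;
}

static struct notifier_block dagger_nb = {
    .notifier_call = dagger_netevent,
};

static int __init dagger_init(void)
{
    return register_netevent_notifier(&dagger_nb);
}

static void __exit dagger_exit(void)
{
    unregister_netevent_notifier(&dagger_nb);
}

module_init(dagger_init);
module_exit(dagger_exit);
MODULE_LICENSE("GPL");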

-scott





Re: [ovs-dev] [PATCH 1/4] ofproto-dpif-xlate: Only learn MAC addresses upon handling packets.

2014-08-24 Thread Joe Stringer
On 23 August 2014 10:47, Ben Pfaff  wrote:

> MAC learning, like flow entry learning via the "learn" action, should only
> happen if a packet was actually received.
>
> Signed-off-by: Ben Pfaff 


Acked-by: Joe Stringer 


Re: [ovs-dev] [patch net-next RFC 10/12] openvswitch: add support for datapath hardware offload

2014-08-24 Thread John Fastabend

On 08/24/2014 07:24 PM, Scott Feldman wrote:


> On Aug 24, 2014, at 8:15 AM, Jamal Hadi Salim  wrote:
>
>> On 08/24/14 07:12, Thomas Graf wrote:
>>
>>> On 08/23/14 at 09:53pm, Jamal Hadi Salim wrote:
>>>
>>> I get what you are saying but I don't see that to be the case here. I
>>> don't see how this series proposes the OVS case as *the* interface.


>> The focus of the patches is on offloading flows (uses the
>> ovs or shall i say the broadcom OF-DPA API, which is one
>> vendor's view of the world).
>>
>> Yes, people are going to deploy more hardware which knows how to do
>> a lot of flows (but today that is in the tiny tiny minority)
>>
>> I would have liked to see more focus on L2/3 as a first step because
>> they are more predominantly deployed than anything with flows. And
>> they are well understood from a functional perspective.
>> Then that would bring to the front API issues since you have
>> a large sample space of deployments and we can refactor as needed.


> With respect to focus on L2/L3, I have a pretty *good* hunch someone
> could write a kernel module that gleans from the L2/L3 netlink echoes
> already flying around and translates to sw_flows and in turn into
> ndo_swdev_flow_* calls. So the existing linux bonds and bridges and
> vlans and ROUTEs and NEIGHs and LINKs and ADDRs work as normal,
> unchanged, with iproute2 still originating the netlink msgs from the
> user. The new kernel module (let’s call it “dagger” after the white spy
> from spy vs. spy MAD comic) can figure out what forwarding gets
> offloaded to HW just from the netlink echoes. If someone wrote a dagger
> module in parallel with the other efforts being discussed here, I think
> we’d have a pretty good idea what the API needs to look like, at least
> to cover the existing L2/L3 world we're all familiar with. Gleaning netlink
> msgs isn’t ideal for several reasons (and probably making more than a
> few in the audience squeamish), but it would be a quick way to get us
> closer to the answer we’re seeking which is the swdev driver model.




In the L2 case we already have the fdb_add and fdb_del semantics that
are being used today by NICs with embedded switches. And we have a DSA
patch we could dig out of patchwork for those drivers.

So I think it makes more sense to use the explicit interface rather
than put another shim layer in the kernel. It's simpler and more to the
point IMO. I suspect the resulting code will be smaller and easier to
read. I'm the squeamish one in the audience here.
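
For reference, the hook John mentions looks roughly like this in kernels of that era (the prototype later grew a VLAN id argument); the body is an invented placeholder, not any particular driver:

#include <linux/netdevice.h>
#include <linux/netlink.h>      /* struct nlattr */
#include <linux/neighbour.h>    /* struct ndmsg */

static int example_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
                           struct net_device *dev,
                           const unsigned char *addr, u16 flags)
{
    /* Placeholder: program 'addr' into the embedded switch's FDB for
     * the port backing 'dev'. */
    return 0;
}

/* Wired into the driver's net_device_ops alongside its other ndos:
 *     .ndo_fdb_add = example_fdb_add,
 */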

.John


--
John Fastabend Intel Corporation


Re: [ovs-dev] [PATCH 2/4] ofproto-dpif-xlate: Skip pushing stats if there are no packets to push.

2014-08-24 Thread Joe Stringer
On 23 August 2014 10:47, Ben Pfaff  wrote:

> xlate_push_stats() mostly does nothing if 'stats->n_packets' is 0.  This
> commit allows it to skip more complicated internal logic in that case.
>
> The one case I see in which xlate_push_stats() does do something if
> 'stats->n_packets' is 0 is in the case of a cached fin_timeout translation.
> That translation changes the idle and hard timeouts of a flow if tcp_flags
> has FIN or RST set, even if n_packets is 0.  But I don't think that can
> happen anyway; how would FIN or RST be set without receiving a packet?
>
> Signed-off-by: Ben Pfaff 
>

That reasoning makes sense to me. More broadly, xlate_push_stats()
handles two functions - pushing stats and implementing side-effects from
actions. If there are no packets, the stats shouldn't need updating, and
no packets could have hit actions for things such as mac
learning/openflow learning. So I'd expect it to be a no-op.
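
To make the no-op argument concrete, a minimal sketch of the early return the commit message describes; this is not the actual patch and the in-tree signature may differ:

/* Sketch: skip both stats accounting and side-effects when the datapath
 * reported no packets for this interval. */
void
xlate_push_stats(struct xlate_cache *xcache,
                 const struct dpif_flow_stats *stats)
{
    if (!stats->n_packets) {
        return;
    }

    /* ... walk the cached xlate entries, attributing stats and
     * re-applying side-effects such as MAC learning ... */
}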

Acked-by: Joe Stringer 


Re: [ovs-dev] [PATCH 3/4] ofproto-dpif-xlate: Drop 'may_learn' parameter from xlate_push_stats().

2014-08-24 Thread Joe Stringer
On 23 August 2014 10:47, Ben Pfaff  wrote:

> Both existing callers calculated 'may_learn' as 'stats->n_packets > 0', so
> it was redundant.  Because xlate_push_stats() is now entirely a no-op if
> 'stats->n_packets' is 0, we can now delete the tests entirely from the
> cases that previously only ran if 'may_learn'.
>
> Signed-off-by: Ben Pfaff 
>

Acked-by: Joe Stringer 


Re: [ovs-dev] [PATCH 4/4] ofproto-dpif-upcall: Attribute statistics to OpenFlow flows more precisely.

2014-08-24 Thread Joe Stringer
On 23 August 2014 10:47, Ben Pfaff  wrote:

> OpenFlow packet and byte counters have always been something of an
> approximation in Open vSwitch.  First, they are updated only periodically.
> Second, they can be misattributed because statistics collection does a
> retranslation and gives the statistics to whichever OpenFlow flow or flows
> they happen to hit, which could be different from the original set of
> flows because the OpenFlow flow table may have changed.
>
> This commit is intended to somewhat improve on the second point, by always
> attributing statistics to the set of flows that were cached at the datapath
> flow's initial translation by the revalidator.  This reduces the race in
> which the flow table could have changed to the interval between the initial
> installation and the first translation by the revalidator.  We plan to
> start having the thread that does the initial installation also create the
> ukey, which would eliminate that interval entirely.  At that point, packets
> could get lost from OpenFlow statistics because of the periodic collection
> of datapath statistics, but they should no longer be misattributed.
>
> Signed-off-by: Ben Pfaff 
>

We previously merged a similar patch:
http://openvswitch.org/pipermail/dev/2014-June/041120.html

..which caused issues with xlate side-effects for mac-learning, so we
reverted it:
http://openvswitch.org/pipermail/dev/2014-June/041868.html

Around that time, I also proposed an alternative, which failed to gain
traction:
http://openvswitch.org/pipermail/dev/2014-June/041718.html


In summary, I agree with the reasoning around statistics. The trouble is
that xlate_push_stats() also performs various side-effects. If this
xlate_cache is out of date, and refers to rules
that cause learning, then OVS may learn rules when it is not supposed to.
These may cause unexpected behaviour, and the incorrect rules can hang
around for an extended period.

I think that the decision from the third patch above was that the current
behaviour was "good enough", and we didn't have a good solution short of
tracking flowtable transitions to implement retroactive side-effects
(Excerpt from thread below)

> I think that the correct solution is to perform "retroactive
side-effects",
> that is, perform side effects as though they had happened at the time that
> the flow was hit. Anything that may hide these effects (like mac table
> flush) would need to be taken into account. However, such information is
> unavailable, and would be non-trivial to collect and maintain.

Happy to revisit this discussion if you'd like.


Re: [ovs-dev] [RFC 02/14] dpif: Zero flow before dpif_flow_get().

2014-08-24 Thread Joe Stringer
On 23 August 2014 06:29, Ben Pfaff  wrote:

> On Thu, Aug 21, 2014 at 05:41:57PM +1200, Joe Stringer wrote:
> > Signed-off-by: Joe Stringer 
>
> My reading of the code is that this is redundant because
> dpif_operate() always fills in the flow.
>

Right, this is needed in a later patch. I'll look at why it was needed and
fold it into that patch.


Re: [ovs-dev] [RFC 03/14] revalidator: Use 'cmap' for storing ukeys.

2014-08-24 Thread Joe Stringer
On 23 August 2014 06:47, Ben Pfaff  wrote:

> On Thu, Aug 21, 2014 at 05:41:58PM +1200, Joe Stringer wrote:
> > Signed-off-by: Joe Stringer 
>
> As of this patch, there isn't an advantage to using a cmap yet, right?
>

Not yet, no. Cmaps become useful when access is spread across handler
threads as well as revalidators.


>
> Here, I'd be inclined to add the ovsrcu_quiesce() call outside the
> 'if' statement, unconditionally, because otherwise it's just a matter
> of luck whether the number of ops happened to be a multiple of
> REVALIDATE_MAX_BATCH:
> > @@ -1582,6 +1587,7 @@ revalidate(struct revalidator *revalidator)
> >
> >  if (n_ops) {
> >  push_dump_ops__(udpif, ops, n_ops);
> > +ovsrcu_quiesce();
> >  }
> >  }
> >  dpif_flow_dump_thread_destroy(dump_thread);
>

I'll fold this in.
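
Concretely, the suggestion is just to hoist the call out of the conditional, roughly like this (a sketch of the fold-in, not the final patch):

    if (n_ops) {
        push_dump_ops__(udpif, ops, n_ops);
    }
    /* Quiesce once per dump batch even when no ops were queued, so RCU
     * progress does not depend on n_ops lining up with the batch size. */
    ovsrcu_quiesce();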



>  Acked-by: Ben Pfaff 
>

Thanks, I'll hold onto this patch until I receive further review.


Re: [ovs-dev] [RFC 04/14] revalidator: Protect ukeys with a mutex.

2014-08-24 Thread Joe Stringer
On 23 August 2014 08:01, Ben Pfaff  wrote:

> On Thu, Aug 21, 2014 at 05:41:59PM +1200, Joe Stringer wrote:
> > Currently, ukeys are protected during revalidator_sweep__() as only one
> > thread accesses the ukey at a time. This is ensured using barriers:
> > all revalidators will be in the GC phase, so they will only access their
> > own ukey collection.
> >
> > A future patch will change the access patterns to allow these ukey
> > collections to be read or modified while a revalidator is garbage
> > collecting it. To protect the ukeys, this patch adds locking on the ukey
> > collection.
> >
> > Signed-off-by: Joe Stringer 
>
> Do we actually explain anywhere that "ukey" means "udpif_key"?
>

The definition of 'struct udpif_key' refers to the ukeys map in the parent
udpif. I can update the commit log to make this more clear.


> Acked-by: Ben Pfaff 


Re: [ovs-dev] [RFC 05/14] udpif: Separate ukey maps from revalidators.

2014-08-24 Thread Joe Stringer
On 23 August 2014 07:59, Ben Pfaff  wrote:

> n_umaps is now a hard-coded constant, but a lot of the code assumes
> that it can vary.  It would be very slightly simpler to actually use a
> macro or enum to define the fixed size instead of a variable.  It
> would also allow better code generation in ukey_lookup() and
> ukey_acquire() since hash % 128 is just hash & 0x7f, which ought to be
> much cheaper than a general-purpose modulo operation.  I suggest
> making the change for that latter reason.
>
> upcall_unixctl_show() assert-fails if n_revalidators > 64, but nothing
> in the code, that I see, limits the number of revalidators to that
> number.  We should limit the revalidators or just dynamically allocate
> 'elements'.  Actually, here's a simpler solution: get rid of
> 'elements' and have the loop that read it construct each element as it
> goes using a loop like the one in revalidator_sweep__().
>
> Acked-by: Ben Pfaff 
>

Thanks for the review, I'll make these changes for v2.
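
On the modulo point, a tiny illustration of why a compile-time power-of-two size helps; the names here are assumed, not the actual ofproto-dpif-upcall code:

#include <stdint.h>

enum { N_UMAPS = 128 };   /* must remain a power of two */

struct umap { int placeholder; /* really: a cmap of ukeys plus a mutex */ };
struct udpif_sketch { struct umap umaps[N_UMAPS]; };

/* With a constant power-of-two size the compiler can emit hash & 0x7f;
 * a runtime n_umaps variable would force a general-purpose division. */
static struct umap *
ukey_umap(struct udpif_sketch *udpif, uint32_t hash)
{
    return &udpif->umaps[hash % N_UMAPS];
}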