Re: [ovs-dev] [patch net-next RFC 10/12] openvswitch: add support for datapath hardware offload
On 08/23/14 at 09:53pm, Jamal Hadi Salim wrote: > On 08/22/14 18:53, Scott Feldman wrote: > > Ok, Scott - now i have looked at the patches on the plane and i am > still not convinced ;-> > > >The intent is to use openvswitch.ko’s struct sw_flow to program hardware via > >the > >ndo_swdev_flow_* ops, but otherwise be independent of OVS. So the upper > >layer of > >the driver is struct sw_flow and any module above the driver can construct a > >struct > >sw_flow and push it down via ndo_swdev_flow_*. So your non-OVS use-case > >should be > >handled. OVS is another use-case. struct sw_flow should not be OVS-aware, > >but > >rather a generic flow match/action sufficient to offload the data plane to > >HW. > > > There is a legitimate case to be made for offloading OVS but *not* > a basis for making it the offload interface. > My suggestion is to make all OVS stuff a separate patchset. > This thing needs to stand alone without OVS and we dont need > to confuse the two. I get what you are saying but I don't see that to be the case here. I don't see how this series proposes the OVS case as *the* interface. It proposes *a* interface which in this case is flow based with mask support to accomodate the typical ntuple filter API in HW. OVS happens to be one of the easiest to use examples as a consumer because it already provides a flat flow representation. That said, I already mentioned that I see a lot of value in having a non OVS API example ASAP and I will be glad to help out John to achieve that. > Having said that: > I believe in starting simple - by solving the basic functions of > L2/3 offload first because those are well understood and fundamental. > There is the simplicity of those network functions and then > need to deal with tons of quarks that surround them > I think getting that right will help in understanding the issues and > make this interface better. This is where i am going to focus my effort. I thought this is exactly what is happening here. The flow key/mask based API as proposed focuses on basic forwarding for L2-L4. > Here's my view on flows in the patchset: > What we need is ability to specify different types of classifiers. > But leave L2 and 3 out of that - that should be part of the basic > feature set. > > Your 15-tuple classifier should be one of those classifiers. > This is because you *cannot possibly* have a universal classifier. > The tc classifier/action API has got this part right. There is > no ONE flow classifier but rather it has flexibility to add as many > as you want. Exactly and I never saw Jiri claim that swdev_flow_insert() would be the only offload capability exposed by the API. I see no reason why it could not also provide swdev_offset_match_insert() or swdev_ebpf_insert() for the 2*next generation HW. I don't think it makes sense to focus entirely on finding a single common denominator and channel everything through a single function to represent all the different generic and less generic offload capabilities. I believe that doing so will raise the minimal HW requirements barrier HW too much. I think we should start somewhere, learn and evolve. > IOW: > I should be able to specify a classifier that matches the > definition of the openflow thing you are using. But then i should also > be able to create one based on 32 bit value/masks, one that classifies > strings, one that classifies metadata, my own pigeon observer > classifier etc. And be able to attach them in combinations > to select different things within the packet and act differently. So essentially what you are saying is that the tc interface (in particular cls and act) could be used as an API to achieve offloads. Yes! I thought this was very clear and a given. I don't think that it makes sense to force every offload API consumer through the tc interface though. This comes back to my statements in a previous email. I don't think we should require that all the offload decision complexity *has* to live in the kernel. Quagga, nft, or OVS should be given an API to influence this more directly (with the hardware complexity properly abstracted). In-kernel users such as bridge, l3 (especially rules), and tc itself could be handled through a cls/act derived API internally. > Lets pick an example of the u32 classifier (or i could pick nftables). > Using your scheme i have to incur penalties to translating u32 to your > classifier and only achieve basic functionality; and now in addition > i cant do 90% of my u32 features. And u32 is very implementable > in hardware. I don't fully understand the last claim. Given the specific ntuple capabilities of a lot of hardware out there (let's assume a typical 5-tuple capability with N capacity for exact matches and M capacity for wildcard matches) supporting a generic u32 offset-len-mask is not exactly trivial at all and I don't see how you can get around converting the generic offset into a ntuple filter *at some point* to verify if the HW can fullfil th
Re: [ovs-dev] [patch net-next RFC 10/12] openvswitch: add support for datapath hardware offload
On 08/23/14 at 10:09am, John Fastabend wrote: > Right. I think this is basically what Jiri and I discussed when he > originally posted the series. For my use cases this is one of the > more interesting pieces. If no one else is looking at it I can try > it on some of the already existing open source drivers that have some > very simple support for ingress flow tables read flow director. Awesome. I'm definitely very interested in helping out on this part as well. > Thanks. This is exactly what I was trying to hint at and why the > optimization can not be done in the driver. The driver shouldn't > have to know about the cost models of SW vs HW rules or how to > break up rules into sets of complimentary hw/sw rules. That's an excellent summary of what I wanted to say. > the other thing I've been thinking about is how to handle hardware > with multiple flow tables. We could let the driver handle this > but if I ever want to employ a new optimization strategy then I > need to rewrite the driver. To me this looks a lot like policy > which should not be driven by the kernel. We can probably ignore > this case for the moment until we get some of the other things > addressed. Agreed, this sounds like something to handle a bit later. It is potentially very interesting as it would allow to offload at least partial pipelines but it obviously adds a new dimension to the API. I strongly feel that the API as proposed could be extended in this direction though. It will require a notion of tables for swdev_flow_insert() and we'll likely need an API to set default table policies although that is likely even needed for single table support. We might also have to introduce a concept of bundles at some point to provide atomic updates across multiple tables for consistency. > IMO I think extending the API is the easiest route but the best > way to resolve this is to try and write the code. I'll take a > stab at it next week. I'm absolutely interested in writing code for this as well. If we can find consensus on merging at least the core API bits in some form then that would allow for more people to get involved. Maybe we can skip the OVS bits in the first merge and continue that work in a separate git tree. I'm also definitely very interested in hearing Pravin's and Jesse's thoughts on the overall API ;-) John's flow director API replacement idea can definitely serve as an excellent first in-tree consumer as it looks even simpler. > by the way Jiri I think the patches are a great start. +1 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [patch net-next RFC 03/12] net: introduce generic switch devices support
On 08/21/14 at 06:18pm, Jiri Pirko wrote: > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h > index 39294b9..8b5d14c 100644 > --- a/include/linux/netdevice.h > +++ b/include/linux/netdevice.h > @@ -49,6 +49,8 @@ > > #include > #include > +#include > + > #include > > struct netpoll_info; > @@ -997,6 +999,24 @@ typedef u16 (*select_queue_fallback_t)(struct net_device > *dev, > + * int (*ndo_swdev_flow_insert)(struct net_device *dev, > + * const struct sw_flow *flow); > + * Called to insert a flow into switch device. If driver does > + * not implement this, it is assumed that the hw does not have > + * a capability to work with flows. I asume you are planning to add an additional expandable struct paramter to handle insertion parameters when the first is introduced to avoid requiring to touch every driver every time. > +/** > + * swdev_flow_insert - Insert a flow into switch > + * @dev: port device > + * @flow: flow descriptor > + * > + * Insert a flow into switch this port is part of. > + */ > +int swdev_flow_insert(struct net_device *dev, const struct sw_flow *flow) > +{ > + const struct net_device_ops *ops = dev->netdev_ops; > + > + print_flow(flow, dev, "insert"); > + if (!ops->ndo_swdev_flow_insert) > + return -EOPNOTSUPP; > + WARN_ON(!ops->ndo_swdev_get_id); > + BUG_ON(!flow->actions); > + return ops->ndo_swdev_flow_insert(dev, flow); > +} > +EXPORT_SYMBOL(swdev_flow_insert); Splitting the flow specific API into a separate file (maybe swdev_flow.c?) might help resolve some of the concerns around the focus on flows. It would make it clear that it's one of multiple models to be supported. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [patch net-next RFC 10/12] openvswitch: add support for datapath hardware offload
On 08/24/14 07:12, Thomas Graf wrote: On 08/23/14 at 09:53pm, Jamal Hadi Salim wrote: I get what you are saying but I don't see that to be the case here. I don't see how this series proposes the OVS case as *the* interface. The focus of the patches is on offloading flows (uses the ovs or shall i say the broadcom OF-DPA API, which is one vendor's view of the world). Yes, people are going to deploy more hardware which knows how to do a lot of flows (but today that is in the tiny tiny minority) I would have liked to see more focus on L2/3 as a first step because they are more predominantly deployed than anything with flows. And they are well understood from a functional perspective. Then that would bring to the front API issues since you have a large sample space of deployments and we can refactor as needed. i.e The hard part is dealing with 10 different chips which have a slightly different meaning of (example) how to do L3 in their implementation. I dont see such a focus in these patches because they start with a premise "the world is about flows". It proposes *a* interface which in this case is flow based with mask support to accomodate the typical ntuple filter API in HW. OVS happens to be one of the easiest to use examples as a consumer because it already provides a flat flow representation. In other words, there is a direct 1-1 map between this approach and OVS. That is a contentious point. I thought this is exactly what is happening here. The flow key/mask based API as proposed focuses on basic forwarding for L2-L4. Not at all. I gave an example earlier with u32, but lets pick the other extreme of well understood functions, say L3 (I could pick L2 as well). This openflow api tries to describe different header fields in the packet. That is not the challenge for such an API. The challenge is dealing with the quarks. Some chips implement FIB and NH conjoined; others implement them separately. I dont see how this is even being remotely touched on. Exactly and I never saw Jiri claim that swdev_flow_insert() would be the only offload capability exposed by the API. I see no reason why it could not also provide swdev_offset_match_insert() or swdev_ebpf_insert() for the 2*next generation HW. I don't think it makes sense to focus entirely on finding a single common denominator and channel everything through a single function to represent all the different generic and less generic offload capabilities. I believe that doing so will raise the minimal HW requirements barrier HW too much. I think we should start somewhere, learn and evolve. You are asking me to go and add a new ndo() every time i have a new network function? That is not scalable. I have no problem with the approach that was posted - I have a problem that it is it focused on flows (and is lacking ability to specify different classifiers). It should not be called xxx_flow_xxx So essentially what you are saying is that the tc interface (in particular cls and act) could be used as an API to achieve offloads. I am pointing to it as an example of something that is *done right* in terms of not picking a universal classifier. Something the current OVS posted/used api lacks (and to be frank OF never cared about because it had a niche use case; lets not make that niche use case the centre of gravity). Yes! I thought this was very clear and a given. I don't think that it makes sense to force every offload API consumer through the tc interface though. If you looked at all my presentations I have never laid such claim but i have always said I want everything described in iproute2 to work. I dont think anyone disagreed. I dont expect tc to be used as *the interface*; but on the same token i dont expect OVS to be used as *the interface*. Lets start with hardware abstraction. Lets map to existing Linux APIs and then see where some massaging maybe needed. This comes back to my statements in a previous email. I don't think we should require that all the offload decision complexity *has* to live in the kernel. Agreed. Move policy decisions out of the kernel for one but also any complex acrobatics as well that are use case specific. Quagga, nft, or OVS should be given an API to influence this more directly (with the hardware complexity properly abstracted). In-kernel users such as bridge, l3 (especially rules), and tc itself could be handled through a cls/act derived API internally. This abstraction gives OVS 1-1 mapping which is something i object to. You want to penalize me for the sake of getting the OVS api in place? Beginning with flows and laying claim to that one would be able to cover everything is non-starter. Lets pick an example of the u32 classifier (or i could pick nftables). Using your scheme i have to incur penalties to translating u32 to your classifier and only achieve basic functionality; and now in addition i cant do 90% of my u32 features. And u32 is very implementable in hardware. I don't fully unde
Re: [ovs-dev] [PATCH] datapath: remove flow member from struct ovs_skb_cb
On Thu, Aug 21, 2014 at 12:25 PM, Lorand Jakab wrote: > struct ovs_skb_cb is full on kernels < 3.11 due to compatibility code. > This patch removes the 'flow' member in order to make room for data > needed by layer 3 flow/port support that will be added in an upcoming > patch. The 'flow' memeber was chosen for removal because it's only used > in ovs_execute_actions(). > > Signed-off-by: Lorand Jakab I'll let Pravin review this since he has been working more actively in this area. Pravin, there was some discussion of this in the L3 thread. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v4 3/3] datapath: add layer 3 flow/port support
On Thu, Aug 21, 2014 at 12:24 PM, Lori Jakab wrote: > On 8/6/14 4:37 AM, Jesse Gross wrote: >> Besides the fact that it would be nice to unify things, I'm not sure >> that it is actually correct to pull off VLAN, MPLS, etc. tags that we >> don't understand. Presumably, these are being used to create some kind >> of isolation and dropping all the tags that we don't understand would >> just blindly merge things together. Also, from an implementation >> perspective, we don't really parse beyond what we understand (this is >> not quite true for MPLS but it is for VLANs and any unknown protocols) >> so we wouldn't necessarily actually get to an L3 header. On the other >> hand, if we only deal with tags that we do understand then can't we >> have more atomic instructions that pull them off one-by-one? > > > Atomic instructions make a lot of sense, and we definitely should support > them. However, if a packet is sent to an L3 port, it should not have any > Ethernet, VLAN or MPLS header. Note that for now these actions are not user > visible, and cannot be added with OpenFlow, they are added automatically by > the user space code, when a packet is sent from an L2->L3 port or L3->L2 > port. > > I agreed with Ben that this behavior will be changed in the future, based on > discussions in the ONF's EXT-112 proposal, where the consensus was to have > explicit push_eth and pop_eth OpenFlow actions. However, the current > behavior of the LISP code is to add/remove L2 headers automatically, so we > want to keep that in the first phase, until I implement OpenFlow support. > Once automatic header popping is not used, we can remove the > "extract_l3_packet()" action. > > From the implementation perspective, this action would rely on the > skb->network_header being set correctly. I'm not sure that this really addresses the concerns that I have from both policy and implementation perspectives. >From the policy perspective: Will a Cisco implementation of LISP accept a packet on an unknown VLAN, throw away the tag when it does L2 processing, and then continue on to do L3 processing and LISP? Somehow I doubt it. >From an implementation perspective: There is no mechanism to rely on skb->network_header being set in the general case. It will only be set in situations where OVS can parse the packet. OVS doesn't know about QinQ for example so you the action that you have defined won't actually give you an L3 packet. Finally, from a Linux perspective, removing kernel interfaces isn't something that is OK as a matter of policy. The fact that the OVS kernel module doesn't know how to inherently find the L3 is not as bad as it seems. Conveniently, it means that we can pop off exactly the same headers as we know how to parse so there is no loss of functionality to doing this individually. As bonus, since userspace needs to generate these actions, the policy can be enforced there as well. >>> --- a/datapath/actions.c >>> +++ b/datapath/actions.c >>> @@ -258,6 +258,7 @@ static void push_eth(struct sk_buff *skb, const >>> struct >>> ovs_action_push_eth *ethh >>>skb_push(skb, ETH_HLEN); >>>skb_reset_mac_header(skb); >>>skb_reset_mac_len(skb); >>> + vlan_set_tci(skb, 0); >> >> I don't think it's right to throw away the vlan tag. My assumption is >> that it should be placed into the packet before we push on the >> Ethernet header (this actually should never happen since we disallow >> pushing multiple Ethernet headers but it still seems like the right >> thing to do). > > > That's not what I would expect from a simple push_eth() action. If > someone > wanted to push a VLAN tag, that should be an explicit action. So I > would > just throw away whatever there is in skb->vlan_tci. The contents of skb->vlan_tci were already pushed explicitly (look at the implementation of push_vlan()). From a user's perspective, there is no difference - it is just an internal optimization to pull the tag out of the packet, so I can't see any justification for throwing away that tag. >>> >>> >>> OK. I will remove that line. >> >> But just to be clear, I think we still need to explicitly push the >> vlan tag on the packet. Otherwise, it will migrate from the inner >> Ethernet header to the outer. > > > Still not sure what I need to do here. If skb->vlan_tci was set by a > push_vlan action, then the vlan tag should already have been pushed on the > packet, right? Yes, the user has requested that a vlan tag be applied. skb->vlan_tci is a form of offloading that should be transparent and will no longer work in this case, so it needs to be de-offloaded. Please take a look at the OVS vlan operations and other vlan code in the kernel if this isn't clear. > BTW, for a packet with multiple Ethernet headers, is there already > support > for matching on anything after the outer header? Does t
Re: [ovs-dev] [patch net-next RFC 10/12] openvswitch: add support for datapath hardware offload
On Aug 24, 2014, at 8:15 AM, Jamal Hadi Salim wrote: > On 08/24/14 07:12, Thomas Graf wrote: >> On 08/23/14 at 09:53pm, Jamal Hadi Salim wrote: > >> >> I get what you are saying but I don't see that to be the case here. I >> don't see how this series proposes the OVS case as *the* interface. > > The focus of the patches is on offloading flows (uses the > ovs or shall i say the broadcom OF-DPA API, which is one > vendor's view of the world). > > Yes, people are going to deploy more hardware which knows how to do > a lot of flows (but today that is in the tiny tiny minority) > > I would have liked to see more focus on L2/3 as a first step because > they are more predominantly deployed than anything with flows. And > they are well understood from a functional perspective. > Then that would bring to the front API issues since you have > a large sample space of deployments and we can refactor as needed. With respect to focus on L2/L3, I have a pretty *good* hunch someone could write a kernel module that gleans from the L2/L3 netlink echoes already flying around and translates to sw_flows and in turn into ndo_swdev_flow_* calls. So the existing linux bonds and bridges and vlans and ROUTEs and NEIGHs and LINKs and ADDRs work as normal, unchanged, with iproute2 still originating the netlink msgs from the user. The new kernel module (let’s call it “dagger” after the white spy from spy vs. spy MAD comic) can figure out what forwarding gets offloaded to HW just from the netlink echoes. If someone wrote a dagger module in parallel with the other efforts being discussed here, I think we’d have a pretty good idea what the API needs to look like, at least to cover existing L2/L3 world we're all familiar with. Gleaning netlink msgs isn’t ideal for several reasons (and probably making more than a few in the audience squeamish), but it would be a quick way to get us closer to the answer we’re seeking which is the swdev driver model. -scott ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 1/4] ofproto-dpif-xlate: Only learn MAC addresses upon handling packets.
On 23 August 2014 10:47, Ben Pfaff wrote: > MAC learning, like flow entry learning via the "learn" action, should only > happen if a packet was actually received. > > Signed-off-by: Ben Pfaff Acked-by: Joe Stringer ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [patch net-next RFC 10/12] openvswitch: add support for datapath hardware offload
On 08/24/2014 07:24 PM, Scott Feldman wrote: On Aug 24, 2014, at 8:15 AM, Jamal Hadi Salim wrote: On 08/24/14 07:12, Thomas Graf wrote: On 08/23/14 at 09:53pm, Jamal Hadi Salim wrote: I get what you are saying but I don't see that to be the case here. I don't see how this series proposes the OVS case as *the* interface. The focus of the patches is on offloading flows (uses the ovs or shall i say the broadcom OF-DPA API, which is one vendor's view of the world). Yes, people are going to deploy more hardware which knows how to do a lot of flows (but today that is in the tiny tiny minority) I would have liked to see more focus on L2/3 as a first step because they are more predominantly deployed than anything with flows. And they are well understood from a functional perspective. Then that would bring to the front API issues since you have a large sample space of deployments and we can refactor as needed. With respect to focus on L2/L3, I have a pretty *good* hunch someone could write a kernel module that gleans from the L2/L3 netlink echoes already flying around and translates to sw_flows and in turn into ndo_swdev_flow_* calls. So the existing linux bonds and bridges and vlans and ROUTEs and NEIGHs and LINKs and ADDRs work as normal, unchanged, with iproute2 still originating the netlink msgs from the user. The new kernel module (let’s call it “dagger” after the white spy from spy vs. spy MAD comic) can figure out what forwarding gets offloaded to HW just from the netlink echoes. If someone wrote a dagger module in parallel with the other efforts being discussed here, I think we’d have a pretty good idea what the API needs to look like, at least to cover existing L2/L3 world we're all familiar with. Gleaning netlink msgs isn’t ideal for several reasons (and probably making more than a few in the audience squeamish), but it would be a quick way to get us closer to the answer we’re seeking which is the swdev driver model. In the L2 case we already have the fdb_add and fdb_del semantics that are being used today by NICs with embedded switches. And we have a DSA patch we could dig out of patchwork for those drivers. So I think it makes more sense to use the explicit interface rather than put another shim layer in the kernel. Its simpler and more to the point IMO. I suspect the resulting code will be smaller and easier to read. I'm the squemish one in the audience here. .John -- John Fastabend Intel Corporation ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 2/4] ofproto-dpif-xlate: Skip pushing stats if there are no packets to push.
On 23 August 2014 10:47, Ben Pfaff wrote: > xlate_push_stats() mostly does nothing if 'stats->n_packets' is 0. This > commit allows it to skip more complicated internal logic in that case. > > The one case I see in which xlate_push_stats() does do something if > 'stats->n_packets' is 0 is in the case of a cached fin_timeout translation. > That translation changes the idle and hard timeouts of a flow if tcp_flags > has FIN or RST set, even if n_packets is 0. But I don't think that can > happen anyway; how would FIN or RST be set without receiving a packet? > > Signed-off-by: Ben Pfaff > That reasoning makes sense to me. More broadly, the xlate_push_stats handles two functions - pushing stats and implementing side-effects from actions. If there are no packets, the stats shouldn't need updating. If there were no packets, then no packets could have hit actions for things such as mac learning/openflow learning. So I'd expect it to be a no-op. Acked-by: Joe Stringer ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 3/4] ofproto-dpif-xlate: Drop 'may_learn' parameter from xlate_push_stats().
On 23 August 2014 10:47, Ben Pfaff wrote: > Both existing callers calculated 'may_learn' as 'stats->n_packets > 0', so > it was redundant. Because xlate_push_stats() is now entirely a no-op if > 'stats->n_packets' is 0, we can now delete the tests entirely from the > cases that previously only ran if 'may_learn'. > > Signed-off-by: Ben Pfaff > Acked-by: Joe Stringer ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 4/4] ofproto-dpif-upcall: Attribute statistics to OpenFlow flows more precisely.
On 23 August 2014 10:47, Ben Pfaff wrote: > OpenFlow packet and byte counters have always been something of an > approximation in Open vSwitch. First, they are updated only periodically. > Second, they can be misattributed because statistics collection does a > retranslation and gives the statistics to whichever OpenFlow flow or flows > they happen to hit, which could be different from the original set of > flows because the OpenFlow flow table may have changed. > > This commit is intended to somewhat improve on the second point, by always > attributing statistics to the set of flows that were cached at the datapath > flow's initial translation by the revalidator. This reduces the race in > which the flow table could have changed to the interval between the initial > installation and the first translation by the revalidator. We plan to > start having the thread that does the initial installation also create the > ukey, which would eliminate that interval entirely. At that point, packets > could get lost from OpenFlow statistics because of the periodic collection > of datapath statistics, but they should no longer be misattributed. > > Signed-off-by: Ben Pfaff > We previously merged a similar patch: http://openvswitch.org/pipermail/dev/2014-June/041120.html ..which caused issues with xlate side-effects for mac-learning, so we reverted it: http://openvswitch.org/pipermail/dev/2014-June/041868.html Around that time, I also proposed an alternative, which failed to gain traction: http://openvswitch.org/pipermail/dev/2014-June/041718.html In summary, I agree with the reasoning around statistics. The trouble is that xlate_push_stats() also performs various side-effects that xlate_push_stats(). If this xlate_cache is out of date, and refers to rules that cause learning, then OVS may learn rules when it is not supposed to. These may cause unexpected behaviour, and the incorrect rules can hang around for an extended period. I think that the decision from the third patch above was that the current behaviour was "good enough", and we didn't have a good solution short of tracking flowtable transitions to implement retroactive side-effects (Excerpt from thread below) > I think that the correct solution is to perform "retroactive side-effects", > that is, perform side effects as though they had happened at the time that > the flow was hit. Anything that may hide these effects (like mac table > flush) would need to be taken into account. However, such information is > unavailable, and would be non-trivial to collect and maintain. Happy to revisit this discussion if you'd like. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [RFC 02/14] dpif: Zero flow before dpif_flow_get().
On 23 August 2014 06:29, Ben Pfaff wrote: > On Thu, Aug 21, 2014 at 05:41:57PM +1200, Joe Stringer wrote: > > Signed-off-by: Joe Stringer > > My reading of the code is that this is redundant because > dpif_operate() always fills in the flow. > Right, this is needed in a later patch. I'll look at why it was needed and fold it into that patch. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [RFC 03/14] revalidator: Use 'cmap' for storing ukeys.
On 23 August 2014 06:47, Ben Pfaff wrote: > On Thu, Aug 21, 2014 at 05:41:58PM +1200, Joe Stringer wrote: > > Signed-off-by: Joe Stringer > > As of this patch, there isn't an advantage to using a cmap yet, right? > Not yet, no. Cmaps become useful when access is spread across handler threads as well as revalidators. > > Here, I'd be inclined to add the ovsrcu_quiesce() call outside the > 'if' statement, unconditionally, because otherwise it's just a matter > of luck whether the number of ops happened to be a multiple of > REVALIDATE_MAX_BATCH: > > @@ -1582,6 +1587,7 @@ revalidate(struct revalidator *revalidator) > > > > if (n_ops) { > > push_dump_ops__(udpif, ops, n_ops); > > +ovsrcu_quiesce(); > > } > > } > > dpif_flow_dump_thread_destroy(dump_thread); > I'll fold this in. > Acked-by: Ben Pfaff > Thanks, I'll hold onto this patch until I recieve further review. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [RFC 04/14] revalidator: Protect ukeys with a mutex.
On 23 August 2014 08:01, Ben Pfaff wrote: > On Thu, Aug 21, 2014 at 05:41:59PM +1200, Joe Stringer wrote: > > Currently, ukeys are protected during revalidator_sweep__() as only one > > thread accesses the ukey at a time. This is ensured using barriers: > > all revalidators will be in the GC phase, so they will only access their > > own ukey collection. > > > > A future patch will change the access patterns to allow these ukey > > collections to be read or modified while a revalidator is garbage > > collecting it. To protect the ukeys, this patch adds locking on the ukey > > collection. > > > > Signed-off-by: Joe Stringer > > Do we actually explain anywhere that "ukey" means "udpif_key"? > The definition of 'struct udpif_key' refers to the ukeys map in the parent udpif. I can update the commit log to make this more clear. > Acked-by: Ben Pfaff ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [RFC 05/14] udpif: Separate ukey maps from revalidators.
On 23 August 2014 07:59, Ben Pfaff wrote: > n_umaps is now a hard-coded constant, but a lot of the code assumes > that it can vary. It would be very slightly simpler to actually use a > macro or enum to define the fixed size instead of a variable. It > would also allow better code generation in ukey_lookup() and > ukey_acquire() since hash % 128 is just hash & 0x7f, which ought to be > much cheaper than a general-purpose modulo operation. I suggest > making the change for that latter reason. > > upcall_unixctl_show() assert-fails if n_revalidators > 64, but nothing > in the code, that I see, limits the number of revalidators to that > number. We should limit the revalidators or just dynamically allocate > 'elements'. Actually, here's a simpler solution: get rid of > 'elements' and have the loop that read it construct each element as it > goes using a loop like the one in revalidator_sweep__(). > > Acked-by: Ben Pfaff > Thanks for the review, I'll make these changes for v2. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev