On Mon, Mar 18, 2013 at 01:49:43PM -0700, Jesse Gross wrote: > On Fri, Mar 15, 2013 at 7:27 AM, Simon Horman <ho...@verge.net.au> wrote: > > Recirculation is a technique to allow a frame to re-enter > > frame processing. This is intended to be used after actions > > have been applied to the frame with modify the frame in > > some way that makes it possible for richer processing to occur. > > > > An example is and indeed targeted use case is MPLS. If an MPLS frame has an > > mpls_pop action applied with the IPv4 ethernet type then it becomes > > possible to decode the IPv4 portion of the frame. This may be used to > > construct a facet that modifies the IPv4 portion of the frame. This is not > > possible prior to the mpls_pop action as the contents of the frame after > > the MPLS stack is not known to be IPv4. > > > > Design: > > * New recirculation action. > > > > ovs-vswitchd adds a recirculation action to the end of a list of > > datapath actions for a flow when the actions are truncated because > > insufficient flow match information is available to add the next > > OpenFlow action. The recirculation action includes an id which can be > > used to scope a facet lookup of a recirculated packet. > > > > e.g. pop_mpls(0x0800),dec_ttl becomes pop_mpls(0x800),resubmit(id=1) > > > > ovs-vswtichd then restarts processing of the packet, with truncated > > actions. > > > > When the actions do not contain a recirculate action processing stops > > and the mangled packet is executed. > > > > If the packet is being processed by ovs-vswitchd as a miss and the miss > > is being handled with facets (the usual case) then facets for each of > > recirculation steps are added to the datapath. > > > > For a packet that is not recirculated this will be a single facet. For > > a packet that is recirculated once this will be two facets. And so on, > > with the number of facets increasing by one for each recirculation. > > > > * New recirculation_id match attribute > > > > Facets of recirculated packets have a recirculation_id attribute added to > > the match key, the id is that of the recirculation action that > > caused the packet to be recirculated. This is used to scope the lookup > > of facets. > > > > * Datapath behaviour > > > > Then the datapath encounters a recirculate action it: > > + Recalculates the flow key based on the packet > > which will typically have been modified by previous actions > > + Adds a recirculation_id attribute to the match key whose id > > is that of the recirculate action > > + Performs a lookup using the new match key > > + Processes the packet if a facet matches the key or; > > + Makes an upcall if necessary > > > > TODO: > > > > * More sensible handling of recirculation IDs [ovs-vswtichd] > > * More sensible lookup of facets based on recirculation IDs [ovs-vswtichd] > > * Look more closely at facet revalidation [ovs-vswtichd] > > > > Signed-off-by: Simon Horman <ho...@verge.net.au> > > I'm having trouble applying this (even after applying the earlier > patches in the series). I'm not sure what changed but would you mind > sending an updated version?
Sure, will do. In the mean time you can see the code applied here: https://github.com/horms/openvswitch/tree/devel/mpls-recirculate.rfc1 > This is a somewhat short review, since I'm just looking at the diff. > > > diff --git a/datapath/actions.c b/datapath/actions.c > > index 60fa71a..b75f479 100644 > > --- a/datapath/actions.c > > +++ b/datapath/actions.c > > @@ -634,6 +636,12 @@ static int do_execute_actions(struct datapath *dp, > > struct sk_buff *skb, > > case OVS_ACTION_ATTR_SAMPLE: > > err = sample(dp, skb, a, tun_key); > > break; > > + > > + case OVS_ACTION_ATTR_RECIRCULATE: > > + if (recirculation_id) { > > + *recirculation_id = nla_get_u32(a); > > + } > > OVS_CB seems like the perfect place to store recirculation_id, that > way we don't have to pass pointers around everywhere. Sure, that sounds reasonable. > However, do we actually want to tie this directly to recirculation as > opposed to as a separate action? I see possible benefits of > separating it out: I'm not really sure that I understand. Could you explain how you see this working? > * It makes it more generic - we could potentially use skb_mark > directly or worst case introduce a new metadata field if we are > worried about conflicting uses (although we could always set the real > skb mark on the last pass). Either way there is a better potential > for reuse. I'm a bit wary of using skb_mark, what if its desirable to use it for something else? > * It seems likely that if you were doing more than two passes of > recirculation then you would still only want to use a single ID so > it's not really desirable to remark on every pass. > > > @@ -686,23 +695,28 @@ int ovs_execute_actions(struct datapath *dp, struct > > sk_buff *skb) > > if (unlikely(++loop->count > MAX_LOOPS)) > > loop->looping = true; > > if (unlikely(loop->looping)) { > > - error = loop_suppress(dp, acts); > > - kfree_skb(skb); > > + printk("%s: looping", __func__); > > Isn't this logging already handled by loop_suppress()? Sorry, that was just for debugging purposes. I was having a bit of trouble with this loop not long before posting the patch. > I think the memory management of the skb is somewhat inconsistent - > the caller retains ownership in the case of recirculation and some > errors but the callee takes it for normal output actions. What if the > callee takes it in all cases (as today) and gives it back as a return > value for recirculation? Ok, I'll see about making that so. > > diff --git a/datapath/datapath.c b/datapath/datapath.c > > index af0dba8..ac2fb0d 100644 > > --- a/datapath/datapath.c > > +++ b/datapath/datapath.c > > /* Must be called with rcu_read_lock. */ > > void ovs_dp_process_received_packet(struct vport *p, struct sk_buff *skb) > [...] > > -out: > > - /* Update datapath statistics. */ > > - u64_stats_update_begin(&stats->sync); > > - (*stats_counter)++; > > - u64_stats_update_end(&stats->sync); > > + if (likely(!recirculate)) > > + break; > > + if (unlikely(!limit--)) { > > + kfree_skb(skb); > > + return; > > We should probably log a rate-limited warning in this case. Sure, can do. > > + } > > + OVS_CB(skb)->flow = NULL; > > I think the check for having OVS_CB(skb)->flow already set is dead > code at this point and we don't need to special case it. Ok, I'll prepare a preparatory patch to remove the dead code and build on top of that. It should simplify the code a bit :) > > @@ -905,10 +929,16 @@ static int ovs_packet_cmd_execute(struct sk_buff > > *skb, struct genl_info *info) > > goto err_unlock; > > > > local_bh_disable(); > > - err = ovs_execute_actions(dp, packet); > > + err = ovs_execute_actions(dp, packet, NULL); > > local_bh_enable(); > > rcu_read_unlock(); > > > > + if (err > 0) { > > + /* Recirculation is invalid on packet execute */ > > + err = -EINVAL; > > + goto err_flow_free; > > + } > > Isn't this going to add a lot of complication to userspace? It's > clearly possible for userspace to not use recirculation since it has > the full packet but it's basically going to require a separate > processing path just to handle this. I think that the path needs to be present in order to handle cases where facets aren't present. Such misses handled without facets and packet out handling. > > diff --git a/datapath/flow.h b/datapath/flow.h > > index 27e394b..f665d3a 100644 > > --- a/datapath/flow.h > > +++ b/datapath/flow.h > > struct sw_flow_key { > > struct ovs_key_ipv4_tunnel tun_key; /* Encapsulating tunnel key. */ > > + __be32 recirculation_id; /* Identification of parent facet > > used > > + * progress of recirculation of a > > + * packet. Zero if the flow is not > > + * the result of a recirculation > > + * access. */ > > I don't think that there's any reason to make this networking byte > order, since it will never go out on the wire. I'll switch it around to host byte order. > > diff --git a/include/linux/openvswitch.h b/include/linux/openvswitch.h > > index f7b788c..24d1eb5 100644 > > --- a/include/linux/openvswitch.h > > +++ b/include/linux/openvswitch.h > > @@ -280,6 +280,7 @@ enum ovs_key_attr { > > OVS_KEY_ATTR_ND, /* struct ovs_key_nd */ > > OVS_KEY_ATTR_SKB_MARK, /* u32 skb mark */ > > OVS_KEY_ATTR_TUNNEL, /* Nested set of ovs_tunnel attributes */ > > + OVS_KEY_ATTR_RECIRCULATION_ID, /* u32 recirculation id */ > > > > #ifdef __KERNEL__ > > OVS_KEY_ATTR_IPV4_TUNNEL, /* struct ovs_key_ipv4_tunnel */ > > @@ -513,10 +514,13 @@ struct ovs_action_push_vlan { > > * indicate the new packet contents This could potentially still be > > * %ETH_P_MPLS_* if the resulting MPLS label stack is not empty. If there > > * is no MPLS label stack, as determined by ethertype, no action is taken. > > + * @OVS_ACTION_ATTR_RECIRCULATE: Restart processing of packet. > > + * The packet must have been modified by a previous action in such a way > > + * that it does not match its original facet again. > > I think we should avoid use of the word 'facet' in the interface or in > kernel code since it is a specific userspace concept which could > change over time. Ok, I'll use the word flow instead. > > diff --git a/lib/match.c b/lib/match.c > > index 2aa4e89..63626a3 100644 > > --- a/lib/match.c > > +++ b/lib/match.c > > @@ -849,7 +849,7 @@ match_format(const struct match *match, struct ds *s, > > unsigned int priority) > > > > int i; > > > > - BUILD_ASSERT_DECL(FLOW_WC_SEQ == 20); > > + BUILD_ASSERT_DECL(FLOW_WC_SEQ == 21); > > > > if (priority != OFP_DEFAULT_PRIORITY) { > > ds_put_format(s, "priority=%u,", priority); > > I think we need to be able to print out any new fields that we add to > struct flow for debugging purposes, even if they aren't exposed to the > user normally. Sure, I'll add recirculation here. > > diff --git a/lib/odp-util.c b/lib/odp-util.c > > index 0ded70d..7fd7a3f 100644 > > --- a/lib/odp-util.c > > +++ b/lib/odp-util.c > > @@ -1546,6 +1558,11 @@ odp_flow_key_from_flow(struct ofpbuf *buf, const > > struct flow *flow, > > nl_msg_put_u32(buf, OVS_KEY_ATTR_SKB_MARK, flow->skb_mark); > > } > > > > + if (flow->recirculation_id) { > > + nl_msg_put_u32(buf, OVS_KEY_ATTR_RECIRCULATION_ID, > > + flow->recirculation_id); > > + } > > I think we should be careful about making zero a magic value. It's > fine to do that in userspace as part of our implementation but I > wouldn't want to carry that assumption across interface boundaries. Sure, I'll clean that up somehow. > > commit_set_nw_action(const struct flow *flow, struct flow *base, > > struct ofpbuf *odp_actions) > > { > > + //VLOG_WARN("%s: %u 0x%04x", __func__, flow->nw_proto, > > ntohs(flow->dl_type)); > > Can we remove this debugging code? Yes. > > @@ -2419,7 +2450,8 @@ commit_set_skb_mark_action(const struct flow *flow, > > struct flow *base, > > * 'base' and 'flow', appends ODP actions to 'odp_actions' that change the > > flow > > * key from 'base' into 'flow', and then changes 'base' the same way. > > Does not > > * commit set_tunnel actions. Users should call commit_odp_tunnel_action() > > - * in addition to this function if needed. */ > > + * and commit_recirculate_action() in addition to those functions are > > needed. > > I think the actual name of the function is > commit_odp_recirculate_action (otherwise it makes it difficult to > search for it). Thanks, I'll fix that. > > diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c > > index 9fe0aaa..0329e3c 100644 > > --- a/ofproto/ofproto-dpif.c > > +++ b/ofproto/ofproto-dpif.c > > I stopped here since the code is complex enough that I need to be able > to see it in context. > _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev