On Mon, Mar 18, 2013 at 01:49:43PM -0700, Jesse Gross wrote:
> On Fri, Mar 15, 2013 at 7:27 AM, Simon Horman <ho...@verge.net.au> wrote:
> > Recirculation is a technique to allow a frame to re-enter
> > frame processing. This is intended to be used after actions
> > have been applied to the frame with modify the frame in
> > some way that makes it possible for richer processing to occur.
> >
> > An example is and indeed targeted use case is MPLS. If an MPLS frame has an
> > mpls_pop action applied with the IPv4 ethernet type then it becomes
> > possible to decode the IPv4 portion of the frame. This may be used to
> > construct a facet that modifies the IPv4 portion of the frame. This is not
> > possible prior to the mpls_pop action as the contents of the frame after
> > the MPLS stack is not known to be IPv4.
> >
> > Design:
> > * New recirculation action.
> >
> >   ovs-vswitchd adds a recirculation action to the end of a list of
> >   datapath actions for a flow when the actions are truncated because
> >   insufficient flow match information is available to add the next
> >   OpenFlow action.  The recirculation action includes an id which can be
> >   used to scope a facet lookup of a recirculated packet.
> >
> >   e.g.  pop_mpls(0x0800),dec_ttl becomes pop_mpls(0x800),resubmit(id=1)
> >
> >   ovs-vswtichd then restarts processing of the packet, with truncated 
> > actions.
> >
> >   When the actions do not contain a recirculate action processing stops
> >   and the mangled packet is executed.
> >
> >   If the packet is being processed by ovs-vswitchd as a miss and the miss
> >   is being handled with facets (the usual case) then facets for each of
> >   recirculation steps are added to the datapath.
> >
> >   For a packet that is not recirculated this will be a single facet. For
> >   a packet that is recirculated once this will be two facets. And so on,
> >   with the number of facets increasing by one for each recirculation.
> >
> > * New recirculation_id match attribute
> >
> >   Facets of recirculated packets have a recirculation_id attribute added to
> >   the match key, the id is that of the recirculation action that
> >   caused the packet to be recirculated. This is used to scope the lookup
> >   of facets.
> >
> > * Datapath behaviour
> >
> >   Then the datapath encounters a recirculate action it:
> >   + Recalculates the flow key based on the packet
> >     which will typically have been modified by previous actions
> >   + Adds a recirculation_id attribute to the match key whose id
> >     is that of the recirculate action
> >   + Performs a lookup using the new match key
> >   + Processes the packet if a facet matches the key or;
> >   + Makes an upcall if necessary
> >
> > TODO:
> >
> > * More sensible handling of recirculation IDs [ovs-vswtichd]
> > * More sensible lookup of facets based on recirculation IDs [ovs-vswtichd]
> > * Look more closely at facet revalidation [ovs-vswtichd]
> >
> > Signed-off-by: Simon Horman <ho...@verge.net.au>
> 
> I'm having trouble applying this (even after applying the earlier
> patches in the series).  I'm not sure what changed but would you mind
> sending an updated version?

Sure, will do.

In the mean time you can see the code applied here:

https://github.com/horms/openvswitch/tree/devel/mpls-recirculate.rfc1


> This is a somewhat short review, since I'm just looking at the diff.
> 
> > diff --git a/datapath/actions.c b/datapath/actions.c
> > index 60fa71a..b75f479 100644
> > --- a/datapath/actions.c
> > +++ b/datapath/actions.c
> > @@ -634,6 +636,12 @@ static int do_execute_actions(struct datapath *dp, 
> > struct sk_buff *skb,
> >                 case OVS_ACTION_ATTR_SAMPLE:
> >                         err = sample(dp, skb, a, tun_key);
> >                         break;
> > +
> > +               case OVS_ACTION_ATTR_RECIRCULATE:
> > +                       if (recirculation_id) {
> > +                               *recirculation_id = nla_get_u32(a);
> > +                       }
> 
> OVS_CB seems like the perfect place to store recirculation_id, that
> way we don't have to pass pointers around everywhere.

Sure, that sounds reasonable.

> However, do we actually want to tie this directly to recirculation as
> opposed to as a separate action?  I see possible benefits of
> separating it out:

I'm not really sure that I understand. Could you explain
how you see this working?

>  * It makes it more generic - we could potentially use skb_mark
> directly or worst case introduce a new metadata field if we are
> worried about conflicting uses (although we could always set the real
> skb mark on the last pass).  Either way there is a better potential
> for reuse.

I'm a bit wary of using skb_mark, what if its desirable to
use it for something else?

>  * It seems likely that if you were doing more than two passes of
> recirculation then you would still only want to use a single ID so
> it's not really desirable to remark on every pass.
> 
> > @@ -686,23 +695,28 @@ int ovs_execute_actions(struct datapath *dp, struct 
> > sk_buff *skb)
> >         if (unlikely(++loop->count > MAX_LOOPS))
> >                 loop->looping = true;
> >         if (unlikely(loop->looping)) {
> > -               error = loop_suppress(dp, acts);
> > -               kfree_skb(skb);
> > +               printk("%s: looping", __func__);
> 
> Isn't this logging already handled by loop_suppress()?

Sorry, that was just for debugging purposes.
I was having a bit of trouble with this loop not long
before posting the patch.

> I think the memory management of the skb is somewhat inconsistent -
> the caller retains ownership in the case of recirculation and some
> errors but the callee takes it for normal output actions.  What if the
> callee takes it in all cases (as today) and gives it back as a return
> value for recirculation?

Ok, I'll see about making that so.

> > diff --git a/datapath/datapath.c b/datapath/datapath.c
> > index af0dba8..ac2fb0d 100644
> > --- a/datapath/datapath.c
> > +++ b/datapath/datapath.c
> >  /* Must be called with rcu_read_lock. */
> >  void ovs_dp_process_received_packet(struct vport *p, struct sk_buff *skb)
> [...]
> > -out:
> > -       /* Update datapath statistics. */
> > -       u64_stats_update_begin(&stats->sync);
> > -       (*stats_counter)++;
> > -       u64_stats_update_end(&stats->sync);
> > +               if (likely(!recirculate))
> > +                       break;
> > +               if (unlikely(!limit--)) {
> > +                       kfree_skb(skb);
> > +                       return;
> 
> We should probably log a rate-limited warning in this case.

Sure, can do.

> > +               }
> > +               OVS_CB(skb)->flow = NULL;
> 
> I think the check for having OVS_CB(skb)->flow already set is dead
> code at this point and we don't need to special case it.

Ok, I'll prepare a preparatory patch to remove the dead code
and build on top of that. It should simplify the code a bit :)

> > @@ -905,10 +929,16 @@ static int ovs_packet_cmd_execute(struct sk_buff 
> > *skb, struct genl_info *info)
> >                 goto err_unlock;
> >
> >         local_bh_disable();
> > -       err = ovs_execute_actions(dp, packet);
> > +       err = ovs_execute_actions(dp, packet, NULL);
> >         local_bh_enable();
> >         rcu_read_unlock();
> >
> > +       if (err > 0) {
> > +               /* Recirculation is invalid on packet execute */
> > +               err = -EINVAL;
> > +               goto err_flow_free;
> > +       }
> 
> Isn't this going to add a lot of complication to userspace?  It's
> clearly possible for userspace to not use recirculation since it has
> the full packet but it's basically going to require a separate
> processing path just to handle this.

I think that the path needs to be present in order to handle
cases where facets aren't present. Such misses handled without facets
and packet out handling.

> > diff --git a/datapath/flow.h b/datapath/flow.h
> > index 27e394b..f665d3a 100644
> > --- a/datapath/flow.h
> > +++ b/datapath/flow.h
> >  struct sw_flow_key {
> >         struct ovs_key_ipv4_tunnel tun_key;  /* Encapsulating tunnel key. */
> > +       __be32  recirculation_id;       /* Identification of parent facet 
> > used
> > +                                        * progress of recirculation of a
> > +                                        * packet.  Zero if the flow is not
> > +                                        * the result of a recirculation
> > +                                        * access. */
> 
> I don't think that there's any reason to make this networking byte
> order, since it will never go out on the wire.

I'll switch it around to host byte order.

> > diff --git a/include/linux/openvswitch.h b/include/linux/openvswitch.h
> > index f7b788c..24d1eb5 100644
> > --- a/include/linux/openvswitch.h
> > +++ b/include/linux/openvswitch.h
> > @@ -280,6 +280,7 @@ enum ovs_key_attr {
> >         OVS_KEY_ATTR_ND,        /* struct ovs_key_nd */
> >         OVS_KEY_ATTR_SKB_MARK,  /* u32 skb mark */
> >         OVS_KEY_ATTR_TUNNEL,    /* Nested set of ovs_tunnel attributes */
> > +       OVS_KEY_ATTR_RECIRCULATION_ID,  /* u32 recirculation id */
> >
> >  #ifdef __KERNEL__
> >         OVS_KEY_ATTR_IPV4_TUNNEL,  /* struct ovs_key_ipv4_tunnel */
> > @@ -513,10 +514,13 @@ struct ovs_action_push_vlan {
> >   * indicate the new packet contents This could potentially still be
> >   * %ETH_P_MPLS_* if the resulting MPLS label stack is not empty.  If there
> >   * is no MPLS label stack, as determined by ethertype, no action is taken.
> > + * @OVS_ACTION_ATTR_RECIRCULATE: Restart processing of packet.
> > + * The packet must have been modified by a previous action in such a way
> > + * that it does not match its original facet again.
> 
> I think we should avoid use of the word 'facet' in the interface or in
> kernel code since it is a specific userspace concept which could
> change over time.

Ok, I'll use the word flow instead.

> > diff --git a/lib/match.c b/lib/match.c
> > index 2aa4e89..63626a3 100644
> > --- a/lib/match.c
> > +++ b/lib/match.c
> > @@ -849,7 +849,7 @@ match_format(const struct match *match, struct ds *s, 
> > unsigned int priority)
> >
> >      int i;
> >
> > -    BUILD_ASSERT_DECL(FLOW_WC_SEQ == 20);
> > +    BUILD_ASSERT_DECL(FLOW_WC_SEQ == 21);
> >
> >      if (priority != OFP_DEFAULT_PRIORITY) {
> >          ds_put_format(s, "priority=%u,", priority);
> 
> I think we need to be able to print out any new fields that we add to
> struct flow for debugging purposes, even if they aren't exposed to the
> user normally.

Sure, I'll add recirculation here.

> > diff --git a/lib/odp-util.c b/lib/odp-util.c
> > index 0ded70d..7fd7a3f 100644
> > --- a/lib/odp-util.c
> > +++ b/lib/odp-util.c
> > @@ -1546,6 +1558,11 @@ odp_flow_key_from_flow(struct ofpbuf *buf, const 
> > struct flow *flow,
> >          nl_msg_put_u32(buf, OVS_KEY_ATTR_SKB_MARK, flow->skb_mark);
> >      }
> >
> > +    if (flow->recirculation_id) {
> > +        nl_msg_put_u32(buf, OVS_KEY_ATTR_RECIRCULATION_ID,
> > +                       flow->recirculation_id);
> > +    }
> 
> I think we should be careful about making zero a magic value.  It's
> fine to do that in userspace as part of our implementation but I
> wouldn't want to carry that assumption across interface boundaries.

Sure, I'll clean that up somehow.

> >  commit_set_nw_action(const struct flow *flow, struct flow *base,
> >                       struct ofpbuf *odp_actions)
> >  {
> > +    //VLOG_WARN("%s: %u 0x%04x", __func__, flow->nw_proto, 
> > ntohs(flow->dl_type));
> 
> Can we remove this debugging code?

Yes.

> > @@ -2419,7 +2450,8 @@ commit_set_skb_mark_action(const struct flow *flow, 
> > struct flow *base,
> >   * 'base' and 'flow', appends ODP actions to 'odp_actions' that change the 
> > flow
> >   * key from 'base' into 'flow', and then changes 'base' the same way.  
> > Does not
> >   * commit set_tunnel actions.  Users should call commit_odp_tunnel_action()
> > - * in addition to this function if needed. */
> > + * and commit_recirculate_action() in addition to those functions are 
> > needed.
> 
> I think the actual name of the function is
> commit_odp_recirculate_action (otherwise it makes it difficult to
> search for it).

Thanks, I'll fix that.

> > diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
> > index 9fe0aaa..0329e3c 100644
> > --- a/ofproto/ofproto-dpif.c
> > +++ b/ofproto/ofproto-dpif.c
> 
> I stopped here since the code is complex enough that I need to be able
> to see it in context.
> 
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Reply via email to