On Wed, Jan 20, 2016 at 05:39:46PM -0800, Jarno Rajahalme wrote:
> Not a full review, but see some comments below,
> 
> And sorry for not snipping, but I felt like not cutting any context
> when commenting an RFC patch.

Sure.  Thanks for reviewing it at all; I wasn't expecting this much
feedback for an RFC.

> > On Jan 18, 2016, at 11:27 PM, Ben Pfaff <b...@ovn.org> wrote:
> > 
> > One purpose of OpenFlow packet-in messages is to allow a controller to
> > interpose on the path of a packet through the flow tables.  If, for
> > example, the controller needs to modify a packet in some way that the
> > switch doesn't directly support, the controller should be able to
> > program the switch to send it the packet, then modify the packet and
> > send it back to the switch to continue through the flow table.
> > 
> > That's the theory.  In practice, this doesn't work with any but the
> > simplest flow tables.  Packet-in messages simply don't include enough
> > context to allow the flow table traversal to continue.  For example:
> > 
> >    * Via "resubmit" actions, an Open vSwitch packet can have an
> >      effective "call stack", but a packet-in can't describe it, and
> >      so it would be lost.
> > 
> >    * Via "patch ports", an Open vSwitch packet can traverse multiple
> >      OpenFlow logical switches.  A packet-in can't describe or resume
> >      this context.
> > 
> >    * A packet-in can't preserve the stack used by NXAST_PUSH and
> >      NXAST_POP actions.
> > 
> >    * A packet-in can't preserve the OpenFlow 1.1+ action set.
> > 
> >    * A packet-in can't preserve the state of Open vSwitch mirroring
> >      or connection tracking.
> > 
> > This commit introduces a solution called "closures".  A closure is the
> > state of a packet's traversal through OpenFlow flow tables.  A new
> > NXAST_PAUSE action generates a closure and sends it to the OpenFlow
> > controller as an NXT_CLOSURE asynchronous message (which must be
> > enabled through an extension to OFPT_SET_ASYNC or NXT_SET_ASYNC2).
> > The controller processes the closure, possibly modifying some of its
> > publicly accessible data (for now, the packet and its metadata), and
> > sends it back to the switch with an NXT_RESUME request, which
> > causes flow table traversal to continue.  In principle, a single
> > packet can be paused and resumed multiple times.
> > 
> 
> Am I roughly right in characterizing this as follows:
> 
> - NXAST_PAUSE is an extension of the existing CONTROLLER actions; its purpose
> is to send the packet to the controller, with full pipeline context (some of 
> which is switch implementation dependent, and may thus vary from switch to 
> switch).
> - NXT_CLOSURE is an extension of PACKET_IN, allowing for implementation 
> dependent metadata.
> - NXT_RESUME is an extension of PACKET_OUT, with the semantics that the 
> pipeline processing is continued with the original translation context from 
> where it was left at the time of NXAST_PAUSE.

That's a good overview.  I'm going to add that to the commit message
(unless you object).  Thank you.
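In case it helps, here's a rough sketch of the controller-side flow I have in
mind.  None of these names are real APIs: ctl_msg,
ctl_modify_packet_and_metadata(), and ctl_send() are hypothetical stand-ins
for whatever message loop a controller already has, and the CTL_NXT_* values
are placeholders for the real message types:

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical controller-side view of a message; CTL_NXT_CLOSURE and
     * CTL_NXT_RESUME stand in for the real NXT_CLOSURE/NXT_RESUME types. */
    enum ctl_msg_type { CTL_NXT_CLOSURE, CTL_NXT_RESUME };

    struct ctl_msg {
        enum ctl_msg_type type;
        uint8_t *properties;        /* Closure properties, kept opaque here. */
        size_t properties_len;
    };

    void ctl_modify_packet_and_metadata(struct ctl_msg *);  /* hypothetical */
    void ctl_send(struct ctl_msg *);                         /* hypothetical */

    /* On NXT_CLOSURE: optionally edit the public properties (the packet and
     * its metadata), then echo everything back as NXT_RESUME so the switch
     * resumes flow table traversal from the point of the NXAST_PAUSE. */
    void
    ctl_handle_closure(struct ctl_msg *msg)
    {
        if (msg->type != CTL_NXT_CLOSURE) {
            return;
        }
        ctl_modify_packet_and_metadata(msg);
        msg->type = CTL_NXT_RESUME;
        ctl_send(msg);
    }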

> > +/* NXT_CLOSURE.
> > + *
> > + * A closure serializes the state of a packet's trip through Open vSwitch flow
> > + * tables.  This permits an OpenFlow controller to interpose on a packet midway
> > + * through processing in Open vSwitch.
> > + *
> > + * Closures fit into packet processing this way:
> > + *
> > + * 1. A packet ingresses into Open vSwitch, which runs it through the OpenFlow
> > + *    tables.
> > + *
> > + * 2. An OpenFlow flow executes a NXAST_PAUSE action.  Open vSwitch serializes
> > + *    the packet processing state into a closure and sends it (as an
> > + *    NXT_CLOSURE) to the OpenFlow controller.
> > + *
> > + * 3. The controller receives the NXT_CLOSURE and processes it.  The controller
> > + *    can interpret and, if desired, modify some of the contents of the
> > + *    closure, such as the packet and the metadata being processed.
> > + *
> 
> Is the controller allowed to change the configuration of the pipeline
> before sending the closure back to the switch?

Yes.

> What if the controller just issued an OpenFlow message changing the
> pipeline configuration, and the closure was sent to the controller at
> the same time? I.e., after the OpenFlow message takes effect on the
> switch, the closure may not be resumable any more?

It would still be resumable.  I added the following paragraph after
point 4:

 * The controller might change the pipeline configuration concurrently with
 * steps 2 through 4.  For example, it might add or remove OpenFlow flows.  If
 * that happens, then the packet will experience a mix of processing from the
 * two configurations, that is, the initial processing (before NXAST_PAUSE)
 * uses the initial flow table, and the later processing (after NXT_RESUME)
 * uses the later flow table.

Other possibilities exist, I guess, but that's the "natural" approach.
Any thoughts on that?

> > + * Closures are serialized as type-length-value properties in the format
> > + * commonly used in OpenFlow 1.4 and later.  Some properties, with 'type'
> > + * values less than 0x8000, are meant to be interpreted by the controller, and
> > + * are documented here.  Other properties, with 'type' values of 0x8000 or
> > + * greater, are private to the switch, may change unpredictably from one
> > + * version of Open vSwitch to another, and are not documented here.
> > + */
> 
> Should a closure be tied to a specific OpenFlow connection, so that it
> would be meaningless to send a closure received from one switch to
> another? Or received from a switch back to it after the switch has
> been upgraded?

As implemented, closures are tied to a specific ovs-vswitchd bridge in a
specific ovs-vswitchd process.  If you restart ovs-vswitchd or you delete
and add back a bridge, then NXT_RESUME will be rejected because the UUID
property for the bridge (NXCPT_BRIDGE) is ephemeral.  This is
intentional.

I thought about tying the closure to the OpenFlow channel even more
tightly, but it seemed like there were diminishing returns.  Also, it
seems possible that a controller that maintains multiple connections
might use a different one for receiving NXT_CLOSURE and sending back
NXT_RESUME.

I added a paragraph to the explanation:

 * An NXT_CLOSURE is tied to a given Open vSwitch process and bridge, so that
 * restarting Open vSwitch or deleting and recreating a bridge will cause the
 * corresponding NXT_RESUME to be rejected.

> > +enum nx_closure_prop_type {
> > +    /* Public properties. */
> > +    NXCPT_PACKET,               /* Raw packet data. */
> > +    NXCPT_METADATA,             /* NXM or OXM for metadata fields. */
> > +
> > +    /* Private properties.  These are not architectural and subject to change.
> 
> *are* subject to change.

This is a problem with English operator precedence.

I reworded it to "These are subject to change and not architectural.".
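
Incidentally, the public/private split is meant to keep the controller-side
property walk simple.  Here's a rough sketch, assuming the usual OpenFlow 1.4
property layout (16-bit type, 16-bit length that includes the header, value
padded out to a multiple of 8 bytes); walk_closure_props() is just a
hypothetical helper, not anything in the tree:

    #include <arpa/inet.h>      /* ntohs() */
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    struct closure_prop_header {
        uint16_t type;          /* NXCPT_*; 0x8000 and up is switch-private. */
        uint16_t length;        /* Bytes, including this header, before padding. */
    };

    static void
    walk_closure_props(const uint8_t *props, size_t len)
    {
        size_t ofs = 0;

        while (ofs + sizeof(struct closure_prop_header) <= len) {
            struct closure_prop_header hdr;
            memcpy(&hdr, props + ofs, sizeof hdr);

            uint16_t type = ntohs(hdr.type);
            uint16_t plen = ntohs(hdr.length);
            if (plen < sizeof hdr || ofs + plen > len) {
                break;          /* Malformed property list. */
            }

            if (type < 0x8000) {
                /* Public property (NXCPT_PACKET, NXCPT_METADATA): the
                 * controller may interpret it and, if it wants, modify it. */
            } else {
                /* Private property: opaque to the controller; echo it back
                 * unchanged in the NXT_RESUME. */
            }

            ofs += (plen + 7) / 8 * 8;      /* Properties are 8-byte aligned. */
        }
    }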

> > +static void
> > +format_PAUSE(const struct ofpact_pause *pause, struct ds *s)
> > +{
> > +    ds_put_cstr(s, "pause");
> > +    if (pause->controller_id) {
> > +        ds_put_format(s, "(id=%"PRIu16")", pause->controller_id);
> 
> It would be nice if there was some hint in the format that the “id” is
> a controller ID.

Fair enough, I changed "id" to "controller_id" here.

> > +    }
> > +}
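
With that change the formatter reads roughly like this, so the action prints
as, e.g., "pause(controller_id=123)":

    static void
    format_PAUSE(const struct ofpact_pause *pause, struct ds *s)
    {
        ds_put_cstr(s, "pause");
        if (pause->controller_id) {
            ds_put_format(s, "(controller_id=%"PRIu16")", pause->controller_id);
        }
    }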

...

> > +        case OFPACT_PAUSE:
> > +            ctx->pause = ofpact_get_PAUSE(a);
> > +            ctx->xout->slow |= SLOW_CONTROLLER;
> > +            ctx_trigger_recirculation(ctx);
> 
> It would be nice to abstract this somehow, as we are not recirculating
> here, but using the recirculation infra for unrolling the call stack
> and collecting other pipeline state. Maybe
> ‘ctx_trigger_pipeline_closure(reason)’ (it could be argued that
> recirculation is using the concept of closure even if the ‘closure’
> need not be serialized externally for it)?

I see three paths:

        * Consider "closure" as a special case of "recirculation".  This
          is what I was doing by default, thinking of a "closure" as a
          serialization of a recirculation state.

        * Consider "recirculation" as a special case of "closure" where
          we don't serialize the state.

        * Invent some new term like "freeze" or "stasis" or whatever and
          consider both recirculation and closure as what we do with the
          state afterward.

I'm all ears.

> >     if (xin->recirc) {
> > -        const struct recirc_state *state = &xin->recirc->state;
> > +        const struct recirc_state *state = xin->recirc;
> 
> This is a bug fix for the previous patch.

Thanks, I folded this into the previous patch before applying it.

> > +    struct dpif_execute execute = {
> > +        .actions = odp_actions.data,
> > +        .actions_len = odp_actions.size,
> > +        .needs_help = (slow & SLOW_ACTION) != 0,
> > +        .packet = &packet,
> > +    };
> > +    enum ofperr error = dpif_execute(ofproto->backer->dpif, &execute);
> > +
> > +    /* XXX what about stats? */
> 
> Rule stats are attributed as they are translated. All actions in the
> closure originate from rules that were already translated (== stats
> are done for them already). If the actions do further lookups
> (resubmits, goto_table), any new matching rules will get their stats
> attributed during the ‘xlate_closure()’ call above?

I think you're right.  At the point I wrote this code I was aching to
just get something working and didn't want to think anymore.

> > +    ofpbuf_uninit(&odp_actions);
> > +    dp_packet_uninit(&packet);
> > +    return error;
> > +}

Thanks for all the comments.  I'll finish this up and resubmit.