On Wed, Jan 20, 2016 at 05:39:46PM -0800, Jarno Rajahalme wrote:
> Not a full review, but see some comments below,
>
> And sorry for not snipping, but I felt like not cutting any context
> when commenting an RFC patch.
Sure. Thanks for reviewing it at all; I wasn't expecting this much
feedback for an RFC.

> > On Jan 18, 2016, at 11:27 PM, Ben Pfaff <b...@ovn.org> wrote:
> >
> > One purpose of OpenFlow packet-in messages is to allow a controller to
> > interpose on the path of a packet through the flow tables. If, for
> > example, the controller needs to modify a packet in some way that the
> > switch doesn't directly support, the controller should be able to
> > program the switch to send it the packet, then modify the packet and
> > send it back to the switch to continue through the flow table.
> >
> > That's the theory. In practice, this doesn't work with any but the
> > simplest flow tables. Packet-in messages simply don't include enough
> > context to allow the flow table traversal to continue. For example:
> >
> >     * Via "resubmit" actions, an Open vSwitch packet can have an
> >       effective "call stack", but a packet-in can't describe it, and
> >       so it would be lost.
> >
> >     * Via "patch ports", an Open vSwitch packet can traverse multiple
> >       OpenFlow logical switches. A packet-in can't describe or resume
> >       this context.
> >
> >     * A packet-in can't preserve the stack used by NXAST_PUSH and
> >       NXAST_POP actions.
> >
> >     * A packet-in can't preserve the OpenFlow 1.1+ action set.
> >
> >     * A packet-in can't preserve the state of Open vSwitch mirroring
> >       or connection tracking.
> >
> > This commit introduces a solution called "closures". A closure is the
> > state of a packet's traversal through OpenFlow flow tables. A new
> > NXAST_PAUSE action generates a closure and sends it to the OpenFlow
> > controller as an NXT_CLOSURE asynchronous message (which must be
> > enabled through an extension to OFPT_SET_ASYNC or NXT_SET_ASYNC2).
> > The controller processes the closure, possibly modifying some of its
> > publicly accessible data (for now, the packet and its metadata), and
> > sends it back to the switch with an NXT_RESUME request, which
> > causes flow table traversal to continue. In principle, a single
> > packet can be paused and resumed multiple times.
> >
>
> Am I roughly right in characterizing this as follows:
>
> - NXAST_PAUSE is an extension of the existing CONTROLLER actions, its purpose
>   is to send the packet to the controller, with full pipeline context (some of
>   which is switch implementation dependent, and may thus vary from switch to
>   switch).
> - NXT_CLOSURE is an extension of PACKET_IN, allowing for implementation
>   dependent metadata.
> - NXT_RESUME is an extension of PACKET_OUT, with the semantics that the
>   pipeline processing is continued with the original translation context from
>   where it was left at the time of NXAST_PAUSE.

That's a good overview. I'm going to add that to the commit message
(unless you object). Thank you.

> > +/* NXT_CLOSURE.
> > + *
> > + * A closure serializes the state of a packet's trip through Open vSwitch flow
> > + * tables. This permits an OpenFlow controller to interpose on a packet midway
> > + * through processing in Open vSwitch.
> > + *
> > + * Closures fit into packet processing this way:
> > + *
> > + * 1. A packet ingresses into Open vSwitch, which runs it through the OpenFlow
> > + *    tables.
> > + *
> > + * 2. An OpenFlow flow executes a NXAST_PAUSE action. Open vSwitch serializes
> > + *    the packet processing state into a closure and sends it (as an
> > + *    NXT_CLOSURE) to the OpenFlow controller.
> > + *
> > + * 3. The controller receives the NXT_CLOSURE and processes it. The controller
> > + *    can interpret and, if desired, modify some of the contents of the
> > + *    closure, such as the packet and the metadata being processed.
> > + *
>
> Is the controller allowed to change the configuration of the pipeline
> before sending the closure back to the switch?

Yes.

> What if the controller just issued an OpenFlow message changing the
> pipeline configuration, and the closure was sent to the controller at
> the same time? I.e., after the OpenFlow message takes effect on the
> switch, the closure may not be resumable any more?

It would still be resumable. I added the following paragraph after
point 4:

 * The controller might change the pipeline configuration concurrently with
 * steps 2 through 4. For example, it might add or remove OpenFlow flows. If
 * that happens, then the packet will experience a mix of processing from the
 * two configurations, that is, the initial processing (before NXAST_PAUSE)
 * uses the initial flow table, and the later processing (after NXT_RESUME)
 * uses the later flow table.

Other possibilities exist, I guess, but that's the "natural" approach.
Any thoughts on that?
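To make the overall exchange concrete, here is roughly how I picture a
controller consuming these messages. This is only a sketch: none of the
helpers below exist in OVS or in this patch (recv_openflow_msg(),
msg_is_nxt_closure(), and so on are stand-ins for whatever a controller
framework provides), and I'm hand-waving the encoding details; the shape
of the exchange is the point.

    /* Hypothetical controller-side skeleton; every function declared here
     * is a stand-in, not a real OVS or controller API. */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    struct conn;                                     /* An OpenFlow connection. */
    bool recv_openflow_msg(struct conn *, uint8_t **, size_t *);
    void send_openflow_msg(struct conn *, const uint8_t *, size_t);
    bool msg_is_nxt_closure(const uint8_t *, size_t);
    void edit_public_properties(uint8_t *, size_t);  /* Packet and metadata only. */
    void relabel_as_nxt_resume(uint8_t *, size_t);   /* Reuse the closure body. */

    void
    controller_loop(struct conn *conn)
    {
        uint8_t *msg;
        size_t len;

        while (recv_openflow_msg(conn, &msg, &len)) {
            if (msg_is_nxt_closure(msg, len)) {
                /* The switch hit NXAST_PAUSE. Tweak the packet or metadata
                 * if we want to, but echo the private properties back
                 * untouched so the switch can pick up translation where it
                 * left off. */
                edit_public_properties(msg, len);
                relabel_as_nxt_resume(msg, len);
                send_openflow_msg(conn, msg, len);
            }
        }
    }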
> > + * Closures are serialized as type-length-value properties in the format
> > + * commonly used in OpenFlow 1.4 and later. Some properties, with 'type'
> > + * values less than 0x8000, are meant to be interpreted by the controller, and
> > + * are documented here. Other properties, with 'type' values of 0x8000 or
> > + * greater, are private to the switch, may change unpredictably from one
> > + * version of Open vSwitch to another, and are not documented here.
> > + */
>
> Should a closure be tied to a specific OpenFlow connection, so that it
> would be meaningless to send a closure received from one switch to
> another? Or received from a switch back to it after the switch has
> been upgraded?

As implemented, closures are tied to a specific ovs-vswitchd bridge in a
specific ovs-vswitchd process. If you restart ovs-vswitchd or you delete
and add back a bridge, then NXT_RESUME will be rejected because the UUID
property for the bridge (NXCPT_BRIDGE) is ephemeral. This is
intentional.

I thought about tying the closure to the OpenFlow channel even more
tightly, but it seemed like there were diminishing returns. Also, it
seems possible that a controller that maintains multiple connections
might use a different one for receiving NXT_CLOSURE and sending back
NXT_RESUME.

I added a paragraph to the explanation:

 * An NXT_CLOSURE is tied to a given Open vSwitch process and bridge, so that
 * restarting Open vSwitch or deleting and recreating a bridge will cause the
 * corresponding NXT_RESUME to be rejected.

> > +enum nx_closure_prop_type {
> > +    /* Public properties. */
> > +    NXCPT_PACKET,               /* Raw packet data. */
> > +    NXCPT_METADATA,             /* NXM or OXM for metadata fields. */
> > +
> > +    /* Private properties. These are not architectural and subject to
> > +     * change.
>
> *are* subject to change.

This is a problem with English operator precedence. I reworded it to
"These are subject to change and not architectural.".
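In case the encoding isn't obvious from the prose: the closure body is
just a list of OpenFlow 1.4-style properties, so a controller can walk
it with a short loop. Here's a tiny standalone illustration, assuming
the usual property layout (big-endian 16-bit type and length, where the
length covers the header and payload but not the padding out to an
8-byte boundary); the sample buffer is made up, and NXCPT_PACKET and
NXCPT_METADATA just take the values 0 and 1 that the enum above implies.

    /* Standalone illustration only; the property layout is assumed, and the
     * sample buffer is invented, not taken from the patch. */
    #include <arpa/inet.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    static void
    walk_closure_properties(const uint8_t *props, size_t size)
    {
        size_t ofs = 0;

        while (ofs + 4 <= size) {
            uint16_t type, len;

            memcpy(&type, props + ofs, 2);
            memcpy(&len, props + ofs + 2, 2);
            type = ntohs(type);
            len = ntohs(len);
            if (len < 4 || ofs + len > size) {
                printf("malformed property at offset %zu\n", ofs);
                return;
            }

            if (type < 0x8000) {
                /* Public, e.g. NXCPT_PACKET (0) or NXCPT_METADATA (1): the
                 * controller may inspect or rewrite these. */
                printf("public property %u with %u-byte payload\n",
                       (unsigned) type, (unsigned) (len - 4));
            } else {
                /* Private: opaque switch state, echo it back unchanged. */
                printf("private property 0x%x, copy into NXT_RESUME as is\n",
                       (unsigned) type);
            }

            ofs += (len + 7) / 8 * 8;   /* Pad to an 8-byte boundary. */
        }
    }

    int
    main(void)
    {
        static const uint8_t closure_body[] = {
            /* NXCPT_PACKET (type 0), length 10: 4-byte header plus "packet",
             * padded out to 16 bytes. */
            0x00, 0x00, 0x00, 0x0a, 'p', 'a', 'c', 'k',
            'e', 't', 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
            /* A switch-private property (type 0x8000), length 8. */
            0x80, 0x00, 0x00, 0x08, 0xde, 0xad, 0xbe, 0xef,
        };

        walk_closure_properties(closure_body, sizeof closure_body);
        return 0;
    }

The only rule a controller really has to follow is the second branch:
properties it doesn't understand go back to the switch byte for byte.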
> > +static void
> > +format_PAUSE(const struct ofpact_pause *pause, struct ds *s)
> > +{
> > +    ds_put_cstr(s, "pause");
> > +    if (pause->controller_id) {
> > +        ds_put_format(s, "(id=%"PRIu16")", pause->controller_id);
>
> It would be nice if there was some hint in the format that the “id” is
> a controller ID.

Fair enough, I changed "id" to "controller_id" here.

> > +    }
> > +}

...

> > +        case OFPACT_PAUSE:
> > +            ctx->pause = ofpact_get_PAUSE(a);
> > +            ctx->xout->slow |= SLOW_CONTROLLER;
> > +            ctx_trigger_recirculation(ctx);
>
> It would be nice to abstract this somehow, as we are not recirculating
> here, but using the recirculation infra for unrolling the call stack
> and collecting other pipeline state. Maybe
> ‘ctx_trigger_pipeline_closure(reason)’ (it could be argued that
> recirculation is using the concept of closure even if the ‘closure’
> need not be serialized externally for it?)

I see three paths:

    * Consider "closure" as a special case of "recirculation". This is
      what I was doing by default, thinking of a "closure" as a
      serialization of a recirculation state.

    * Consider "recirculation" as a special case of "closure" where we
      don't serialize the state.

    * Invent some new term like "freeze" or "stasis" or whatever and
      consider both recirculation and closure as what we do with the
      state afterward.

I'm all ears.

> >      if (xin->recirc) {
> > -        const struct recirc_state *state = &xin->recirc->state;
> > +        const struct recirc_state *state = xin->recirc;
>
> This is a bug fix for the previous patch.

Thanks, I folded this into the previous patch before applying it.

> > +    struct dpif_execute execute = {
> > +        .actions = odp_actions.data,
> > +        .actions_len = odp_actions.size,
> > +        .needs_help = (slow & SLOW_ACTION) != 0,
> > +        .packet = &packet,
> > +    };
> > +    enum ofperr error = dpif_execute(ofproto->backer->dpif, &execute);
> > +
> > +    /* XXX what about stats? */
>
> Rule stats are attributed as they are translated. All actions in the
> closure originate from rules that were already translated (== stats
> are done for them already). If the actions do further lookups
> (resubmits, goto_table), any new matching rules will get their stats
> attributed during the ‘xlate_closure()’ call above?

I think you're right. At the point I wrote this code I was aching to
just get something working and didn't want to think anymore.

> > +    ofpbuf_uninit(&odp_actions);
> > +    dp_packet_uninit(&packet);
> > +    return error;
> > +}

Thanks for all the comments. I'll finish this up and resubmit.

_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev