On Thu, Jan 28, 2016 at 02:15:29PM -0800, Jarno Rajahalme wrote: > IMO you have included sufficient testing to warrant the code to be > merged, maybe remove the “RFC” reference from the commit message?
Yes, that was a mistake, I forgot to remove it from the commit message before posting. > Some comments for future consideration below. > > With this incremental: > > diff --git a/lib/ofp-util.c b/lib/ofp-util.c > index 42c8f1e..4a080cd 100644 > --- a/lib/ofp-util.c > +++ b/lib/ofp-util.c > @@ -3887,7 +3887,7 @@ parse_actions_property(struct ofpbuf *property, enum > ofp_version version, > { > if (!ofpbuf_try_pull(property, ROUND_UP(ofpbuf_headersize(property), > 8))) { > VLOG_WARN_RL(&bad_ofmsg_rl, > - "actions property has bad length %"PRIuSIZE, > + "actions property has bad length %"PRIu32, > property->size); > return OFPERR_OFPBPC_BAD_LEN; > } > > > Acked-by: Jarno Rajahalme <ja...@ovn.org> Thanks, I folded that fix in. > > On Jan 27, 2016, at 9:51 AM, Ben Pfaff <b...@ovn.org> wrote: > > > > One purpose of OpenFlow packet-in messages is to allow a controller to > > interpose on the path of a packet through the flow tables. If, for > > example, the controller needs to modify a packet in some way that the > > switch doesn't directly support, the controller should be able to > > program the switch to send it the packet, then modify the packet and > > send it back to the switch to continue through the flow table. > > > > That's the theory. In practice, this doesn't work with any but the > > simplest flow tables. Packet-in messages simply don't include enough > > context to allow the flow table traversal to continue. For example: > > > > * Via "resubmit" actions, an Open vSwitch packet can have an > > effective "call stack", but a packet-in can't describe it, and > > so it would be lost. > > > > * Via "patch ports", an Open vSwitch packet can traverse multiple > > OpenFlow logical switches. A packet-in can't describe or resume > > this context. > > > > * A packet-in can't preserve the stack used by NXAST_PUSH and > > NXAST_POP actions. > > > > * A packet-in can't preserve the OpenFlow 1.1+ action set. > > > > * A packet-in can't preserve the state of Open vSwitch mirroring > > or connection tracking. > > > > This commit introduces a solution called "closures". A closure is the > > state of a packet's traversal through OpenFlow flow tables. A new > > NXAST_PAUSE action generates a closure and sends it to the OpenFlow > > controller as an NXT_CLOSURE asynchronous message (which must be > > enabled through an extension to OFPT_SET_ASYNC or NXT_SET_ASYNC2). > > Should the latter be NXT_SET_ASYNC_CONFIG? It was supposed to be NXT_SET_ASYNC_CONFIG2, which is like the OF1.4 OFPT_SET_ASYNC but backported to other OpenFlow versions so that it can support closures. > > - OpenFlow: > > * OpenFlow 1.1+ OFPT_QUEUE_GET_CONFIG_REQUEST now supports OFPP_ANY. > > * OpenFlow 1.4+ OFPMP_QUEUE_DESC is now supported. > > + * New extension message NXT_SET_ASYNC_CONFIG2 to allow OpenFlow > > 1.4-like > > Should this be just NXT_SET_ASYNC_CONFIG? I got it right here. > > + /* Private properties. These are subject to change and not > > architectural. > > + * Do not depend on them. */ > > + NXCPT_BRIDGE = 0x8000, > > + NXCPT_STACK, > > + NXCPT_MIRRORS, > > + NXCPT_CONNTRACKED, > > + NXCPT_TABLE_ID, > > + NXCPT_COOKIE, > > + NXCPT_ACTIONS, > > + NXCPT_ACTION_SET, > > When I wrote the following I did not yet see that there may be > multiple table_id, cookie, and actions properties. I guess the comment > below applies to the table_id if the table where the pause action was > located, and the cookie to the cookie of the rule that has the pause > action: > > It seems to me that the controller parsing of the incoming closures > might be eased by making table_id and cookie public properties. I do > not have any specific example in mind, but controllers explicitly > install rules with specific cookies to specific tables, hence the > controller could use, e.g., a special cookie value (or bits) to > separate different PAUSE reasons from each other. With that thought, > maybe there could be a ‘reason’ parameter to the PAUSE action that > would then be passed on in the CLOSURE message? I think we could add a "reason" parameter if it becomes useful. So far I haven't come across it yet. > > +static void > > +ofputil_put_closure_private(const struct ofputil_closure_private *closure, > > + enum ofputil_protocol protocol, struct ofpbuf > > *msg) > > +{ > > + enum ofp_version version = ofputil_protocol_to_ofp_version(protocol); > > + > > + ofputil_put_closure(&closure->public, protocol, msg); > > + > > + ofpprop_put_uuid(msg, NXCPT_BRIDGE, &closure->bridge); > > + > > + for (size_t i = 0; i < closure->n_stack; i++) { > > + const union mf_subvalue *s = &closure->stack[i]; > > + size_t ofs; > > + for (ofs = 0; ofs < sizeof *s; ofs++) { > > + if (s->u8[ofs]) { > > + break; > > + } > > + } > > + > > + ofpprop_put(msg, NXCPT_STACK, &s->u8[ofs], sizeof *s - ofs); > > + } > > + > > + if (closure->mirrors) { > > + ofpprop_put_u32(msg, NXCPT_MIRRORS, closure->mirrors); > > + } > > + > > + if (closure->conntracked) { > > + ofpprop_put_flag(msg, NXCPT_CONNTRACKED); > > + } > > + > > + if (closure->actions_len) { > > + const struct ofpact *const end = ofpact_end(closure->actions, > > + closure->actions_len); > > + const struct ofpact_unroll_xlate *unroll = NULL; > > + uint8_t table_id = 0; > > + ovs_be64 cookie = 0; > > + > > + const struct ofpact *a; > > + for (a = closure->actions; ; a = ofpact_next(a)) { > > + if (a == end || a->type == OFPACT_UNROLL_XLATE) { > > + if (unroll) { > > + if (table_id != unroll->rule_table_id) { > > + ofpprop_put_u8(msg, NXCPT_TABLE_ID, > > + unroll->rule_table_id); > > + table_id = unroll->rule_table_id; > > + } > > + if (cookie != unroll->rule_cookie) { > > + ofpprop_put_be64(msg, NXCPT_COOKIE, > > + unroll->rule_cookie); > > + cookie = unroll->rule_cookie; > > + } > > + } > > + > > + const struct ofpact *start > > + = unroll ? ofpact_next(&unroll->ofpact) : > > closure->actions; > > + put_actions_property(msg, NXCPT_ACTIONS, version, > > + start, (a - start) * sizeof *a); > > + > > + if (a == end) { > > + break; > > + } > > + unroll = ofpact_get_UNROLL_XLATE(a); > > + } > > + } > > So the above emits sets of actions as separated by the unroll_xlate > actions, and instead of encoding the separating unroll_xlate action, a > pair of table_id and cookie properties is encoded (but only if they > differ from the last ones). Is this to keep unroll_xlate private to > OVS, or are there some other reasons as well? Maybe the loop could be > preceded by a comment explaining all this? I didn't really want to make unroll_xlate public, even if it was restricted just to closures. I added a comment: /* Divide 'closure->actions' into groups that begins with an * unroll_xlate action. For each group, emit a NXCPT_TABLE_ID and * NXCPT_COOKIE property (if either has changed; each is initially * assumed 0), then a NXCPT_ACTIONS property with the grouped * actions. * * The alternative is to make OFPACT_UNROLL_XLATE public. We can * always do that later, since this is a private property. */ > > +static enum ofperr > > +parse_actions_property(struct ofpbuf *property, enum ofp_version version, > > + struct ofpbuf *ofpacts) > > +{ > > + if (!ofpbuf_try_pull(property, ROUND_UP(ofpbuf_headersize(property), > > 8))) { > > + VLOG_WARN_RL(&bad_ofmsg_rl, > > + "actions property has bad length %"PRIuSIZE, > > + property->size); > > ‘property->size’ is an uint32_t, not size_t, so the formatting is wrong. Fixed, thanks. > > + msg = ofputil_encode_closure_private(&am->closure, protocol); > > + sched = SCHED_CLOSURE; > > + port = 0; /* XXX */ > > Maybe should just document that closures have port number 0? This is just for fairness. I changed the comment to make this clear: /* XXX This 'port' value is used only to provide fairness: if * pinsched has to drop messages, it drops those for the port with * the most messages first. Perhaps we should figure out a good * way to use this. */ port = 0; > > +static void > > compose_recirculate_action__(struct xlate_ctx *ctx, uint8_t table) > > { > > struct recirc_metadata md; > > @@ -3652,20 +3682,26 @@ compose_recirculate_action__(struct xlate_ctx *ctx, > > uint8_t table) > > .action_set_len = ctx->recirc_action_offset, > > }; > > > > - /* Allocate a unique recirc id for the given metadata state in the > > - * flow. An existing id, with a new reference to the corresponding > > - * recirculation context, will be returned if possible. > > - * The life-cycle of this recirc id is managed by associating it > > - * with the udpif key ('ukey') created for each new datapath flow. */ > > - id = recirc_alloc_id_ctx(&state); > > - if (!id) { > > - XLATE_REPORT_ERROR(ctx, "Failed to allocate recirculation id"); > > - ctx->error = XLATE_NO_RECIRCULATION_CONTEXT; > > - return; > > - } > > - recirc_refs_add(&ctx->xout->recircs, id); > > + if (ctx->pause) { > > + if (ctx->xin->packet) { > > + emit_closure(ctx, &state); > > + } > > + } else { > > + /* Allocate a unique recirc id for the given metadata state in the > > + * flow. An existing id, with a new reference to the corresponding > > + * recirculation context, will be returned if possible. > > + * The life-cycle of this recirc id is managed by associating it > > + * with the udpif key ('ukey') created for each new datapath flow. > > */ > > + id = recirc_alloc_id_ctx(&state); > > + if (!id) { > > + XLATE_REPORT_ERROR(ctx, "Failed to allocate recirculation id"); > > + ctx->error = XLATE_NO_RECIRCULATION_CONTEXT; > > + return; > > + } > > + recirc_refs_add(&ctx->xout->recircs, id); > > > > - nl_msg_put_u32(ctx->odp_actions, OVS_ACTION_ATTR_RECIRC, id); > > + nl_msg_put_u32(ctx->odp_actions, OVS_ACTION_ATTR_RECIRC, id); > > + } > > This function’s name could be changed to reflect the fact that it is > used to either emit closure or to compose the recirculation action. I changed it to finish_freezing__() in the new series. > > > > /* Undo changes done by recirculation. */ > > ctx->action_set.size = ctx->recirc_action_offset; > > @@ -4178,6 +4214,7 @@ recirc_unroll_actions(const struct ofpact *ofpacts, > > size_t ofpacts_len, > > case OFPACT_GROUP: > > case OFPACT_OUTPUT: > > case OFPACT_CONTROLLER: > > + case OFPACT_PAUSE: > > case OFPACT_DEC_MPLS_TTL: > > case OFPACT_DEC_TTL: > > /* These actions may generate asynchronous messages, which > > include > > @@ -4746,6 +4783,13 @@ do_xlate_actions(const struct ofpact *ofpacts, > > size_t ofpacts_len, > > break; > > } > > > > + case OFPACT_PAUSE: > > + ctx->pause = ofpact_get_PAUSE(a); > > + ctx->xout->slow |= SLOW_CONTROLLER; > > + ctx_trigger_recirculation(ctx); > > ctx_trigger_recirculation() could be renamed as > ctx_pause_translation(), which could then lead to either emitting the > closure or recirculation. I changed it to ctx_trigger_freeze() in the new series. > > + a = ofpact_next(a); > > + break; > > + > > case OFPACT_EXIT: > > ctx->exit = true; > > break; > > @@ -5132,6 +5176,7 @@ xlate_actions(struct xlate_in *xin, struct xlate_out > > *xout) > > > > .recirc_action_offset = -1, > > .last_unroll_offset = -1, > > + .pause = NULL, > > > > .was_mpls = false, > > .conntracked = false, > > @@ -5434,6 +5479,67 @@ exit: > > return ctx.error; > > } > > > > +enum ofperr > > +xlate_closure(struct ofproto_dpif *ofproto, > > + const struct ofputil_closure_private *closure, > > + struct ofpbuf *odp_actions, > > + enum slow_path_reason *slow) > > +{ > > + struct dp_packet packet; > > + dp_packet_use_const(&packet, closure->public.packet, > > + closure->public.packet_len); > > + > > + struct flow flow; > > + flow_extract(&packet, &flow); > > + > > + struct xlate_in xin; > > + xlate_in_init(&xin, ofproto, &flow, 0, NULL, flow.tcp_flags, &packet, > > + NULL, odp_actions); > > + > > + struct ofpact_note noop; > > + ofpact_init_NOTE(&noop); > > + noop.length = 0; > > + > > + bool any_actions = closure->actions_len > 0; > > + struct recirc_state recirc = { > > + .table_id = 0, /* Not the table where NXAST_PAUSE was > > executed. */ > > + .ofproto_uuid = closure->bridge, > > + .stack = closure->stack, > > + .n_stack = closure->n_stack, > > + .mirrors = closure->mirrors, > > + .conntracked = closure->conntracked, > > + > > + /* When there are no actions, xlate_actions() will search the flow > > + * table. We don't want it to do that (we want it to resume), so > > + * supply a no-op action if there aren't any. > > + * > > + * (We can't necessarily avoid translating actions entirely if > > there > > + * aren't any actions, because there might be some finishing-up to > > do > > + * at the end of the pipeline, and we don't check for those > > + * conditions.)) */ > > So this part in parenthesis is the explanation for the use of the > ‘noop’. Would it be correct to state that even if the closure has no > actions, it still may have an action set that is executed during the > translation? What if also the action set is empty, is the translation > still needed? Maybe not. But I think that actually skipping it would be a micro-optimization that is just begging to miss some case, either now or later. > There are two closing parentheses above: ‘))’. Fixed. Even though you acked it, I'm going to repost a new version. It's again the end of a longer series, since I found several ways to improve the surrounding code. _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev