On 17 September 2015 at 17:25, Ben Pfaff <b...@nicira.com> wrote: > On Thu, Sep 17, 2015 at 04:04:23PM -0700, Joe Stringer wrote: >> This patch adds a new action and fields to OVS that allow connection >> tracking to be performed. This support works in conjunction with the >> Linux kernel support merged into the Linux-4.3 development cycle. >> >> The new 'ct' action sends this flow through the connection tracker, >> which accepts the following parameters initally: >> >> - "commit": Connections that execute ct(commit) should be remembered, >> allowing future packets in the connection to be recognized as part of >> an "established" (est) connection, packets in the reply (rpl) >> direction, or related to an existing connection (rel). >> - "zone=[u16|NXM]": Perform connection tracking in the zone specified. >> Each zone is an independent connection tracking context. If the >> "commit" parameter is used, the connection will only be remembered in >> the specified zone, and not in other zones. This is 0 by default. >> - "table=NUMBER": This makes the conntrack action act like the "output" >> action, with a copy of the packet sent to the connection tracker. >> When the connection tracker has finished processing, it may return >> and resume pipeline processing in the specified table, with connection >> tracking fields populated (initially ct_state and ct_zone). This table >> is strongly recommended to be greater than the current table, to >> prevent loops. > > Wow, this is big! Thanks for doing all this work! > > The description of "table" above confuses me a little. The important > description, though, is in ovs-ofctl(8), so I'll review that description > in more detail.
I've found this somewhat difficult to concisely describe, and I think this is a reflection on that. Probably the most precise version I have so far is in the comment about nx_action_conntrack: * The "recirc_table" allows NXM_NX_CT_* fields to become available: * * If "recirc_table" has a value other than NX_CT_RECIRC_NONE, then the * packet will be logically cloned prior to executing this action. One * copy will be sent to the connection tracker, then will be re-injected * into the OpenFlow pipeline beginning at the OpenFlow table specified in * this field. When the packet re-enters the pipeline, the NXM_NX_CT_* * fields will be populated. The original instance of the packet will * continue the current actions list. This can be thought of as similar to * the effect of the "output" action: One copy is sent out (in this case, * to the connection tracker), but the current copy continues processing. * * It is strongly recommended that this table is later than the current * table, to prevent loops. Perhaps the reference to output action adds more confusion than clarity. I could drop that part. Alternatively, the ovs-ofctl manual currently says this: Fork pipeline processing in two. The original instance of the packet will continue processing the current actions list. An additional instance of the packet will be sent to the connection tracker, which will be re-injected into the OpenFlow pipeline to resume processing in table number, with the ct_state and other ct match fields set. If the table is not specified, then the packet is submitted to the connection tracker, but the pipeline does not fork and the ct match fields are not populated. It is strongly recommended to specify a table later than the current table to prevent loops. I guess ideally all three of these places say essentially the same thing (with the caveat that ofp-actions.c should have the wire-format clarifications, while the others should be sufficient to have ofctl format). >> ovs-ofctl add-flow br0 \ >> "table=1,in_port=2,conn_state=+trk+new,tcp,action=drop" > > I think that the above example needs s/conn_state/ct_state/. OK. > Here in NEWS, let's make a mental note to revise this when DPDK-based > connection tracking gets in: >> + - Add support for connection tracking through the new "ct" action >> + and "ct_state"/"ct_zone" match fields. Only available on Linux kernels >> + with the connection tracking module loaded. > > In openvswitch.h, are the OVS_CS_F_* flags the same as other flags > already exported by Linux somewhere and, if so, would the Linux kernel > people like it better if we #defined them directly to those values (or > used the existing names directly)? Obviously we'd have to work around > that for non-Linux machines. They're not the same as those exported by conntrack. They are the same as those exported by the openvswitch.h header. In this version I added build-time assertions that the values match the odp-netlink header. > The design of OVS_CT_ATTR_FLAGS as having a single flag bit is one > approach, but another that is sometimes taken for Netlink is to have the > presence of an attribute act as a flag. For example, one could have an > OVS_CT_ATTR_COMMIT attribute whose presence signals that the flow should > be committed. I think that discussions around both the previous versions and potential future extensions of this interface need additional flags. As this is already merged upstream, I'm inclined to push forward with this at the moment. > It's worrisome that there's no way to slow-path a flow that includes a > conntrack action. I guess the consequences aren't too bad though since > the common case of needing userspace help just redirects the packet back > to the kernel. This is an unusual case, merging a feature with kernel support prior to userspace. But I suspect this should be alleviated soon if Daniele's work is as far along as I think it is. We do still need to be cautious though, that we cannot mix&match execution of the ct action between kernel and userspace. I'd like to do some more comprehensive testing around this, but I don't think we should block on this. > The third statement here seems to follow from the first two. Did you > mean something stronger like 'If the "Invalid" bit is set then only the > "Tracked" bit is also set.'? > + * - If "Tracked" is unset, no other bits may be set. > + * - If "Tracked" is set, one or more other bits may be set. > + * - The "Invalid" bit is only ever set with the "Tracked" bit. Yes, that's what I meant. Updated. > I don't think the checks for nonzero masks are needed in nx_put_raw(), > since nxm_put_16m() will eventually check for itself. OK. > For ovs_conntrack_policy[] in odp-util.c, there's no need to give a > min_len in either case since the defaults are correct. It was probably a style thing that leaked over from the later attributes which really do need the min_len. I've removed it. > In parse_conntrack_action(), I think that the n < 0 test here is always > true: > + while (s != end) { > + int n = -1; > + > + s += strspn(s, delimiters); > + if (ovs_scan(s, "commit%n", &n)) { > + flags |= OVS_CT_F_COMMIT; > + s += n; > + continue; > + } > + if (ovs_scan(s, "zone=%"SCNu16"%n", &zone, &n)) { > + s += n; > + continue; > + } > + > + if (n < 0) { > + return -EINVAL; > + } > + } Agreed, removed. > In format_odp_key_attr() under OVS_KEY_ATTR_CT_STATE, I think that > format_flags_masked() covers the whole case. (Are you sure you want ',' > for the delimiter to format_flags()? That makes it look like multiple > fields, probably should use | as format_flags_masked() passes to > format_flags(). Also, parse_flags() looks for |.) OK. (Mental note: presumably TCP flags could also just use format_flags_masked()) > In format_odp_key_attr() under OVS_KEY_ATTR_CT_ZONE, please output an 0x > prefix for hexadecimal output, to make it obvious that it's hex. OK. > In ofp-actions.c, I don't think anything enforces prerequisites for > 'zone_src'. Most likely the later patches don't enforce prerequisites for nested actions either. I'll fix it. > In format_CT() in ofp-actions.c, you can simplify a little by always > appending a comma and then snipping the last one, e.g.: > > static void > format_alg(int port, struct ds *s) > { > if (port == IPPORT_FTP) { > ds_put_format(s, "alg=ftp,"); > } else if (port) { > ds_put_format(s, "alg=%d,", port); > } > } > > static void > format_CT(const struct ofpact_conntrack *a, struct ds *s) > { > ds_put_cstr(s, "ct("); > if (a->flags & NX_CT_F_COMMIT) { > ds_put_cstr(s, "commit,"); > } > if (a->recirc_table != NX_CT_RECIRC_NONE) { > ds_put_format(s, "table=%"PRIu8",", a->recirc_table); > } > format_alg(a->alg, s); > if (a->zone_src.field) { > ds_put_format(s, "zone="); > mf_format_subfield(&a->zone_src, s); > ds_put_char(s, ','); > } else if (a->zone_imm) { > ds_put_format(s, "zone=%"PRIu16, a->zone_imm); > ds_put_char(s, ','); > } > format_alg(a->alg, s); > if (ofpact_ct_get_action_len(a)) { > ds_put_cstr(s, "exec("); > ofpacts_format(a->actions, ofpact_ct_get_action_len(a), s); > ds_put_cstr(s, "),"); > } > ds_chomp(s, ','); > ds_put_char(s, ')'); > } Chomp chomp! OK. > I think that struct ofpact_conntrack doesn't need to be defined with a > pad member here. I guess it's done that way because it becomes > important in the next patch, but it's confusing here. I think I managed to catch this between the version you started reviewing to this one I posted. > In ofproto-dpif-xlate.c, why does compose_conntrack_action() throw away > half of the zone (via "& 0xff")? Because I don't have a test in the testsuite that tries large zone IDs :-) Fixed, test amended to check it. > Acked-by: Ben Pfaff <b...@nicira.com> Thanks, Joe _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev