On Fri, Aug 22, 2014 at 08:30:08AM -0700, Ben Pfaff wrote:
> On Fri, Aug 22, 2014 at 09:19:41PM +0900, Simon Horman wrote:
> > I have been working with Netronome on examining the possibilities of
> > providing (richer) load balancing facilities in Open vSwitch.
> >
> > It seems to us that the current select group provides for some load
> > balancing functionality. And that in particular the way that it is
> > implemented in Open vSwitch provides L2 destination load balancing (it
> > hashes on the destination ethernet address). Our ideas so far are as
> > follows:
> >
> > 1. Provide a richer and ideally extendible select group in the
> >    form of an OpenFlow extension to groups.
> >
> >    * Allow the fields used to be selected.
> >
> >      In the case of a hash this would be the fields that are hashed.
> >
> >      An implication of this is that the pre-requisites of these
> >      fields would need to be present in the flow's match.
> >      Masking of the fields would be allowed but not
> >      required for fields whose TLVs allow masking.
> >
> >    * Allow designation of the selection method used.
> >
> >      For example hash.
> >
> >    * Allow passing a parameter to the selection method.
> >
> >      For example an initial value key for hashes.
>
> There is an outstanding patch on this topic already:
>         http://patchwork.openvswitch.org/patch/5424/

I have no particular objections to that change, though I have not thought
about it deeply. However, I think it's more valuable in the long run to
make select groups configurable rather than tweaking what would be the
default setting.

> It sounds reasonable to make select groups configurable. The way to do
> that would be to implement the OpenFlow 1.5 (draft) proposal to add
> properties to groups and group buckets, which is filed in the ONF JIRA
> as EXT-350.

I'm happy to look at implementing EXT-350 (which I now have access to :).
However, it seems to me that while it makes groups more configurable, it
does not address the ability to configure the selection method.

At the end of this email I have included a fleshed-out version of the
enhanced select group proposal that we have been discussing at Netronome.

> > 2. Investigate allowing selection of buckets to occur in the datapath. Or
> >    in other words a megaflow with a select group action. The current
> >    select group implementation seems to be a good candidate for this
> >    investigation.
>
> Maybe this could be implemented via recirculation without datapath
> changes, in the same way that bonds are implemented.

I think that would allow a megaflow to handle the actions before the
select group. But it seems that the recirculation action would result in
much more fine-grained post-recirculation flows.

What we would like to do is to provide something generally useful
which may be used as appropriate to:

* Reduce flow-setup overhead by using a megaflow to handle many flows and
  in turn provide something that lends itself to offloads.

  While working on a prototype that adds the hash-based select group
  behaviour currently present in ovs-vswitchd to the datapath, we have
  come to realise there may be situations where the cost of selecting the
  bucket for each packet outweighs the reduced flow-setup cost. In the
  particular case of hash this may be avoidable by using the RSS hash,
  which I believe is pre-calculated.
  But regardless we do see that it may be better to use the current
  user-space approach in some cases. But we also think there are very
  likely cases where performing selection in the datapath is a win. And
  we think that things could be arranged such that ovs-vswitchd would
  only use the datapath select action when it is a win.

* Allow use of existing kernel infrastructure to implement selection.

  I am particularly thinking in terms of using IPVS (which I maintain)
  to provide stateful connection-based load balancing by using its
  "schedulers" as a selection method.

  That is not to say that we are trying to propose a solution tailored to
  allowing the use of IPVS. But rather I think it's an example of how a
  datapath select action could be useful.

* Allow the possibility of offloading selection beyond the datapath and
  into hardware via hooks in the datapath.

  For example offloading to a Netronome flow processor (I am sure there
  are other examples). Again, we are not trying to propose something that
  is only useful to Netronome. Rather, this is an example of how a
  datapath select action could be useful.

In relation to hooks for offloading, I plan to start a public discussion
on that separately.

----------------------------------------------------------------------

Proposal: Proposal for enhanced select groups
Version: 0.0.1

Contents
========

1. Introduction
2. How it Works
3. Experimenter ID
4. Experimenter Messages
5. History

1. Introduction
===============

This text describes a Netronome Extension to OpenFlow 1.4 that allows a
controller to provide more information on the selection method for select
groups. This proposal is in the form of an enhanced select group type.

This may subsequently be proposed as an extension or update to the
OpenFlow specification.

2. How it Works
===============

A new Netronome extension group mod message is defined which provides
compatibility with the group mod message defined in OpenFlow 1.4 and
allows extra parameters to be passed by the controller. In particular it
allows controllers to:

* Specify the fields used for bucket selection by the select group.
* Designate the selection method used.
* Provide a non-field parameter to the selection method.

3. Experimenter ID
==================

The Experimenter ID of this extension is:

    NMX_VENDOR_ID = 0x00001540

4. Experimenter Messages
========================

The following message subtype is defined by this extension.

    enum nmx_group_mod_subtype {
        NMXT_GROUP_MOD = 1
    };

Modifications to the group table from the controller may be done with a
NMXT_GROUP_MOD message. The behaviour of this is analogous to that of the
OFPT_GROUP_MOD message described in OpenFlow 1.4 section 7.3.4.3 Modify
Group Entry Message.

The NMXT_GROUP_MOD is intended to cover all configurations covered by
OFPT_GROUP_MOD and to allow new configurations through the new
selection_method, selection_method_param and fields members of
nmx_group_mod.

    struct nmx_group_mod {
        struct ofp_header header;
        ovs_be32 vendor;             /* NMX_VENDOR_ID. */
        ovs_be32 subtype;            /* OFPRAW_NMXT_GROUP_MOD. */
        ovs_be16 command;            /* One of OFPGC_*. */
        uint8_t type;                /* One of OFPGT_*. */
        uint8_t pad;                 /* Pad to 64 bits. */
        ovs_be32 group_id;           /* Group identifier. */
        char selection_method[NXM_MAX_SELECTION_METHOD_LEN];
                                     /* Null-terminated */
        ovs_be64 selection_method_param;
                                     /* Non-Field parameter for bucket
                                        selection. */
        struct ofp_match fields;     /* Fields used for bucket selection.
                                        Variable size. */
        /* Followed by:
         *   struct ofp_buckets[0];  The length of the bucket array is
         *   inferred from the length field in header and fields. */
    };
    OVS_ASSERT(sizeof(struct nmx_group_mod) == 48);

The vendor field is the Experimenter ID (see 3).

The subtype field is NMXT_GROUP_MOD.

The command field must be one of the OFPGC_* values defined in
OpenFlow 1.4 section 7.3.4.3 Modify Group Entry Message.

The group type field must be one of the OFPGT_* values defined in
OpenFlow 1.4 section 7.3.4.3 Modify Group Entry Message.

The group selection_method is a null-terminated string which, if non-zero
length, specifies a selection method known to an underlying layer of the
switch. The value of NXM_MAX_SELECTION_METHOD_LEN is 16.

The group selection_method must be zero-length (i.e. the first byte must
be null) if type is not OFPGT_SELECT. It may be zero-length if type is
OFPGT_SELECT to request compatibility with OpenFlow 1.4.

The selection_method_param provides a non-field parameter for the group
selection_method. It must be all-zeros unless the group selection_method
is non-zero length.

The selection_method_param may for example be used as an initial value for
the hash of a hash group selection method.

The fields field is an ofp_match structure which includes the fields which
should be used as inputs to bucket selection. ofp_match is described in
OpenFlow 1.4 section 7.2.2 Flow Match Structures.

Fields must not be specified unless the group selection_method is non-zero
length. The pre-requisites for fields specified must be satisfied in the
match for any flow that uses the group. Masking is allowed but not
required for fields whose TLVs allow masking.

The fields may for example be used as the fields that are hashed by a hash
group selection method.

The buckets field is an array of buckets, the structure and semantics of
which are described in OpenFlow 1.4 section 7.3.4.3 Modify Group Entry
Message.

5. History
==========

This proposal has been developed independently of any similar work in this
area. No such work is known.
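
To make the constraints in section 4 on selection_method,
selection_method_param and fields a little more concrete, here is a rough
sketch of how a switch might check them after decoding a NMXT_GROUP_MOD.
This is only an illustration, not part of the proposal: struct
nmx_select_params, the n_fields counter, enum nmx_check and
nmx_check_select_params() are hypothetical names, and the values are
assumed to have already been converted to host byte order. Only
NXM_MAX_SELECTION_METHOD_LEN (16) and OFPGT_SELECT (1 in OpenFlow) come
from the text above and the OpenFlow specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NXM_MAX_SELECTION_METHOD_LEN 16  /* Value given in the proposal. */
#define OFPGT_SELECT 1                   /* ofp_group_type in OpenFlow. */

/* Hypothetical decoded view of the new nmx_group_mod members. */
struct nmx_select_params {
    uint8_t type;                        /* One of OFPGT_*. */
    char selection_method[NXM_MAX_SELECTION_METHOD_LEN];
    uint64_t selection_method_param;     /* Host byte order (assumption). */
    size_t n_fields;                     /* Fields present in 'fields'. */
};

/* Hypothetical result codes for this sketch. */
enum nmx_check {
    NMX_OK = 0,
    NMX_BAD_METHOD,
    NMX_BAD_PARAM,
    NMX_BAD_FIELDS
};

static enum nmx_check
nmx_check_select_params(const struct nmx_select_params *p)
{
    bool has_method;

    assert(p != NULL);

    /* selection_method must be null-terminated within its 16 bytes. */
    if (!memchr(p->selection_method, '\0', NXM_MAX_SELECTION_METHOD_LEN)) {
        return NMX_BAD_METHOD;
    }
    has_method = p->selection_method[0] != '\0';

    /* A non-empty method is only allowed for OFPGT_SELECT groups. */
    if (has_method && p->type != OFPGT_SELECT) {
        return NMX_BAD_METHOD;
    }
    /* The parameter must be all-zeros unless a method is named. */
    if (!has_method && p->selection_method_param != 0) {
        return NMX_BAD_PARAM;
    }
    /* Fields must not be specified unless a method is named. */
    if (!has_method && p->n_fields != 0) {
        return NMX_BAD_FIELDS;
    }
    return NMX_OK;
}
```

With this sketch, an OFPGT_SELECT group with an empty selection_method,
a zero parameter and no fields passes the check, which corresponds to
the OpenFlow 1.4 compatible case. Checking that the pre-requisites of
the listed fields are satisfied by the flows that use the group is not
covered here, as that depends on the flow table rather than on the
message alone.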