On Sun, Sep 7, 2014 at 7:18 PM, Simon Horman <simon.hor...@netronome.com> wrote:
> On Fri, Sep 05, 2014 at 12:07:17PM -0700, Jesse Gross wrote:
>> On Thu, Sep 4, 2014 at 12:28 AM, Simon Horman
>> <simon.hor...@netronome.com> wrote:
>> > On Tue, Sep 02, 2014 at 07:20:30PM -0700, Pravin Shelar wrote:
>> >> On Tue, Sep 2, 2014 at 6:55 PM, Jesse Gross <je...@nicira.com> wrote:
>> >> > On Mon, Sep 1, 2014 at 1:10 AM, Simon Horman
>> >> > <simon.hor...@netronome.com> wrote:
>> >> >> On Thu, Aug 28, 2014 at 10:12:49AM +0900, Simon Horman wrote:
>> >> >>> On Wed, Aug 27, 2014 at 03:03:53PM -0500, Jesse Gross wrote:
>> >> >>> > On Wed, Aug 27, 2014 at 11:51 AM, Ben Pfaff <b...@nicira.com> wrote:
>> >> >>> > > On Wed, Aug 27, 2014 at 10:26:14AM +0900, Simon Horman wrote:
>> >> >>> > >> On Fri, Aug 22, 2014 at 08:30:08AM -0700, Ben Pfaff wrote:
>> >> >>> > >> > On Fri, Aug 22, 2014 at 09:19:41PM +0900, Simon Horman wrote:
>> >> >>> > >> What we would like to do is to provide something generally useful
>> >> >>> > >> which may be used as appropriate to:
>> >> >>> > >
>> >> >>> > > I'm going to skip past these ideas, which do sound interesting, because
>> >> >>> > > I think that they're more for Pravin and Jesse than for me. I hope that
>> >> >>> > > they will provide some reactions to them.
>> >> >>> >
>> >> >>> > For the hardware offloading piece in particular, I would take a look
>> >> >>> > at the discussion that has been going on in the netdev mailing list. I
>> >> >>> > think the general consensus (to the extent that there is one) is that
>> >> >>> > the hardware offload interface should be a block outside of OVS and
>> >> >>> > then OVS (most likely from userspace) configures it.
>> >> >>>
>> >> >>> Thanks, I am now digesting that conversation.
>> >> >>
>> >> >> A lively conversation indeed.
>> >> >>
>> >> >> We are left with two questions for you:
>> >> >>
>> >> >> 1. Would you look at a proposal (I have some rough code that even works)
>> >> >>    for a select group action in the datapath prior to the finalisation
>> >> >>    of the question of offloads infrastructure in the kernel?
>> >> >>
>> >> >>    From our point of view we would ultimately like to use such an action
>> >> >>    to offload to hardware. But it seems that there might be use-cases
>> >> >>    (not the one that I have rough code for) where such an action may be
>> >> >>    useful. For example, to allow parts of IPVS to be used to provide
>> >> >>    stateful load balancing.
>> >> >>
>> >> >>    Put another way: it doesn't seem that a select group action is
>> >> >>    dependent on offloads, though there are cases where they could be
>> >> >>    used together.
>> >> >
>> >> > I agree that this is orthogonal to offloading and seems fine to do
>> >> > now. It seems particularly nice if we can use IPVS in a clean way,
>> >> > similar to what is currently being worked on for connection tracking.
>> >> >
>> >> > I guess I'm not entirely sure how you plan to offload this to hardware
>> >> > so it's hard to say how it would intersect in the future. However, the
>> >> > current plan is to have offloading be directed from a higher point
>> >> > (i.e. userspace) and have the OVS kernel module remain a software path,
>> >> > so probably it doesn't really matter.
>> >> >
>> >> > However, I'll let Pravin comment since he'll be the one reviewing the code.
>> >
>> > Ok, my reading of the recent offload thread, which is somewhat clouded by
>> > preconceptions, is that offloads could be handled by hooks in the datapath.
>> > But I understand other ideas are also under discussion. Indeed it is
>> > more clear to me that other ideas are under discussion now that you have
>> > pointed it out. Thanks.
>>
>> I'm curious about what exactly you are trying to offload though.
>> Is it the actual group selection operation? Is it the whole datapath,
>> and these use cases happen to contain groups?
>
> We would like to offload the entire flow.
> And that would include the group selection operation.
> The reason for this is to avoid the extra flow setup cost
> involved when selection occurs in user-space.
>
> Although it is conceivable that an entire datapath could be offloaded
> by a Netronome flow processor, I think it is more practical to allow
> offloads with a smaller granularity: for instance, some actions may
> not yet be implemented in code that runs on the flow processor.
>
>> >> I agree it is good to integrate datapath with IPVS. I would like to
>> >> see the design proposal.
>> >
>> > So far I have got as far as a prototype select group action for the
>> > datapath. In its current incarnation it just implements a hash,
>> > using the RSS hash.
>> >
>> > The attributes of a select group action are one or more nested
>> > bucket attributes, and each bucket attribute contains a weight
>> > attribute and nested action attributes. I have it in mind to add a
>> > selection method attribute to the select group action, as per my
>> > proposal for OpenFlow [1].
>> >
>> > As such, the current use of a hash to select buckets is not
>> > particularly important, as I would like to support the provision of
>> > implementations of multiple selection methods.
>> >
>> > I have not yet fleshed out an IPVS proposal. But my general idea
>> > is that when the datapath executes a select group action for
>> > an IPVS group it would call the IPVS scheduler (the IPVS term
>> > for its connection scheduling algorithm) to determine where to
>> > forward a packet.
>> >
>> > On the IPVS side this would probably require adding support for zones,
>> > so that the entries relating to OVS would be separate from anything
>> > else it is doing.
>>
>> This sounds an awful lot like a cross between how bonding is
>> implemented (which I think is pretty much the same as the RSS hash
>> backed version that you describe above) and an IPVS version of the
>> connection tracking proposal that Justin sent out recently. Both of
>> these use recirculation as the "select group" action.
>
> Yes, now you mention it, there are large similarities.
>
> From my point of view the conntrack proposal I am familiar with
> lacks the ability for the connection tracker to return details
> of the selected end-point to user-space. But I think that could be resolved.
>
>> I know you said that this might lead to a large number of flows post
>> selection, but I'm not sure why this is inherently true (or that it can't
>> be mitigated).
>
> The scenario I am thinking of is something like this:
>
> 1. Pre-recirculation flow
>    * match: proto=ip/0xffff,ip_dst=a.b.c.d/255.255.255.255,tp_dst=80/0xffff
>    * actions: ...,recirc_id=X,recirculate
>
> 2. Post-recirculation flow
>
> Supposing that stateful L4 load balancing is used as the selection method,
> it seems to me that the resulting flow would need to do an exact
> match on all fields of the 5-tuple.
>
> e.g.:
>    * match: recirc_id=X,proto=ip/0xffff,ip_dst=a.b.c.d/255.255.255.255,
>      ip_src=e.f.g.h/255.255.255.255,tp_dst=80/0xffff,tp_src=p/0xffff
>    * actions: output:3
>
> So I see that there would basically need to be a post-recirculation flow
> for each connection, each of which would need to be established via an
> upcall. This is what I meant by a large number of flows post selection.
>
> It is not obvious to me how this can be mitigated other than by
> having a selection algorithm that lends itself to masking,
> for example by basing its end-point selection on a masked ip_dst.
> But I believe such an approach would lead to uneven balancing.
>
> It would be possible to not match on proto, ip_dst and ip_src in
> the post-recirculation flow, as this is redundant due to the match on
> recirc_id. But I don't think that alters the number of flows that would
> be created.
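
[For concreteness, a minimal sketch of the bucket/weight attribute layout
and hash-based selection described above might look like the following.
All attribute names and helpers here are hypothetical illustrations, not
the actual OVS datapath interface; validation of the nested attributes is
assumed to happen at flow install time.]

    /*
     * Hypothetical sketch only: a select group action carrying nested
     * buckets, where each bucket carries a weight and a nested list of
     * actions, per the layout described above.
     */
    #include <linux/skbuff.h>
    #include <net/netlink.h>

    enum ovs_select_group_attr {
            OVS_SELECT_GROUP_ATTR_UNSPEC,
            OVS_SELECT_GROUP_ATTR_METHOD,   /* u32: selection method, e.g. hash. */
            OVS_SELECT_GROUP_ATTR_BUCKET,   /* Nested: one instance per bucket. */
            __OVS_SELECT_GROUP_ATTR_MAX,
    };

    enum ovs_bucket_attr {
            OVS_BUCKET_ATTR_UNSPEC,
            OVS_BUCKET_ATTR_WEIGHT,         /* u32: relative weight of bucket. */
            OVS_BUCKET_ATTR_ACTIONS,        /* Nested: actions for this bucket. */
            __OVS_BUCKET_ATTR_MAX,
    };

    /* Sum of the weights of all buckets nested in @group. */
    static u32 group_total_weight(const struct nlattr *group)
    {
            struct nlattr *bucket;
            u32 total = 0;
            int rem;

            nla_for_each_nested(bucket, group, rem)
                    if (nla_type(bucket) == OVS_SELECT_GROUP_ATTR_BUCKET)
                            total += nla_get_u32(nla_find_nested(bucket,
                                                    OVS_BUCKET_ATTR_WEIGHT));
            return total;
    }

    /*
     * Hash-based selection: take the packet hash (the RSS hash where the
     * NIC provides one) modulo the total weight and walk the buckets
     * until the accumulated weight exceeds it.  Returns the chosen
     * bucket's nested actions, or NULL if the group is empty.
     */
    static const struct nlattr *
    select_bucket(struct sk_buff *skb, const struct nlattr *group)
    {
            u32 total = group_total_weight(group), target, acc = 0;
            struct nlattr *bucket;
            int rem;

            if (!total)
                    return NULL;

            target = skb_get_hash(skb) % total;

            nla_for_each_nested(bucket, group, rem) {
                    if (nla_type(bucket) != OVS_SELECT_GROUP_ATTR_BUCKET)
                            continue;
                    acc += nla_get_u32(nla_find_nested(bucket,
                                                       OVS_BUCKET_ATTR_WEIGHT));
                    if (target < acc)
                            return nla_find_nested(bucket,
                                                   OVS_BUCKET_ATTR_ACTIONS);
            }
            return NULL;
    }

[A weighted walk like this keeps selection O(number of buckets) per
packet; a real implementation might precompute a lookup table instead,
or dispatch on OVS_SELECT_GROUP_ATTR_METHOD to other selection methods
such as an IPVS scheduler.]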
Hmm, I guess that what you would really want is something that would
return a connection ID, and converting the raw output of IPVS into an
ID is presumably roughly along the lines of what you were thinking of
for a group selection action.

One issue is that there seems to be a tension between minimizing flow
setups and returning details of what was selected to userspace,
particularly since the flow is how we have generally returned
information.
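
[To illustrate the connection-ID idea: a minimal sketch, assuming the
selected end-point is known as an IPv4 address and port. The function
name is made up; jhash_2words is the kernel's generic hash helper.]

    #include <linux/jhash.h>
    #include <linux/types.h>

    /*
     * Hypothetical helper: fold the end-point chosen by IPVS into a
     * compact 32-bit connection ID that the datapath could report to
     * userspace in an upcall, rather than requiring an exact-match
     * flow per connection.
     */
    static u32 endpoint_to_conn_id(__be32 daddr, __be16 dport)
    {
            return jhash_2words((__force u32)daddr, (__force u32)dport, 0);
    }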