On Thu, Sep 11, 2014 at 05:46:03PM -0700, Jesse Gross wrote:
> On Sun, Sep 7, 2014 at 7:18 PM, Simon Horman <simon.hor...@netronome.com> 
> wrote:
> > On Fri, Sep 05, 2014 at 12:07:17PM -0700, Jesse Gross wrote:
> >> On Thu, Sep 4, 2014 at 12:28 AM, Simon Horman
> >> <simon.hor...@netronome.com> wrote:
> >> > On Tue, Sep 02, 2014 at 07:20:30PM -0700, Pravin Shelar wrote:
> >> >> On Tue, Sep 2, 2014 at 6:55 PM, Jesse Gross <je...@nicira.com> wrote:
> >> >> > On Mon, Sep 1, 2014 at 1:10 AM, Simon Horman 
> >> >> > <simon.hor...@netronome.com> wrote:
> >> >> >> On Thu, Aug 28, 2014 at 10:12:49AM +0900, Simon Horman wrote:
> >> >> >>> On Wed, Aug 27, 2014 at 03:03:53PM -0500, Jesse Gross wrote:
> >> >> >>> > On Wed, Aug 27, 2014 at 11:51 AM, Ben Pfaff <b...@nicira.com> 
> >> >> >>> > wrote:
> >> >> >>> > > On Wed, Aug 27, 2014 at 10:26:14AM +0900, Simon Horman wrote:
> >> >> >>> > >> On Fri, Aug 22, 2014 at 08:30:08AM -0700, Ben Pfaff wrote:
> >> >> >>> > >> > On Fri, Aug 22, 2014 at 09:19:41PM +0900, Simon Horman wrote:
> >> >> >>> > >> What we would like to do is to provide something generally
> >> >> >>> > >> useful which may be used as appropriate to:
> >> >> >>> > >
> >> >> >>> > > I'm going to skip past these ideas, which do sound interesting,
> >> >> >>> > > because I think that they're more for Pravin and Jesse than for
> >> >> >>> > > me.  I hope that they will provide some reactions to them.
> >> >> >>> >
> >> >> >>> > For the hardware offloading piece in particular, I would take a
> >> >> >>> > look at the discussion that has been going on in the netdev
> >> >> >>> > mailing list. I think the general consensus (to the extent that
> >> >> >>> > there is one) is that the hardware offload interface should be a
> >> >> >>> > block outside of OVS and then OVS (most likely from userspace)
> >> >> >>> > configures it.
> >> >> >>>
> >> >> >>> Thanks, I am now digesting that conversation.
> >> >> >>
> >> >> >> A lively conversation indeed.
> >> >> >>
> >> >> >> We are left with two questions for you:
> >> >> >>
> >> >> >> 1. Would you look at a proposal (I have some rough code that even
> >> >> >>    works) for a select group action in the datapath prior to the
> >> >> >>    finalisation of the question of offloads infrastructure in the
> >> >> >>    kernel?
> >> >> >>
> >> >> >>    From our point of view we would ultimately like to use such an
> >> >> >>    action to offload to hardware. But it seems that there might be
> >> >> >>    use-cases (not the one that I have rough code for) where such an
> >> >> >>    action may be useful: for example, to allow parts of IPVS to be
> >> >> >>    used to provide stateful load balancing.
> >> >> >>
> >> >> >>    Put another way: it doesn't seem that a select group action is
> >> >> >>    dependent on offloads, though there are cases where they could
> >> >> >>    be used together.
> >> >> >
> >> >> > I agree that this is orthogonal to offloading and seems fine to do
> >> >> > now. It seems particularly nice if we can use IPVS in a clean way,
> >> >> > similar to what is currently being worked on for connection tracking.
> >> >> >
> >> >> > I guess I'm not entirely sure how you plan to offload this to hardware
> >> >> > so it's hard to say how it would intersect in the future. However, the
> >> >> > current plan is to have offloading be directed from a higher point
> >> >> > (i.e. userspace) and have the OVS kernel module remain a software path,
> >> >> > so it probably doesn't really matter.
> >> >> >
> >> >> > In any case, I'll let Pravin comment since he'll be the one reviewing
> >> >> > the code.
> >> >
> >> > Ok, my reading of the recent offload thread, which is somewhat clouded
> >> > by my preconceptions, is that offloads could be handled by hooks in the
> >> > datapath. But I understand other ideas are also under discussion;
> >> > indeed, that is clearer to me now that you have pointed it out. Thanks.
> >>
> >> I'm curious about what exactly you are trying to offload though.
> >> Is it the actual group selection operation? Is it the whole datapath
> >> and these use cases happen to contain groups?
> >
> > We would like to offload the entire flow, and that would include the
> > group selection operation. The reason for this is to avoid the extra
> > flow setup cost involved when selection occurs in user-space.
> >
> > Although it is conceivable that an entire datapath could be offloaded
> > by a Netronome flow processor, I think it is more practical to allow
> > offloads at a smaller granularity: for instance, some actions may not
> > yet be implemented in code that runs on the flow processor.
> >
> >> >> I agree it is good to integrate the datapath with IPVS. I would like
> >> >> to see the design proposal.
> >> >
> >> > So far I have got as far as a prototype select group action for the
> >> > datapath. In its current incarnation it just implements a hash,
> >> > using the RSS hash.
> >> >
> >> > The attributes of a select group action are one or more nested
> >> > bucket attributes, and bucket attributes in turn contain a weight
> >> > attribute and nested action attributes. I have it in mind to add a
> >> > selection-method attribute to the select group action, as per my
> >> > proposal for OpenFlow[1].
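
To make that nesting concrete, here is a rough sketch of the netlink
attribute layout I have in mind. All names and values below are
illustrative placeholders rather than the final proposal:

  /* Illustrative sketch only: attribute names are placeholders. */
  enum ovs_select_group_attr {
          OVS_SELECT_GROUP_ATTR_UNSPEC,
          OVS_SELECT_GROUP_ATTR_METHOD,  /* String: selection method name. */
          OVS_SELECT_GROUP_ATTR_BUCKET,  /* One or more nested buckets. */
          __OVS_SELECT_GROUP_ATTR_MAX
  };

  enum ovs_bucket_attr {
          OVS_BUCKET_ATTR_UNSPEC,
          OVS_BUCKET_ATTR_WEIGHT,        /* u32: relative weight of bucket. */
          OVS_BUCKET_ATTR_ACTIONS,       /* Nested actions to execute. */
          __OVS_BUCKET_ATTR_MAX
  };
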
> >> >
> >> > As such, the current use of a hash to select buckets is not
> >> > particularly important, as I would like to support the provision of
> >> > multiple selection-method implementations.
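
For illustration, hash-based selection over weighted buckets could look
something like the sketch below. This is simplified rather than the
actual prototype code: select_bucket() is a hypothetical helper, the
attribute names follow the sketch above, and both the presence of the
weight attribute and a non-zero total weight are assumed to have been
validated at flow setup time.

  #include <linux/skbuff.h>
  #include <net/netlink.h>

  /* Reduce the packet's flow (RSS) hash over the cumulative bucket
   * weights.  Deterministic: packets with the same hash always land
   * in the same bucket.
   */
  static const struct nlattr *select_bucket(struct sk_buff *skb,
                                            const struct nlattr *buckets,
                                            u32 total_weight)
  {
          const struct nlattr *bucket;
          u32 target = skb_get_hash(skb) % total_weight;
          int rem;

          nla_for_each_nested(bucket, buckets, rem) {
                  u32 weight = nla_get_u32(nla_find_nested(bucket,
                                                  OVS_BUCKET_ATTR_WEIGHT));

                  if (target < weight)
                          return bucket;  /* Execute this bucket's actions. */
                  target -= weight;
          }
          return NULL;  /* Unreachable if total_weight was validated. */
  }
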
> >> >
> >> > I have not yet fleshed out an IPVS proposal. But my general idea
> >> > is that when the datapath executes a select group action for
> >> > an IPVS group, it would call the IPVS scheduler (the IPVS term for
> >> > its load-balancing algorithm) to determine where to forward a packet.
> >> >
> >> > On the IPVS side this would probably require adding support for zones,
> >> > so that the entries relating to OVS would be separate from anything
> >> > else it is doing.
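
In the datapath, dispatch on the selection method might then look
roughly like this. To be clear, ovs_ipvs_schedule() is entirely
hypothetical: the glue into the IPVS scheduler, including how an
OVS-private zone would be passed, is exactly the part that still needs
to be fleshed out.

  /* Hypothetical dispatch on the selection-method attribute; a
   * negative return would fall back to an upcall to user-space.
   */
  static int select_bucket_index(struct sk_buff *skb, const char *method,
                                 u16 zone, u32 n_buckets)
  {
          if (!strcmp(method, "hash"))
                  return skb_get_hash(skb) % n_buckets;  /* Ignores weights. */
          if (!strcmp(method, "ipvs"))
                  return ovs_ipvs_schedule(skb, zone);   /* Hypothetical glue. */
          return -EOPNOTSUPP;  /* Unknown method. */
  }
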
> >>
> >> This sounds an awful lot like a cross between how bonding is
> >> implemented (which I think is pretty much the same as the RSS hash
> >> backed version that you describe above) and an IPVS version of the
> >> connection tracking proposal that Justin sent out recently. Both of
> >> these use recirculation as the "select group" action.
> >
> > Yes, now you mention it, there are large similarities.
> >
> > From my point of view the conntrack proposal I am familiar with
> > lacks the ability for the connection tracker to return details
> > of the selected end-point to user-space. But I think that could be resolved.
> >
> >> I know you said that this might lead to a large number of flows post
> >> selection but I'm not sure why this is inherently true (or that it can't
> >> be mitigated).
> >
> > The scenario I am thinking of is something like this:
> >
> > 1. Pre-recirculation flow
> >    * match:   proto=ip/0xffff,ip_dst=a.b.c.d/255.255.255.255,tp_dst=80/0xffff
> >    * actions: ...,recirc_id=X,recirculate
> >
> > 2. Post-recirculation flow
> >
> >    Supposing that stateful L4 load balancing is used as the selection
> >    method, it seems to me that the resulting flow would need to do an
> >    exact match on all fields of the 5-tuple.
> >
> >    e.g.:
> >    * match:
> > recirc_id=X,proto=ip/0xffff,ip_dst=a.b.c.d/255.255.255.255,ip_src=e.f.g.h/255.255.255.255,tp_dst=80/0xffff,tp_src=p/0xffff
> >    * actions: output:3
> >
> >    So I see that there would basically need to be a post-recirculation
> >    flow for each connection, each of which would need to be established
> >    via an upcall. This is what I meant by a large number of flows post
> >    selection.
> >
> >    It is not obvious to me how this can be mitigated other than by
> >    having a selection algorithm that lends itself to masking, for
> >    example by basing its end-point selection on a masked ip_dst.
> >    But I believe such an approach would lead to uneven balancing.
> >
> >    It would be possible to not match on proto, ip_dst and ip_src in
> >    the post-recirculation flow, as this is redundant due to the match on
> >    recirc_id. But I don't think that alters the number of flows that would
> >    be created.
> 
> Hmm, I guess that what you would really want is something that would
> return a connection ID, and converting the raw output of IPVS into an
> ID is presumably roughly along the lines of what you were thinking for
> a group selection action.

I think in the case of IPVS a connection-ID based scheme could work, and
it seems to me that it may be general enough to handle arbitrary load
balancing schemes, though in order to do so it might be necessary to
ensure that user-space maps connection-IDs to buckets consistently, at
least so long as the buckets of a group are unchanged.
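
As a minimal sketch of what I mean by "consistently", and assuming the
kernel reports a u32 connection-ID, both the kernel and user-space could
derive the bucket with the same pure function, for example:

  #include <linux/jhash.h>

  /* Deterministic connection-ID to bucket mapping: while n_buckets is
   * unchanged, a given connection-ID always yields the same bucket,
   * so the kernel and user-space agree without extra signalling.
   */
  static inline u32 conn_id_to_bucket(u32 conn_id, u32 n_buckets)
  {
          return jhash_1word(conn_id, 0) % n_buckets;
  }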

> One issue is that there seems to be a tension between minimizing flow
> setups and returning details of what was selected to userspace,
> particularly since the flow is how we have generally returned
> information.

Yes, I agree there seems to be some kind of tension there.

From my point of view this flow setup cost is key to the effectiveness
of offloading such actions, if offloads are to occur via the datapath.
And we would like to offload load balancing.

I realise there is some discussion going on in various forums about
whether offloads should be driven from the datapath or from user-space.
In order to further both that discussion and this one, I plan to post a
prototype datapath select group action (as I finally have the code in a
form ready to post).