On Sun, Sep 7, 2014 at 7:18 PM, Simon Horman <simon.hor...@netronome.com> wrote:
> On Fri, Sep 05, 2014 at 12:07:17PM -0700, Jesse Gross wrote:
>> On Thu, Sep 4, 2014 at 12:28 AM, Simon Horman
>> <simon.hor...@netronome.com> wrote:
>> > On Tue, Sep 02, 2014 at 07:20:30PM -0700, Pravin Shelar wrote:
>> >> On Tue, Sep 2, 2014 at 6:55 PM, Jesse Gross <je...@nicira.com> wrote:
>> >> > On Mon, Sep 1, 2014 at 1:10 AM, Simon Horman 
>> >> > <simon.hor...@netronome.com> wrote:
>> >> >> On Thu, Aug 28, 2014 at 10:12:49AM +0900, Simon Horman wrote:
>> >> >>> On Wed, Aug 27, 2014 at 03:03:53PM -0500, Jesse Gross wrote:
>> >> >>> > On Wed, Aug 27, 2014 at 11:51 AM, Ben Pfaff <b...@nicira.com> wrote:
>> >> >>> > > On Wed, Aug 27, 2014 at 10:26:14AM +0900, Simon Horman wrote:
>> >> >>> > >> On Fri, Aug 22, 2014 at 08:30:08AM -0700, Ben Pfaff wrote:
>> >> >>> > >> > On Fri, Aug 22, 2014 at 09:19:41PM +0900, Simon Horman wrote:
>> >> >>> > >> What we would like to do is to provide something generally useful
>> >> >>> > >> which may be used as appropriate to:
>> >> >>> > >
>> >> >>> > > I'm going to skip past these ideas, which do sound interesting, 
>> >> >>> > > because
>> >> >>> > > I think that they're more for Pravin and Jesse than for me.  I 
>> >> >>> > > hope that
>> >> >>> > > they will provide some reactions to them.
>> >> >>> >
>> >> >>> > For the hardware offloading piece in particular, I would take a look
>> >> >>> > at the discussion that has been going on in the netdev mailing 
>> >> >>> > list. I
>> >> >>> > think the general consensus (to the extent that there is one) is 
>> >> >>> > that
>> >> >>> > the hardware offload interface should be a block outside of OVS and
>> >> >>> > then OVS (most likely from userspace) configures it.
>> >> >>>
>> >> >>> Thanks, I am now digesting that conversation.
>> >> >>
>> >> >> A lively conversation indeed.
>> >> >>
>> >> >> We are left with two questions for you:
>> >> >>
>> >> >> 1. Would you look at a proposal (I have some rough code that even 
>> >> >> works)
>> >> >>    for a select group action in the datapath prior to the finalisation
>> >> >>    of the question of offloads infrastructure in the kernel?
>> >> >>
>> >> >>    From our point of view we would ultimately like to use such an
>> >> >>    action to offload to hardware. But it seems that there might be
>> >> >>    use-cases (not the one that I have rough code for) where such an
>> >> >>    action may be useful, for example to allow parts of IPVS to be
>> >> >>    used to provide stateful load balancing.
>> >> >>
>> >> >>    Put another way: it doesn't seem that a select group action is
>> >> >>    dependent on offloads, though there are cases where they could be
>> >> >>    used together.
>> >> >
>> >> > I agree that this is orthogonal to offloading and seems fine to do
>> >> > now. It seems particularly nice if we can use IPVS in a clean way,
>> >> > similar to what is currently being worked on for connection tracking.
>> >> >
>> >> > I guess I'm not entirely sure how you plan to offload this to hardware
>> >> > so it's hard to say how it would intersect in the future. However, the
>> >> > current plan is to have offloading be directed from a higher point
>> >> > (i.e. userspace) and have the OVS kernel module remain a software path
>> >> > so probably it doesn't really matter.
>> >> >
>> >> > However, I'll let Pravin comment since he'll be the one reviewing the code.
>> >
>> > Ok, my reading of the recent offload thread, which is somewhat clouded by
>> > preconceptions, is that offloads could be handled by hooks in the datapath.
>> > But I understand other ideas are also under discussion. Indeed it is
>> > more clear to me that other ideas are under discussion now that you have
>> > pointed it out. Thanks.
>>
>> I'm curious about what exactly you are trying to offload though.
>> Is it the actual group selection operation? Is it the whole datapath
>> and these use cases happen to contain groups?
>
> We would like to offload the entire flow.
> And that would include the group selection operation.
> The reason for this is to avoid the extra flow setup-cost
> involved when selection occurs in user-space.
>
> Although it is conceivable that an entire datapath could be offloaded
> by a Netronome flow processor, I think it is more practical to allow
> offloads with a smaller granularity: for instance some actions may
> not yet be implemented in code that runs on the flow processor.
>
>> >> I agree it is good to integrate datapath with IPVS. I would like to
>> >> see the design proposal.
>> >
>> > So far I have got as far as a prototype select group action for the
>> > datapath. In its current incarnation it just implements a hash,
>> > using the RSS hash.
>> >
>> > The attributes of a select group action are one or more nested
>> > bucket attributes. And bucket attributes contain a weight attribute
>> > and nested action attributes. I have it in mind to add a selection method
>> > attribute to the select group action, as per my proposal for
>> > OpenFlow[1].
>> >
>> > As such, the current use of a hash to select buckets is not
>> > particularly important, as I would like to support the provision of
>> > multiple selection method implementations.
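>> >
>> > To make that nesting concrete, here is a rough sketch of how the
>> > attributes might be laid out. None of these names exist in the
>> > datapath today; OVS_SELECT_GROUP_ATTR_* and OVS_BUCKET_ATTR_* are
>> > placeholders of mine for illustration only:
>> >
>> >     /* Hypothetical netlink attribute layout for a select group action. */
>> >     enum ovs_select_group_attr {
>> >         OVS_SELECT_GROUP_ATTR_UNSPEC,
>> >         OVS_SELECT_GROUP_ATTR_METHOD,   /* string: selection method name */
>> >         OVS_SELECT_GROUP_ATTR_BUCKET,   /* nested, one or more buckets */
>> >         __OVS_SELECT_GROUP_ATTR_MAX
>> >     };
>> >
>> >     enum ovs_bucket_attr {
>> >         OVS_BUCKET_ATTR_UNSPEC,
>> >         OVS_BUCKET_ATTR_WEIGHT,         /* u32: bucket weight */
>> >         OVS_BUCKET_ATTR_ACTIONS,        /* nested OVS_ACTION_ATTR_* list */
>> >         __OVS_BUCKET_ATTR_MAX
>> >     };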
>> >
>> > I have not yet fleshed out an IPVS proposal. But my general idea
>> > is that when the datapath executes a select group action for
>> > an IPVS group it would call the IPVS scheduler (the IPVS term
>> > for its destination-selection logic) to determine where to forward a packet.
>> >
>> > On the IPVS side this would probably require adding support for zones,
>> > so that the entries relating to OVS would be separate from anything
>> > else it is doing.
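>> >
>> > As a very rough sketch of the shape I have in mind -- every name here
>> > (struct select_group, struct bucket, ipvs_zone_select_dest() and
>> > execute_bucket_actions()) is invented for illustration rather than an
>> > existing datapath or IPVS symbol:
>> >
>> >     /* Hypothetical glue between the select group action and IPVS. */
>> >     struct select_group {
>> >         u16 zone;                /* IPVS zone reserved for this OVS group */
>> >         unsigned int n_buckets;
>> >         struct bucket *buckets;  /* weight plus nested actions */
>> >     };
>> >
>> >     static int execute_select_group(struct datapath *dp, struct sk_buff *skb,
>> >                                     const struct select_group *group)
>> >     {
>> >         /* Ask the IPVS scheduler to pick a destination within the
>> >          * group's zone: new connections get an entry created, packets
>> >          * on established connections hit the existing entry. */
>> >         int bucket = ipvs_zone_select_dest(skb, group->zone,
>> >                                            group->n_buckets);
>> >
>> >         if (bucket < 0)
>> >             return bucket;       /* no destination available */
>> >
>> >         /* Execute the nested actions of the chosen bucket as usual. */
>> >         return execute_bucket_actions(dp, skb, &group->buckets[bucket]);
>> >     }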
>>
>> This sounds an awful lot like a cross between how bonding is
>> implemented (which I think is pretty much the same as the RSS hash
>> backed version that you describe above) and an IPVS version of the
>> connection tracking proposal that Justin sent out recently. Both of
>> these use recirculation as the "select group" action.
>
> Yes, now you mention it, there are large similarities.
>
> From my point of view the conntrack proposal I am familiar with
> lacks the ability for the connection tracker to return details
> of the selected end-point to user-space. But I think that could be resolved.
>
>> I know you said that this might lead to a large number of flows post
>> selection but I'm not sure why this is inherently true (or that it can't
>> be mitigated).
>
> The scenario I am thinking of is something like this:
>
> 1. Pre-recirculation flow
>    * match:   proto=ip/0xffff,ip_dst=a.b.c.d/255.255.255.255,tp_dst=80/0xffff
>    * actions: ...,recirc_id=X,recirculate
>
> 2. Post-recirculation flow
>
>    Supposing that stateful L4 load balancing is used as the selection method.
>    It seems to me that the resulting flow would need to do an exact
>    match on all fields of the 5-tuple.
>
>    e.g.:
>    * match:   recirc_id=X,proto=ip/0xffff,ip_dst=a.b.c.d/255.255.255.255,
>               ip_src=e.f.g.h/255.255.255.255,tp_dst=80/0xffff,tp_src=p/0xffff
>    * actions: output:3
>
>    So I see that there would basically need to be a post-recirculation flow
>    for each connection, each of which would need to be established via an
>    upcall. This is what I meant by a large number of flows post selection.
>
>    It is not obvious to me how this can be mitigated other than by
>    having a selection algorithm that lends itself to masking, for
>    example by basing its end-point selection on a masked ip_dst (a
>    masked post-recirculation flow of that kind is sketched below).
>    But I believe such an approach would lead to uneven balancing.
>
>    It would be possible to not match on proto, ip_dst and ip_src in
>    the post-recirculation flow, as this is redundant due to the match on
>    recirc_id. But I don't think that alters the number of flows that would
>    be created.
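>
>    To illustrate the masked variant mentioned above (the numbers are
>    purely hypothetical; a real selection method would have to choose
>    the mask), a post-recirculation flow might instead look like:
>
>    * match:   recirc_id=X,ip_src=e.f.g.0/255.255.255.0
>    * actions: output:3
>
>    A handful of such flows would cover all clients, but because the
>    split is fixed by the mask rather than by observed load the
>    balancing would be uneven.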

Hmm, I guess that what you would really want is something that would
return a connection ID, and converting the raw output of IPVS into an
ID is presumably roughly along the lines of what you were thinking of
for a group selection action.

One issue is that there seems to be a tension between minimizing flow
setups and returning details of what was selected to userspace,
particularly since the flow is how we have generally returned
information.