On Wed, Aug 14, 2013 at 11:50 AM, Andy Zhou <az...@nicira.com> wrote:
>
>
>
> On Wed, Aug 14, 2013 at 11:29 AM, Jesse Gross <je...@nicira.com> wrote:
>>
>> On Sat, Jun 22, 2013 at 5:32 AM, Andy Zhou <az...@nicira.com> wrote:
>> > For architectures can load and store unaligned long efficiently, use 4
>> > or 8 bytes operations. This improves the efficiency compare to byte wise
>> > operations.
>> >
>> > This patch is uses ideas and code from a patch submitted by Peter
>> > Klausler
>> > titled "replace memcmp() with specialized comparator". The flow compare
>> > function is essentially his implementation.  The original patch
>> > mentioned 7X speed up with this optimization.
>> >
>> > Co-authored-by: Peter Klausler <p...@google.com>
>> > Signed-off-by: Andy Zhou <az...@nicira.com>
>>
>> OK, I think the time has come for this patch...
>>
>> > diff --git a/datapath/flow.c b/datapath/flow.c
>> > index 39de931..273cbea 100644
>> > --- a/datapath/flow.c
>> > +++ b/datapath/flow.c
>> > @@ -343,16 +350,26 @@ static void flow_key_mask(struct sw_flow_key *dst,
>> >                           const struct sw_flow_key *src,
>> >                           const struct sw_flow_mask *mask)
>> >  {
>> > -       u8 *m = (u8 *)&mask->key + mask->range.start;
>> > -       u8 *s = (u8 *)src + mask->range.start;
>> > -       u8 *d = (u8 *)dst + mask->range.start;
>> > -       int i;
>> > +       const u8 *m = (u8 *)&mask->key;
>> > +       const u8 *s = (u8 *)src;
>> > +       u8 *d = (u8 *)dst;
>> > +       int len = sizeof(*dst);
>>
>> What's the rationale for dropping the mask->range calculations here?
>> It shouldn't cause a problem but I think that offset should always be
>> aligned. Since we also use the full length this ends up pulling in
>> extra data on both sides.
>>
> I just figure the added cost is not that big with the optimized version.
> How about we add it back in to complete this patch.
>
>> I wonder if it makes sense to just force a common length for all of
>> our operations so there isn't a potential for mismatch. That might
>> potentially also eliminate the need to do any tail checking depending
>> on the length we choose.
>
>  Yes, this has the advantage of working well on CPUs requires aligned access
> and make the code even simpler.  Let's flush out the details.

I think probably the first step to both of these is figuring out
whether the length optimization still makes sense. If it does (and it
would be good to think about the possibility of IPv6 tunneling as
well) then presumably we should take advantage of it everywhere. If it
doesn't then we shouldn't bother with tracking it. That would also
answer the second question since we could just force alignment of
struct sw_flow_key to be that of a long.
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Reply via email to