On Sat, Jun 22, 2013 at 5:32 AM, Andy Zhou <az...@nicira.com> wrote:
> For architectures that can load and store unaligned longs efficiently,
> use 4- or 8-byte operations. This improves efficiency compared to
> byte-wise operations.
>
> This patch uses ideas and code from a patch submitted by Peter Klausler
> titled "replace memcmp() with specialized comparator". The flow compare
> function is essentially his implementation. The original patch
> mentioned a 7X speedup with this optimization.
>
> Co-authored-by: Peter Klausler <p...@google.com>
> Signed-off-by: Andy Zhou <az...@nicira.com>

OK, I think the time has come for this patch...

> diff --git a/datapath/flow.c b/datapath/flow.c
> index 39de931..273cbea 100644
> --- a/datapath/flow.c
> +++ b/datapath/flow.c
> @@ -343,16 +350,26 @@ static void flow_key_mask(struct sw_flow_key *dst,
>                           const struct sw_flow_key *src,
>                           const struct sw_flow_mask *mask)
>  {
> -       u8 *m = (u8 *)&mask->key + mask->range.start;
> -       u8 *s = (u8 *)src + mask->range.start;
> -       u8 *d = (u8 *)dst + mask->range.start;
> -       int i;
> +       const u8 *m = (u8 *)&mask->key;
> +       const u8 *s = (u8 *)src;
> +       u8 *d = (u8 *)dst;
> +       int len = sizeof(*dst);

What's the rationale for dropping the mask->range calculations here?
It shouldn't cause a problem, but I think that offset should always be
aligned. Since we also use the full length, this ends up pulling in
extra data on both sides.
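
To make the question concrete, here is a rough sketch of a long-wise mask
that still honors the range. The demo_key and demo_range types and the
demo_key_mask() name are hypothetical stand-ins, not the actual datapath
structures, and the sketch assumes both ends of the range are multiples
of sizeof(long):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sw_flow_key; the real layout differs. */
struct demo_key {
	unsigned long words[8];
};

/* Hypothetical stand-in for the mask's byte range [start, end). */
struct demo_range {
	size_t start;
	size_t end;
};

/*
 * dst = src & mask, restricted to the bytes in [start, end) and done
 * one long at a time. Words outside the range are left untouched.
 * Assumption: start and end are both multiples of sizeof(long).
 */
static void demo_key_mask(struct demo_key *dst,
			  const struct demo_key *src,
			  const struct demo_key *mask,
			  const struct demo_range *range)
{
	size_t first = range->start / sizeof(long);
	size_t last = range->end / sizeof(long);
	size_t i;

	assert(range->start % sizeof(long) == 0);
	assert(range->end % sizeof(long) == 0);
	assert(range->end <= sizeof(*dst));

	for (i = first; i < last; i++)
		dst->words[i] = src->words[i] & mask->words[i];
}
```

With an aligned range like this, the loop never touches words outside
[start, end), so no extra data is pulled in on either side.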

I wonder if it makes sense to just force a common length for all of
our operations so there is no potential for mismatch. Depending on the
length we choose, that might also eliminate the need for any tail
checking.
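
As a rough illustration of the common-length idea, the sketch below pads
a key to a whole number of longs so that a comparison loop needs no tail
handling. The padded_key union, the 42-byte raw length, and
padded_key_equal() are all made up for the example and are not taken
from the patch. It also assumes the padding bytes are zeroed whenever a
key is initialized:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical raw key size that is not a multiple of sizeof(long). */
#define PADDED_KEY_RAW_LEN 42

/* Round the length up to a whole number of longs. */
#define PADDED_KEY_WORDS \
	((PADDED_KEY_RAW_LEN + sizeof(long) - 1) / sizeof(long))
#define PADDED_KEY_LEN (PADDED_KEY_WORDS * sizeof(long))

/* The words[] view forces both the padded size and long alignment. */
union padded_key {
	unsigned char bytes[PADDED_KEY_RAW_LEN];
	unsigned long words[PADDED_KEY_WORDS];
};

static_assert(sizeof(union padded_key) == PADDED_KEY_LEN,
	      "key must be a whole number of longs");

/* Start from an all-zero key so the padding bytes compare equal. */
static void padded_key_init(union padded_key *key)
{
	memset(key, 0, sizeof(*key));
}

/*
 * Returns 1 if the keys match, 0 otherwise. Every caller uses the same
 * PADDED_KEY_WORDS length, so there is no byte-wise tail loop.
 */
static int padded_key_equal(const union padded_key *a,
			    const union padded_key *b)
{
	unsigned long diff = 0;
	size_t i;

	for (i = 0; i < PADDED_KEY_WORDS; i++)
		diff |= a->words[i] ^ b->words[i];

	return diff == 0;
}
```

The trade-off is a few wasted bytes per key in exchange for one length
shared by the mask and compare paths.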