On Sat, Jun 22, 2013 at 5:32 AM, Andy Zhou <az...@nicira.com> wrote:
> For architectures that can load and store unaligned longs efficiently,
> use 4- or 8-byte operations. This improves efficiency compared to
> byte-wise operations.
>
> This patch uses ideas and code from a patch submitted by Peter Klausler
> titled "replace memcmp() with specialized comparator". The flow compare
> function is essentially his implementation. The original patch
> mentioned a 7x speedup from this optimization.
>
> Co-authored-by: Peter Klausler <p...@google.com>
> Signed-off-by: Andy Zhou <az...@nicira.com>
OK, I think the time has come for this patch...

> diff --git a/datapath/flow.c b/datapath/flow.c
> index 39de931..273cbea 100644
> --- a/datapath/flow.c
> +++ b/datapath/flow.c
> @@ -343,16 +350,26 @@ static void flow_key_mask(struct sw_flow_key *dst,
>                           const struct sw_flow_key *src,
>                           const struct sw_flow_mask *mask)
> {
> -       u8 *m = (u8 *)&mask->key + mask->range.start;
> -       u8 *s = (u8 *)src + mask->range.start;
> -       u8 *d = (u8 *)dst + mask->range.start;
> -       int i;
> +       const u8 *m = (u8 *)&mask->key;
> +       const u8 *s = (u8 *)src;
> +       u8 *d = (u8 *)dst;
> +       int len = sizeof(*dst);

What's the rationale for dropping the mask->range calculations here? It
shouldn't cause a problem, but I think that offset should always be
aligned. Since we also use the full length, this ends up pulling in extra
data on both sides.

I wonder if it makes sense to just force a common length for all of our
operations so there isn't a potential for mismatch. That might also
eliminate the need for any tail checking, depending on the length we
choose.

_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev