On Thu, Aug 28, 2014 at 11:28:41AM -0700, Ben Pfaff wrote: > On Fri, Aug 22, 2014 at 01:58:21PM -0700, Jarno Rajahalme wrote: > > The atomics here do not synchronize the state of any other variables, > > so we can use relaxed atomics. > > > > cfm_should_process_flow() is rearranged to set the megaflow mask bits > > only if necessary, and to avoid the atomic operation on non-BFD > > packets. > > > > Signed-off-by: Jarno Rajahalme <[email protected]> > > + /* Most packets are not CFM. */ > > + if (OVS_LIKELY(ntohs(flow->dl_type) != ETH_TYPE_CFM)) { > > + return false; > > + } > > The common recommendation is to htons the constant instead of ntohsing > the variable, to generate better code. Out of curiosity, I decided to > find out whether this is still needed with modern compilers, so I > built the following with GCC and Clang: > > #include <netinet/in.h> > #include <stdbool.h> > > bool > is_ip(unsigned short int x) > { > return ntohs(x) == 0x0800; > } > > bool > is_ip2(unsigned short int x) > { > return x == htons(0x0800); > } > > With GCC 4.7.2: > > 00000000 <is_ip>: > 0: 0f b7 44 24 04 movzwl 0x4(%esp),%eax > 5: 66 c1 c8 08 ror $0x8,%ax > 9: 66 3d 00 08 cmp $0x800,%ax > d: 0f 94 c0 sete %al > 10: c3 ret > > 00000020 <is_ip2>: > 20: 66 83 7c 24 04 08 cmpw $0x8,0x4(%esp) > 26: 0f 94 c0 sete %al > 29: c3 ret > > With Clang 3.5.0: > > 00000000 <is_ip>: > 0: 66 8b 44 24 04 mov 0x4(%esp),%ax > 5: 66 c1 c0 08 rol $0x8,%ax > 9: 0f b7 c0 movzwl %ax,%eax > c: 3d 00 08 00 00 cmp $0x800,%eax > 11: 0f 94 c0 sete %al > 14: 0f b6 c0 movzbl %al,%eax > 17: c3 ret > > 00000020 <is_ip2>: > 20: 0f b7 44 24 04 movzwl 0x4(%esp),%eax > 25: 83 f8 08 cmp $0x8,%eax > 28: 0f 94 c0 sete %al > 2b: 0f b6 c0 movzbl %al,%eax > 2e: c3 ret > > so it's still worthwhile. (I was surprised to see Clang generate > worse code overall. It seems to regard a bool as a 32-bit value, at > least for the purpose of returning from a function.) > > > + atomic_read_relaxed(&cfm->check_tnl_key, &check_tnl_key); > > + > > + /* Atomic read is always a memory access, do some work > > + * before we need the result. */ > > I believe an atomic_read_relaxed() is just a compiler barrier. That > means that unless cfm_should_process_flow() is getting inlined > somewhere (I doubt it--it has no callers within cfm.c), there isn't > any way that the memory access could be optimized out anyway. So, > this optimization and comment seems a little silly to me.
Oh, also: Acked-by: Ben Pfaff <[email protected]> since none of this really matters. _______________________________________________ dev mailing list [email protected] http://openvswitch.org/mailman/listinfo/dev
