On Thu, Aug 28, 2014 at 11:28:41AM -0700, Ben Pfaff wrote:
> On Fri, Aug 22, 2014 at 01:58:21PM -0700, Jarno Rajahalme wrote:
> > The atomics here do not synchronize the state of any other variables,
> > so we can use relaxed atomics.
> > 
> > cfm_should_process_flow() is rearranged to set the megaflow mask bits
> > only if necessary, and to avoid the atomic operation on non-BFD
> > packets.
> > 
> > Signed-off-by: Jarno Rajahalme <[email protected]>
> > +    /* Most packets are not CFM. */
> > +    if (OVS_LIKELY(ntohs(flow->dl_type) != ETH_TYPE_CFM)) {
> > +        return false;
> > +    }
> 
> The common recommendation is to htons the constant instead of ntohsing
> the variable, to generate better code.  Out of curiosity, I decided to
> find out whether this is still needed with modern compilers, so I
> built the following with GCC and Clang:
> 
>     #include <netinet/in.h>
>     #include <stdbool.h>
> 
>     bool
>     is_ip(unsigned short int x)
>     {
>         return ntohs(x) == 0x0800;
>     }
> 
>     bool
>     is_ip2(unsigned short int x)
>     {
>         return x == htons(0x0800);
>     }
> 
> With GCC 4.7.2:
> 
>     00000000 <is_ip>:
>        0:     0f b7 44 24 04          movzwl 0x4(%esp),%eax
>        5:     66 c1 c8 08             ror    $0x8,%ax
>        9:     66 3d 00 08             cmp    $0x800,%ax
>        d:     0f 94 c0                sete   %al
>       10:     c3                      ret    
> 
>     00000020 <is_ip2>:
>       20:     66 83 7c 24 04 08       cmpw   $0x8,0x4(%esp)
>       26:     0f 94 c0                sete   %al
>       29:     c3                      ret    
> 
> With Clang 3.5.0:
> 
>     00000000 <is_ip>:
>        0:     66 8b 44 24 04          mov    0x4(%esp),%ax
>        5:     66 c1 c0 08             rol    $0x8,%ax
>        9:     0f b7 c0                movzwl %ax,%eax
>        c:     3d 00 08 00 00          cmp    $0x800,%eax
>       11:     0f 94 c0                sete   %al
>       14:     0f b6 c0                movzbl %al,%eax
>       17:     c3                      ret    
> 
>     00000020 <is_ip2>:
>       20:     0f b7 44 24 04          movzwl 0x4(%esp),%eax
>       25:     83 f8 08                cmp    $0x8,%eax
>       28:     0f 94 c0                sete   %al
>       2b:     0f b6 c0                movzbl %al,%eax
>       2e:     c3                      ret    
> 
> so it's still worthwhile.  (I was surprised to see Clang generate
> worse code overall.  It seems to regard a bool as a 32-bit value, at
> least for the purpose of returning from a function.)
> 
> > +    atomic_read_relaxed(&cfm->check_tnl_key, &check_tnl_key);
> > +
> > +    /* Atomic read is always a memory access, do some work
> > +     * before we need the result. */
> 
> I believe an atomic_read_relaxed() is just a compiler barrier.  That
> means that unless cfm_should_process_flow() is getting inlined
> somewhere (I doubt it--it has no callers within cfm.c), there isn't
> any way that the memory access could be optimized out anyway.  So,
> this optimization and comment seems a little silly to me.

Oh, also:
Acked-by: Ben Pfaff <[email protected]>
since none of this really matters.
_______________________________________________
dev mailing list
[email protected]
http://openvswitch.org/mailman/listinfo/dev

Reply via email to