On Fri, Aug 22, 2014 at 01:58:21PM -0700, Jarno Rajahalme wrote:
> The atomics here do not synchronize the state of any other variables,
> so we can use relaxed atomics.
> 
> cfm_should_process_flow() is rearranged to set the megaflow mask bits
> only if necessary, and to avoid the atomic operation on non-BFD
> packets.
> 
> Signed-off-by: Jarno Rajahalme <jrajaha...@nicira.com>
> +    /* Most packets are not CFM. */
> +    if (OVS_LIKELY(ntohs(flow->dl_type) != ETH_TYPE_CFM)) {
> +        return false;
> +    }

The common recommendation is to htons the constant instead of ntohsing
the variable, to generate better code.  Out of curiosity, I decided to
find out whether this is still needed with modern compilers, so I
built the following with GCC and Clang:

    #include <netinet/in.h>
    #include <stdbool.h>

    bool
    is_ip(unsigned short int x)
    {
        return ntohs(x) == 0x0800;
    }

    bool
    is_ip2(unsigned short int x)
    {
        return x == htons(0x0800);
    }

With GCC 4.7.2:

    00000000 <is_ip>:
       0:       0f b7 44 24 04          movzwl 0x4(%esp),%eax
       5:       66 c1 c8 08             ror    $0x8,%ax
       9:       66 3d 00 08             cmp    $0x800,%ax
       d:       0f 94 c0                sete   %al
      10:       c3                      ret    

    00000020 <is_ip2>:
      20:       66 83 7c 24 04 08       cmpw   $0x8,0x4(%esp)
      26:       0f 94 c0                sete   %al
      29:       c3                      ret    

With Clang 3.5.0:

    00000000 <is_ip>:
       0:       66 8b 44 24 04          mov    0x4(%esp),%ax
       5:       66 c1 c0 08             rol    $0x8,%ax
       9:       0f b7 c0                movzwl %ax,%eax
       c:       3d 00 08 00 00          cmp    $0x800,%eax
      11:       0f 94 c0                sete   %al
      14:       0f b6 c0                movzbl %al,%eax
      17:       c3                      ret    

    00000020 <is_ip2>:
      20:       0f b7 44 24 04          movzwl 0x4(%esp),%eax
      25:       83 f8 08                cmp    $0x8,%eax
      28:       0f 94 c0                sete   %al
      2b:       0f b6 c0                movzbl %al,%eax
      2e:       c3                      ret    

so it's still worthwhile.  (I was surprised to see Clang generate
worse code overall.  It seems to regard a bool as a 32-bit value, at
least for the purpose of returning from a function.)

> +    atomic_read_relaxed(&cfm->check_tnl_key, &check_tnl_key);
> +
> +    /* Atomic read is always a memory access, do some work
> +     * before we need the result. */

I believe an atomic_read_relaxed() is just a compiler barrier.  That
means that unless cfm_should_process_flow() is getting inlined
somewhere (I doubt it--it has no callers within cfm.c), there isn't
any way that the memory access could be optimized out anyway.  So,
this optimization and comment seems a little silly to me.
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Reply via email to