Pushed to master, Some comments below,
Jarno On Mar 24, 2014, at 10:26 AM, Pravin Shelar <pshe...@nicira.com> wrote: > On Fri, Feb 21, 2014 at 11:41 AM, Jarno Rajahalme <jrajaha...@nicira.com> > wrote: >> Minimize padding in sw_flow_key and move 'tp' top the main struct. >> These changes simplify code when accessing the transport port numbers >> and the tcp flags, and makes the sw_flow_key 8 bytes smaller on 64-bit >> systems (128->120 bytes). These changes also make the keys for IPv4 >> packets to fit in one cache line. >> >> There is a valid concern for safety of packing the struct >> ovs_key_ipv4_tunnel, as it would be possible to take the address of >> the tun_id member as a __be64 * which could result in unaligned access >> in some systems. However: >> >> - sw_flow_key itself is 64-bit aligned, so the tun_id within is always >> 64-bit aligned. >> - We never make arrays of ovs_key_ipv4_tunnel (which would force every >> second tun_key to be misaligned). >> - We never take the address of the tun_id in to a __be64 *. >> - Whereever we use struct ovs_key_ipv4_tunnel outside the sw_flow_key, >> it is in stack (on tunnel input functions), where compiler has full >> control of the alignment. >> >> Signed-off-by: Jarno Rajahalme <jrajaha...@nicira.com> >> --- >> datapath/datapath.c | 6 +++ >> datapath/flow.c | 44 ++++++++----------- >> datapath/flow.h | 29 +++++------- >> datapath/flow_netlink.c | 112 >> ++++++++++++++--------------------------------- >> 4 files changed, 68 insertions(+), 123 deletions(-) >> >> diff --git a/datapath/datapath.c b/datapath/datapath.c >> index 130300f..8a2c0af 100644 >> --- a/datapath/datapath.c >> +++ b/datapath/datapath.c >> @@ -1929,6 +1929,12 @@ static int __init dp_init(void) >> pr_info("Open vSwitch switching datapath %s, built "__DATE__" >> "__TIME__"\n", >> VERSION); >> >> + pr_info("Datapath sw_flow_key size: %ld bytes. ip.frag at %ld, >> tp.flags at %ld, ipv4.addr at %ld\n", >> + sizeof(struct sw_flow_key), >> + offsetof(struct sw_flow_key, ip.frag), >> + offsetof(struct sw_flow_key, tp.flags), >> + offsetof(struct sw_flow_key, ipv4.addr)); >> + >> err = ovs_flow_init(); >> if (err) >> goto error; >> diff --git a/datapath/flow.c b/datapath/flow.c >> index 3cc4cdf..4e37e9b 100644 >> --- a/datapath/flow.c >> +++ b/datapath/flow.c >> @@ -65,17 +65,11 @@ u64 ovs_flow_used_time(unsigned long flow_jiffies) >> void ovs_flow_stats_update(struct sw_flow *flow, struct sk_buff *skb) >> { >> struct flow_stats *stats; >> - __be16 tcp_flags = 0; >> + __be16 tcp_flags = flow->key.tp.flags; >> int node = numa_node_id(); >> >> stats = rcu_dereference(flow->stats[node]); >> >> - if (likely(flow->key.ip.proto == IPPROTO_TCP)) { >> - if (likely(flow->key.eth.type == htons(ETH_P_IP))) >> - tcp_flags = flow->key.ipv4.tp.flags; >> - else if (likely(flow->key.eth.type == htons(ETH_P_IPV6))) >> - tcp_flags = flow->key.ipv6.tp.flags; >> - } > nice :) > >> /* Check if already have node-specific stats. */ >> if (likely(stats)) { >> spin_lock(&stats->lock); >> @@ -358,8 +352,8 @@ static int parse_icmpv6(struct sk_buff *skb, struct >> sw_flow_key *key, >> /* The ICMPv6 type and code fields use the 16-bit transport port >> * fields, so we need to store them in 16-bit network byte order. >> */ >> - key->ipv6.tp.src = htons(icmp->icmp6_type); >> - key->ipv6.tp.dst = htons(icmp->icmp6_code); >> + key->tp.src = htons(icmp->icmp6_type); >> + key->tp.dst = htons(icmp->icmp6_code); >> >> if (icmp->icmp6_code == 0 && >> (icmp->icmp6_type == NDISC_NEIGHBOUR_SOLICITATION || >> @@ -520,21 +514,21 @@ int ovs_flow_extract(struct sk_buff *skb, u16 in_port, >> struct sw_flow_key *key) >> if (key->ip.proto == IPPROTO_TCP) { >> if (tcphdr_ok(skb)) { >> struct tcphdr *tcp = tcp_hdr(skb); >> - key->ipv4.tp.src = tcp->source; >> - key->ipv4.tp.dst = tcp->dest; >> - key->ipv4.tp.flags = TCP_FLAGS_BE16(tcp); >> + key->tp.src = tcp->source; >> + key->tp.dst = tcp->dest; >> + key->tp.flags = TCP_FLAGS_BE16(tcp); >> } > Do u think we can directly copy 32 bits from packet here ? > You mean the ports? Do we have alignment guarantees for the ports in the packet, or some abstraction for accessing a possibly unaligned be32 in the kernel? … >> diff --git a/datapath/flow.h b/datapath/flow.h >> index 270a324..5587577 100644 >> --- a/datapath/flow.h >> +++ b/datapath/flow.h >> @@ -49,7 +49,7 @@ struct ovs_key_ipv4_tunnel { >> __be16 tun_flags; >> u8 ipv4_tos; >> u8 ipv4_ttl; >> -}; >> +} __packed __aligned(4); /* Minimize padding. */ >> >> static inline void ovs_flow_tun_key_init(struct ovs_key_ipv4_tunnel *tun_key, >> const struct iphdr *iph, __be64 >> tun_id, >> @@ -73,7 +73,7 @@ struct sw_flow_key { >> u32 priority; /* Packet QoS priority. */ >> u32 skb_mark; /* SKB mark. */ >> u16 in_port; /* Input switch port (or >> DP_MAX_PORTS). */ >> - } phy; >> + } __packed phy; /* Safe when right after 'tun_key'. */ >> struct { >> u8 src[ETH_ALEN]; /* Ethernet source address. */ >> u8 dst[ETH_ALEN]; /* Ethernet destination address. */ >> @@ -86,23 +86,21 @@ struct sw_flow_key { >> u8 ttl; /* IP TTL/hop limit. */ >> u8 frag; /* One of OVS_FRAG_TYPE_*. */ >> } ip; >> + struct { >> + __be16 src; /* TCP/UDP/SCTP source port. */ >> + __be16 dst; /* TCP/UDP/SCTP destination port. */ >> + __be16 flags; /* TCP flags. */ >> + } tp; >> union { >> struct { >> struct { >> __be32 src; /* IP source address. */ >> __be32 dst; /* IP destination address. */ >> } addr; >> - union { >> - struct { >> - __be16 src; /* >> TCP/UDP/SCTP source port. */ >> - __be16 dst; /* >> TCP/UDP/SCTP destination port. */ >> - __be16 flags; /* TCP >> flags. */ >> - } tp; >> - struct { >> - u8 sha[ETH_ALEN]; /* ARP >> source hardware address. */ >> - u8 tha[ETH_ALEN]; /* ARP >> target hardware address. */ >> - } arp; >> - }; >> + struct { >> + u8 sha[ETH_ALEN]; /* ARP source >> hardware address. */ >> + u8 tha[ETH_ALEN]; /* ARP target >> hardware address. */ >> + } arp; >> } ipv4; >> struct { >> struct { >> @@ -111,11 +109,6 @@ struct sw_flow_key { >> } addr; >> __be32 label; /* IPv6 flow label. */ >> struct { >> - __be16 src; /* TCP/UDP/SCTP >> source port. */ >> - __be16 dst; /* TCP/UDP/SCTP >> destination port. */ >> - __be16 flags; /* TCP flags. */ >> - } tp; >> - struct { >> struct in6_addr target; /* ND target address. >> */ >> u8 sll[ETH_ALEN]; /* ND source link >> layer address. */ >> u8 tll[ETH_ALEN]; /* ND target link >> layer address. */ > > This change increases range of key for mega flow matching on IP > address but not beyond. But I think such flows are less likely. > IMO it might rather likely, e.g. in anything resembling IP forwarding. However, compared to the current definition, removal of padding by this patch more than offsets the effect of unnecessarily hashing the (zeroed) port number fields in this case. … > > looks good. > Acked-by: Pravin B Shelar <pshe...@nicira.com>
_______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev