Thanks for the review. On Thu, Jun 6, 2013 at 7:01 PM, Jesse Gross <je...@nicira.com> wrote:
> On Wed, Jun 5, 2013 at 10:46 PM, Andy Zhou <az...@nicira.com> wrote: > > Add mega flow support in kernel datapath. > > > > Pravin has made significant contributions to this patch. Including > > the mega flow id look up scheme, API clean ups, and bug fixes. > > > > Co-authored-by: Pravin B Shelar <pshe...@nicira.com> > > Signed-off-by: Pravin B Shelar <pshe...@nicira.com> > > Signed-off-by: Andy Zhou <az...@nicira.com> > > I would probably just squash the previous patch and this one together > since they really come together and the first one is so small. > > I received a sparse error after applying this patch: > CHECK /home/jgross/openvswitch/datapath/linux/flow.c > /home/jgross/openvswitch/datapath/linux/flow.c:179:40: warning: > restricted __be16 degrades to integer > > It looks like a legitimate problem, although one that will not > actually affect things in practice because of the value. > Agreed, will fix > > > diff --git a/datapath/datapath.c b/datapath/datapath.c > > index 42af315..98d78a8 100644 > > --- a/datapath/datapath.c > > +++ b/datapath/datapath.c > > @@ -372,13 +372,13 @@ static size_t key_attr_size(void) > > { > > return nla_total_size(4) /* OVS_KEY_ATTR_PRIORITY */ > > + nla_total_size(0) /* OVS_KEY_ATTR_TUNNEL */ > > - + nla_total_size(8) /* OVS_TUNNEL_KEY_ATTR_ID */ > > - + nla_total_size(4) /* OVS_TUNNEL_KEY_ATTR_IPV4_SRC > */ > > - + nla_total_size(4) /* OVS_TUNNEL_KEY_ATTR_IPV4_DST > */ > > - + nla_total_size(1) /* OVS_TUNNEL_KEY_ATTR_TOS */ > > - + nla_total_size(1) /* OVS_TUNNEL_KEY_ATTR_TTL */ > > - + nla_total_size(0) /* > OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT */ > > - + nla_total_size(0) /* OVS_TUNNEL_KEY_ATTR_CSUM */ > > + + nla_total_size(8) /* OVS_TUNNEL_KEY_ATTR_ID */ > > + + nla_total_size(4) /* OVS_TUNNEL_KEY_ATTR_IPV4_SRC */ > > + + nla_total_size(4) /* OVS_TUNNEL_KEY_ATTR_IPV4_DST */ > > + + nla_total_size(1) /* OVS_TUNNEL_KEY_ATTR_TOS */ > > + + nla_total_size(1) /* OVS_TUNNEL_KEY_ATTR_TTL */ > > + + nla_total_size(0) /* > OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT */ > > + + nla_total_size(0) /* OVS_TUNNEL_KEY_ATTR_CSUM */ > > + nla_total_size(4) /* OVS_KEY_ATTR_IN_PORT */ > > + nla_total_size(4) /* OVS_KEY_ATTR_SKB_MARK */ > > + nla_total_size(12) /* OVS_KEY_ATTR_ETHERNET */ > > These are actually intentionally indented to show a grouping of the > tunnel sub-attributes. Regardless, we should also try to keep > unrelated changes out of a patch. > Did not know this is intentional. Will roll back the change. > > > @@ -1138,14 +1142,27 @@ static int ovs_flow_cmd_fill_info(struct sw_flow > *flow, struct datapath *dp, > [...] > > + nla = nla_nest_start(skb, OVS_FLOW_ATTR_MASK); > > + if (!nla) > > + goto nla_put_failure; > > + > > + err = ovs_flow_to_nlattrs(&flow->key, > > + &flow->mfm->key, skb); > > + if (err) > > + goto error; > > + > > + nla_nest_end(skb, nla); > > It looks to me like we don't allocate any extra space in the buffer > for the mask when called ovs_flow_cmd_build_info(). > Do you mean ovs_dp_cmd_msg_size() does not include the size of mask (and key) attributes? > > The indentation here also uses spaces instead of tabs. > Will fix. > > > @@ -1241,23 +1261,33 @@ static int ovs_flow_cmd_new_or_set(struct > sk_buff *skb, struct genl_info *info) > > error = -EINVAL; > > if (!a[OVS_FLOW_ATTR_KEY]) > > goto error; > > - error = ovs_flow_from_nlattrs(&key, &key_len, > a[OVS_FLOW_ATTR_KEY]); > > - if (error) > > - goto error; > > + > > + mfm = sw_flow_mask_alloc(); > > We should make sure that all symbols that are shared across > compilation units are prefixed with ovs_ so that we don't pollute the > global kernel namespace. > Will fix. > > I suspect that doing the allocation for the mask up front will result > in a lot of extra mallocs and frees on the flow setup path since most > of the time we will be hitting an existing match. Is there a way to > eliminate that? > Good point, I will implement this change. > > > + if (mfm == NULL) > > + return -ENOMEM; > > Just checking !mfm is probably more canonical. > Will fix. > > > + ovs_match_init(&match, &key, mfm); > > + error = ovs_match_from_nlattrs(&match, > > + a[OVS_FLOW_ATTR_KEY], a[OVS_FLOW_ATTR_MASK]); > > + if (error) { > > + goto err_kfree; > > + } > > Kernel style is not have brackets around since line if statements. > Will fix. > > > @@ -1267,7 +1297,23 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff > *skb, struct genl_info *info) > > goto err_unlock_ovs; > > > > table = ovsl_dereference(dp->table); > > - flow = ovs_flow_tbl_lookup(table, &key, key_len); > > + > > + /* Deduplicate the mega flow mask. Note ovs_mutex is held. */ > > + existing_mfm = sw_flow_mask_find(mfm); > > + if (existing_mfm) { > > + flow = ovs_flow_tbl_lookup(table, &key, existing_mfm, > key_len); > > + if (flow) { > > + if (!ovs_flow_cmp_id(flow, &key, key_len)) { > > + error = -EINVAL; > > + goto err_unlock_ovs; > > + } > > I believe this allows for inserting the same flow with multiple masks > but this isn't really good because the order of the masks (and > therefore priority) is somewhat arbitrary. If we just do an lookup > similar to how a packet is processed on the exact flow then it seems a > little more robust and means that we can eliminate the need to have a > second hash table. > So just look for any existing flow with key only. The mask attribute is only be used for creating a new flow? > > > @@ -1294,14 +1340,17 @@ static int ovs_flow_cmd_new_or_set(struct > sk_buff *skb, struct genl_info *info) > > } > > clear_stats(flow); > > > > + rcu_assign_pointer(flow->mfm, mfm); > > rcu_assign_pointer(flow->sf_acts, acts); > > > > /* Put flow in bucket. */ > > ovs_flow_tbl_insert(table, flow, &key, key_len); > > > > + if (existing_mfm == NULL) > > + sw_flow_mask_insert(mfm); > > Same (minor) comment here about the NULL check. > Will fix. > > > diff --git a/datapath/flow.h b/datapath/flow.h > > index dba66cf..9fb5a1c 100644 > > --- a/datapath/flow.h > > +++ b/datapath/flow.h > > @@ -117,9 +119,13 @@ struct sw_flow_key { > > struct sw_flow { > > struct rcu_head rcu; > > struct hlist_node hash_node[2]; > > + struct hlist_node hash_node_id[2]; > > I don't think there is a need to have two instances of hash_node_id > because it is only accessed under lock. The reason for having two in > packet lookup is to avoid a hiccup in traffic when we move a flow from > one table to another. However, there are no such concurrent accesses > in the lookup table. > It seems we are going down the path of the removing the hash_node_id entirely, right? > > > +struct sw_flow_desc { > > + size_t start; > > + size_t end; > > +}; > > I found the name "desc" to not be very intuitive - is the another name > that is more self-explanatory? > How about sw_flow_key_range? Any other suggestions? > > > +static inline u16 sw_flow_desc_roundup_size(const struct sw_flow_desc > *desc) > > +{ > > + u16 actual = sw_flow_desc_actual_size(desc); > > + u16 n32s = actual / sizeof(u32); > > + u16 roundup; > > + > > + roundup = (actual & 0x3) ? (n32s + 1) * sizeof(u32) : actual; > > + > > + return roundup; > > +} > > Is this the same as DIV_ROUND_UP? > I did not use it because DIV_ROUND_UP gives the roundup of u32, I needed the roundup in bytes. However, I could rewrite this function using DIV_ROUND_UP. > > > +bool ovs_match_validate(const struct sw_flow_match *match, > > + u64 key_attrs, u64 mask_attrs); > > + > > It looks like we don't use this function outside of flow.c, in which > case I think it could be made static. > Agreed. > > > struct flow_table { > > struct flex_array *buckets; > > + struct flex_array *buckets_ids; > > As I look at this further, I've become more and more convinced that we > really only want a single hash table. Otherwise, there are a lot of > questions about sizing, the need to expand, rehash, etc. > > I agree single hash bucket will work. > +struct sw_flow_mask { > > + struct sw_flow_desc desc; > > + struct sw_flow_key key; > > + struct list_head list; > > + struct rcu_head rcu; > > + struct kref kref; > > I probably wouldn't use kref here since the use of atomics makes it > look like there is some extra locking going on. In our case, the lock > is provided by ovs_mutex, so we could just use an int. > Agreed, as long as packet path does not change the reference count. > > > +struct sw_flow_mask *sw_flow_mask_alloc(void); > > +void sw_flow_mask_add_ref(struct sw_flow_mask *); > > +int sw_flow_mask_del_ref(struct sw_flow_mask *); > > +void sw_flow_mask_insert(struct sw_flow_mask *); > > +void sw_flow_mask_set(struct sw_flow_mask *, struct sw_flow_desc *, u8 > val); > > +struct sw_flow_mask *sw_flow_mask_find(const struct sw_flow_mask *); > > All of these should be prefixed with "ovs_". > Sure. > I'm still working on flow.c - in particular, I want to go through the > usage of key length and encap attributes carefully. I should have the > rest of the comments tomorrow. >
_______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev