Alex, thanks for the fast turn around in sending out the patch. A few comments in line.
On Thu, Dec 18, 2014 at 9:46 PM, Alex Wang <al...@nicira.com> wrote: > After commit 0c7812e5e (recirculation: Do not drop packet when > there is no match from internal table.), if flow keys are modified > before the recirculation action (e.g. set vlan ID), the miss > handling of recirc'ed packets may not reach the intended > 'ofproto_dpif' which has rules looking up the 'recirc_id's, > causing drops. > > This commit adds an unittest that captures this bug. Moreover, > to solve this bug, this commit adds mapping between 'recirc_id' > and the corresponding 'ofproto_dpif', and makes sure that the > miss handling of recirc'ed packets are done with the correct > 'ofproto_dpif'. > > Signed-off-by: Alex Wang <al...@nicira.com> > --- > ofproto/ofproto-dpif-xlate.c | 34 +++++++++++++++++++--- > ofproto/ofproto-dpif.c | 65 > ++++++++++++++++++++++++++++++++++++++++-- > ofproto/ofproto-dpif.h | 2 ++ > tests/ofproto-dpif.at | 59 ++++++++++++++++++++++++++++++++++++++ > 4 files changed, 154 insertions(+), 6 deletions(-) > > diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c > index c1327a6..84c7a0e 100644 > --- a/ofproto/ofproto-dpif-xlate.c > +++ b/ofproto/ofproto-dpif-xlate.c > @@ -1015,7 +1015,17 @@ xlate_lookup_ofproto(const struct dpif_backer *backer, > const struct flow *flow, > * > * '*ofp_in_port' is set to OFPP_NONE if 'flow''s in_port does not exist. > * > - * Returns 0 if successful, ENODEV if the parsed flow has no associated > ofport. > + * When recirc_id is set in 'flow', checks whether the ofproto_dpif that > + * corresponds to the recirc_id is same as the receiving bridge. If they are > + * the same, uses the 'ofproto' and keeps the 'ofp_in_port' as assigned. > + * Otherwise, uses the 'recirc_ofproto' that owns recirc_id and assigns > + * OFPP_NONE to 'ofp_in_port'. Doing this is in that, for the latter case, > + * the rules that match the recirculated packets are installed only to the > + * 'recirc_ofproto', and using OFPP_NONE avoids the case where the in_port is > + * taken or is non-exist. It is probably worth noting that post recirculation flow (with a non zero recirc_id) should not be allowed to cross a patch port. > + * > + * Returns 0 if successful, ENODEV if the parsed flow has no associated > ofport > + * or 'recirc_ofproto' could not be found. > */ > int > xlate_lookup(const struct dpif_backer *backer, const struct flow *flow, > @@ -1036,21 +1046,37 @@ xlate_lookup(const struct dpif_backer *backer, const > struct flow *flow, > return ENODEV; > } > > + if (flow->recirc_id) { > + struct ofproto_dpif *recirc_ofproto; > + > + recirc_ofproto = ofproto_dpif_recirc_get_ofproto(backer, > + flow->recirc_id); > + if (!recirc_ofproto) { > + return ENODEV; > + } > + > + if (recirc_ofproto != ofproto) { > + *ofp_in_port = OFPP_NONE; > + ofproto = recirc_ofproto; > + } > + } Would it make sense to move the logic above into xlate_lookup_ofproto_ ? > + > if (ofprotop) { > *ofprotop = ofproto; > } > > if (ipfix) { > - *ipfix = xport->xbridge->ipfix; > + *ipfix = flow->recirc_id ? NULL : xport->xbridge->ipfix; > } > > if (sflow) { > - *sflow = xport->xbridge->sflow; > + *sflow = flow->recirc_id ? NULL : xport->xbridge->sflow; > } > > if (netflow) { > - *netflow = xport->xbridge->netflow; > + *netflow = flow->recirc_id ? NULL : xport->xbridge->netflow; > } Those are good catches. Sample action is datapath may need to do the same thing. > + > return 0; > } > > diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c > index 2165bda..3695814 100644 > --- a/ofproto/ofproto-dpif.c > +++ b/ofproto/ofproto-dpif.c > @@ -251,6 +251,13 @@ COVERAGE_DEFINE(rev_flow_table); > COVERAGE_DEFINE(rev_mac_learning); > COVERAGE_DEFINE(rev_mcast_snooping); > > +/* Stores mapping between 'recirc_id' and 'ofproto-dpif'. */ > +struct dpif_backer_recirc_node { > + struct cmap_node cmap_node; > + struct ofproto_dpif *ofproto; > + uint32_t recirc_id; > +}; > + > /* All datapaths of a given type share a single dpif backer instance. */ > struct dpif_backer { > char *type; > @@ -269,6 +276,8 @@ struct dpif_backer { > > /* Recirculation. */ > struct recirc_id_pool *rid_pool; /* Recirculation ID pool. */ > + struct cmap recirc_map; /* Map of 'recirc_id's to 'ofproto's. */ > + struct ovs_mutex recirc_mutex; /* Protects 'recirc_map'. */ > bool enable_recirc; /* True if the datapath supports recirculation */ > > /* True if the datapath supports variable-length > @@ -835,6 +844,24 @@ dealloc(struct ofproto *ofproto_) > free(ofproto); > } > > +/* Called when 'ofproto' is destructed. Clears all the 'recirc_id's > + * owned by 'ofproto'. */ > +static void > +dpif_backer_recirc_clear_ofproto(struct dpif_backer *backer, > + struct ofproto_dpif *ofproto) > +{ > + struct dpif_backer_recirc_node *node; > + > + ovs_mutex_lock(&backer->recirc_mutex); > + CMAP_FOR_EACH (node, cmap_node, &backer->recirc_map) { > + if (node->ofproto == ofproto) { > + cmap_remove(&backer->recirc_map, &node->cmap_node, > + node->recirc_id); This is really an error condition, right? Before a ofproto is removed, all resources including recirc id should already be freed. On the flip side, if this code actually found any mapping remaining in cmap, the recirc_id is then leaked. > + } > + } > + ovs_mutex_unlock(&backer->recirc_mutex); > +} > + > static void > close_dpif_backer(struct dpif_backer *backer) > { > @@ -851,6 +878,8 @@ close_dpif_backer(struct dpif_backer *backer) > hmap_destroy(&backer->odp_to_ofport_map); > shash_find_and_delete(&all_dpif_backers, backer->type); > recirc_id_pool_destroy(backer->rid_pool); > + cmap_destroy(&backer->recirc_map); > + ovs_mutex_destroy(&backer->recirc_mutex); > free(backer->type); > free(backer->dp_version_string); > dpif_close(backer->dpif); > @@ -964,6 +993,8 @@ open_dpif_backer(const char *type, struct dpif_backer > **backerp) > backer->max_mpls_depth = check_max_mpls_depth(backer); > backer->masked_set_action = check_masked_set_action(backer); > backer->rid_pool = recirc_id_pool_create(); > + ovs_mutex_init(&backer->recirc_mutex); > + cmap_init(&backer->recirc_map); > > backer->enable_tnl_push_pop = dpif_supports_tnl_push_pop(backer->dpif); > atomic_count_init(&backer->tnl_count, 0); > @@ -1379,6 +1410,8 @@ destruct(struct ofproto *ofproto_) > } > guarded_list_destroy(&ofproto->pins); > > + dpif_backer_recirc_clear_ofproto(ofproto->backer, ofproto); > + > mbridge_unref(ofproto->mbridge); > > netflow_unref(ofproto->netflow); > @@ -5332,20 +5365,48 @@ odp_port_to_ofp_port(const struct ofproto_dpif > *ofproto, odp_port_t odp_port) > } > } > > +struct ofproto_dpif * > +ofproto_dpif_recirc_get_ofproto(const struct dpif_backer *backer, > + uint32_t recirc_id) > +{ > + struct dpif_backer_recirc_node *node; > + > + node = CONTAINER_OF(cmap_find(&backer->recirc_map, recirc_id), > + struct dpif_backer_recirc_node, cmap_node); > + > + return node ? node->ofproto : NULL; > +} > + > uint32_t > ofproto_dpif_alloc_recirc_id(struct ofproto_dpif *ofproto) > { > struct dpif_backer *backer = ofproto->backer; > + struct dpif_backer_recirc_node *node = xmalloc(sizeof *node); > + uint32_t recirc_id = recirc_id_alloc(backer->rid_pool); > > - return recirc_id_alloc(backer->rid_pool); recirc_id == 0 indicate failure of allocating the id (i.e. out of id pool). We may not want to add it to cmap in this case. > + ovs_mutex_lock(&backer->recirc_mutex); > + node->recirc_id = recirc_id; > + node->ofproto = ofproto; > + cmap_insert(&backer->recirc_map, &node->cmap_node, node->recirc_id); > + ovs_mutex_unlock(&backer->recirc_mutex); > + > + return recirc_id; > } > > void > ofproto_dpif_free_recirc_id(struct ofproto_dpif *ofproto, uint32_t recirc_id) > { > struct dpif_backer *backer = ofproto->backer; > + struct dpif_backer_recirc_node *node; > > - recirc_id_free(backer->rid_pool, recirc_id); > + node = CONTAINER_OF(cmap_find(&backer->recirc_map, recirc_id), > + struct dpif_backer_recirc_node, cmap_node); > + if (node) { > + ovs_mutex_lock(&backer->recirc_mutex); > + cmap_remove(&backer->recirc_map, &node->cmap_node, node->recirc_id); > + recirc_id_free(backer->rid_pool, node->recirc_id); The locking looks strange. recirc_id_alloc/free uses its own locking. There is really no need to call it within the recirc_mutex lock. The alloc() call above does it outside. May be we can drop id_pool lock and only use recirc_mutex lock. > + ovs_mutex_unlock(&backer->recirc_mutex); > + } > } > > int > diff --git a/ofproto/ofproto-dpif.h b/ofproto/ofproto-dpif.h > index c03e606..994177e 100644 > --- a/ofproto/ofproto-dpif.h > +++ b/ofproto/ofproto-dpif.h > @@ -218,6 +218,8 @@ struct ofport_dpif *odp_port_to_ofport(const struct > dpif_backer *, odp_port_t); > * work the same way as regular flows. > */ > > +struct ofproto_dpif *ofproto_dpif_recirc_get_ofproto(const struct > dpif_backer *ofproto, > + uint32_t recirc_id); > uint32_t ofproto_dpif_alloc_recirc_id(struct ofproto_dpif *ofproto); > void ofproto_dpif_free_recirc_id(struct ofproto_dpif *ofproto, uint32_t > recirc_id); > int ofproto_dpif_add_internal_flow(struct ofproto_dpif *, > diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at > index baa942f..5e4e245 100644 > --- a/tests/ofproto-dpif.at > +++ b/tests/ofproto-dpif.at > @@ -144,6 +144,65 @@ AT_CHECK([ovs-appctl dpif/dump-flows br1 |grep tcp > > br1_flows.txt]) > AT_CHECK([test `grep in_port.4 br1_flows.txt |wc -l` -gt 24]) > AT_CHECK([test `grep in_port.5 br1_flows.txt |wc -l` -gt 24]) > AT_CHECK([test `grep in_port.6 br1_flows.txt |wc -l` -gt 24]) > + > +OVS_VSWITCHD_STOP() > +AT_CLEANUP > + > +# Makes sure recirculation does not change the way packet is handled. > +AT_SETUP([ofproto-dpif, balance-tcp bonding, different recirc flow ]) > +OVS_VSWITCHD_START( > + [add-bond br0 bond0 p1 p2 bond_mode=balance-tcp lacp=active \ > + other-config:lacp-time=fast other-config:bond-rebalance-interval=0 -- > \ > + set interface p1 type=dummy options:pstream=punix:$OVS_RUNDIR/p1.sock > ofport_request=1 -- \ > + set interface p2 type=dummy options:pstream=punix:$OVS_RUNDIR/p2.sock > ofport_request=2 -- \ > + add-br br1 -- \ > + set bridge br1 other-config:hwaddr=aa:66:aa:66:00:00 -- \ > + set bridge br1 datapath-type=dummy other-config:datapath-id=1234 \ > + fail-mode=standalone -- \ > + add-bond br1 bond1 p3 p4 bond_mode=balance-tcp lacp=active \ > + other-config:lacp-time=fast other-config:bond-rebalance-interval=0 -- > \ > + set interface p3 type=dummy options:stream=unix:$OVS_RUNDIR/p1.sock > ofport_request=3 -- \ > + set interface p4 type=dummy options:stream=unix:$OVS_RUNDIR/p2.sock > ofport_request=4 -- \ > + add-port br1 br1- -- set interface br1- type=patch options:peer=br1+ > ofport_request=100 -- \ > + set port br1- tag=1 -- \ > + add-br br-int -- \ > + set bridge br-int other-config:hwaddr=aa:77:aa:77:00:00 -- \ > + set bridge br-int datapath-type=dummy other-config:datapath-id=1235 \ > + fail-mode=secure -- \ > + add-port br-int br1+ -- set interface br1+ type=patch options:peer=br1- > ofport_request=101 -- \ > + add-port br-int p5 -- set interface p5 ofport_request=5 type=dummy > +]) > +AT_CHECK([ovs-appctl netdev-dummy/set-admin-state up], 0, [OK > +]) > + > +# The dl_vlan flow should not be ever matched, > +# since recirculation should not change the flow handling. > +AT_DATA([flows.txt], [dnl > +table=0 priority=1 in_port=5 actions=output(101) According to the commit comment, should there be a push vlan action as well ? > +table=0 priority=2 in_port=5 dl_vlan=1 actions=drop > +]) > +AT_CHECK([ovs-ofctl add-flows br-int flows.txt]) > + > +# Sends a packet to trigger recirculation. > +# Should generate recirc_id(0x12d),dp_hash(0xc1261ba2/0xff). > +AT_CHECK([ovs-appctl netdev-dummy/receive p5 > "in_port(5),eth(src=50:54:00:00:00:05,dst=50:54:00:00:01:00),eth_type(0x0800),ipv4(src=10.0.0.2,dst=10.0.0.1)"]) > + > +# Collects flow stats. > +AT_CHECK([ovs-appctl revalidator/purge], [0]) > + > +# Checks the flow stats in br1, should only be one flow with non-zero > +# 'n_packets' from internal table. > +AT_CHECK([ovs-appctl bridge/dump-flows br1 | ofctl_strip | grep -- > "n_packets" | grep -- "table_id" | sed -e 's/output:[[0-9]]\+/output/'], [0], > [dnl > +table_id=254, n_packets=1, n_bytes=64, > priority=20,recirc_id=0x12d,dp_hash=0xa2/0xff,actions=output > +]) > + > +# Checks the flow stats in br-int, should be only one match. > +AT_CHECK([ovs-ofctl dump-flows br-int | ofctl_strip | sort], [0], [dnl > + n_packets=1, n_bytes=60, priority=1,in_port=5 actions=output:101 > + priority=2,in_port=5,dl_vlan=1 actions=drop > +NXST_FLOW reply: > +]) > + > OVS_VSWITCHD_STOP() > AT_CLEANUP > > -- > 1.7.9.5 > _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev