Hey Yamamoto, Sorry for this delayed reply,
Your concern is valid. We will need to discuss it more the coming week, For a correct fix that covers all cases, we are thinking about making userspace restore the flow format of pre-recirc pkt so that the miss handling of the recirc pkts could go through the same pipeline as the pre-recirc pkt (and reach the recirc_ofproto from recv_ofproto)~ The above idea may require more careful design and implementation. Our current priority is fixing the balance-tcp bond case first. And we are thinking it is very rare to use 'patch' port for other types of recirculation. Therefore, the setting of inport to OFPP_NONE is proposed as fix in the short run. Would like to hear your feedback about both the short/long fix~ Thanks, Alex Wang, On Thu, Dec 18, 2014 at 9:43 PM, YAMAMOTO Takashi <yamam...@valinux.co.jp> wrote: > > Hey Yamamoto, > > > > Upon further offline discussion, we'd like to take the following approach > > to fix the bug, > > > > - Keep a map between 'recirc_id' and corresponding 'ofproto_dpif', > > - When handling misses of 'recirc'ed pkts, we find the 'recirc_ofproto' > via > > lookup the map with 'recirc_id', and start the handing from that > > 'recirc_ofproto'. > > - One special case is: > > If the 'recirc_ofproto' is not the same as the 'recv_ofproto' > > ('ofproto_dpif' > > on which the 'recirc'ed pkts are received), we set the 'in_port' as > > OFPP_NONE, since the original 'in_port' is a port in the > 'recv_ofproto'. > > > > At this time, there is only one such special case, which is a packet > > received in bridge A, goes through the patch port, and output to a > > balance-tcp > > bond on bridge B. And for this case, we do not care the in_port of > > 'recirc'ed > > packets. So we just set it to OFPP_NONE. > > > > Do these make sense? > > > > The encoding of OpenFlow in_port in 'recirc_id' could work, but it is not > > necessary. So, we just go for this way and have a map, > > i agree it would work for bond. > however, for non-bond cases, we need to restore the correct in_port. > (the port in recirc_ofproto) > > YAMAMOTO Takashi > > > > > I'm sending of a new RFC patch soon, > > > > Thanks, > > Alex Wang, > > > > On Thu, Dec 18, 2014 at 9:06 PM, YAMAMOTO Takashi < > yamam...@valinux.co.jp> > > wrote: > >> > >> i agree on all points. > >> > >> YAMAMOTO Takashi > >> > >> > Yes, I agree we need to encode (bridge, ofp_port). recirc_id (the > >> > upper 16 bit part) is globally unique, so bridge can be recovered by > >> > maintaining recirc_id ->bridge mapping in the user space. > >> > > >> > I agree odp_port could be used as well. For the problem at hand, using > >> > recirc_id seems to be simpler. > >> > > >> > > >> > On Wed, Dec 17, 2014 at 11:11 PM, YAMAMOTO Takashi > >> > <yamam...@valinux.co.jp> wrote: > >> >> i agree that the suggested patch was too bond specific. > >> >> > >> >> ofp port number is not enough. you need to encode (bridge, ofp_port) > >> >> pair in some way. it's why i suggested to assign odp port numbers to > >> >> every pseudo ports. > >> >> > >> >> YAMAMOTO Takashi > >> >> > >> >>> This is probably obvious: the bond internal flows now should only > >> match 16 bits. > >> >>> > >> >>> On Wed, Dec 17, 2014 at 3:56 PM, Alex Wang <al...@nicira.com> > wrote: > >> >>>> This sounds reasonable, > >> >>>> > >> >>>> I'll just use the 'ofport' in the 'recirc_id' instead of > 'OFPP_LOCAL' > >> >>>> in my implementation. > >> >>>> > >> >>>> Thanks, > >> >>>> Alex Wang, > >> >>>> > >> >>>> On Wed, Dec 17, 2014 at 3:44 PM, Andy Zhou <az...@nicira.com> > wrote: > >> >>>>> > >> >>>>> This solution seems too bond specific. How about the following: > >> >>>>> > >> >>>>> We use the top 16 bits of the recirc_id, as how we use it today. > For > >> >>>>> example, in bond implementation, it represents a bond port. > >> >>>>> > >> >>>>> The lower 16 bits can then be used to represent OF port number > within > >> >>>>> the bridge. This representation is opaque to the kernel, Since we > >> >>>>> only use 16 bits to represent OF port in user space, it can > >> >>>>> be simply 1 to 1 mapping for the user space. > >> >>>>> > >> >>>>> > >> >>>>> On Wed, Dec 17, 2014 at 1:19 PM, Alex Wang <al...@nicira.com> > wrote: > >> >>>>> > After commit 0c7812e5e (recirculation: Do not drop packet when > >> >>>>> > there is no match from internal table.), if flow keys are > modified > >> >>>>> > before the recirculation action (e.g. set vlan ID), the miss > >> >>>>> > handling of recirculated packets may take different path than > the > >> >>>>> > ones. > >> >>>>> > > >> >>>>> > This commit adds an unittest that captures this bug. Moreover, > >> >>>>> > to solve this bug, this commit makes OVS directly input the > >> >>>>> > recirculated packets from local port of the bridge that owns > >> >>>>> > the correpsonding bond. Thus, those packets are always handled > >> >>>>> > from the correct bridge. > >> >>>>> > > >> >>>>> > Signed-off-by: Alex Wang <al...@nicira.com> > >> >>>>> > --- > >> >>>>> > ofproto/bond.c | 26 +++++++++++++++++++ > >> >>>>> > ofproto/bond.h | 2 ++ > >> >>>>> > ofproto/ofproto-dpif-xlate.c | 20 ++++++++++++-- > >> >>>>> > tests/ofproto-dpif.at | 59 > >> >>>>> > ++++++++++++++++++++++++++++++++++++++++++ > >> >>>>> > 4 files changed, 105 insertions(+), 2 deletions(-) > >> >>>>> > > >> >>>>> > diff --git a/ofproto/bond.c b/ofproto/bond.c > >> >>>>> > index c4b3a3a..08900fa 100644 > >> >>>>> > --- a/ofproto/bond.c > >> >>>>> > +++ b/ofproto/bond.c > >> >>>>> > @@ -254,6 +254,32 @@ bond_ref(const struct bond *bond_) > >> >>>>> > return bond; > >> >>>>> > } > >> >>>>> > > >> >>>>> > +struct ofproto_dpif * > >> >>>>> > +bond_get_ofproto(const struct bond *bond) > >> >>>>> > +{ > >> >>>>> > + return bond->ofproto; > >> >>>>> > +} > >> >>>>> > + > >> >>>>> > +/* Searches bond that uses 'recirc_id'. Returns an > 'ref_count'ed > >> >>>>> > + * reference to 'bond'. */ > >> >>>>> > +struct bond * > >> >>>>> > +bond_find_by_recirc_id(const uint32_t recirc_id) > >> >>>>> > +{ > >> >>>>> > + struct bond *bond; > >> >>>>> > + > >> >>>>> > + ovs_rwlock_rdlock(&rwlock); > >> >>>>> > + HMAP_FOR_EACH (bond, hmap_node, all_bonds) { > >> >>>>> > + if (bond->recirc_id == recirc_id) { > >> >>>>> > + bond_ref(bond); > >> >>>>> > + ovs_rwlock_unlock(&rwlock); > >> >>>>> > + return bond; > >> >>>>> > + } > >> >>>>> > + } > >> >>>>> > + ovs_rwlock_unlock(&rwlock); > >> >>>>> > + > >> >>>>> > + return NULL; > >> >>>>> > +} > >> >>>>> > + > >> >>>>> > /* Frees 'bond'. */ > >> >>>>> > void > >> >>>>> > bond_unref(struct bond *bond) > >> >>>>> > diff --git a/ofproto/bond.h b/ofproto/bond.h > >> >>>>> > index c7b6308..43600df 100644 > >> >>>>> > --- a/ofproto/bond.h > >> >>>>> > +++ b/ofproto/bond.h > >> >>>>> > @@ -68,6 +68,8 @@ struct bond *bond_create(const struct > >> bond_settings *, > >> >>>>> > struct ofproto_dpif *ofproto); > >> >>>>> > void bond_unref(struct bond *); > >> >>>>> > struct bond *bond_ref(const struct bond *); > >> >>>>> > +struct bond *bond_find_by_recirc_id(const uint32_t recirc_id); > >> >>>>> > +struct ofproto_dpif *bond_get_ofproto(const struct bond*); > >> >>>>> > > >> >>>>> > bool bond_reconfigure(struct bond *, const struct bond_settings > >> *); > >> >>>>> > void bond_slave_register(struct bond *, void *slave_, > ofp_port_t > >> >>>>> > ofport, struct netdev *); > >> >>>>> > diff --git a/ofproto/ofproto-dpif-xlate.c > >> b/ofproto/ofproto-dpif-xlate.c > >> >>>>> > index c1327a6..dd0f329 100644 > >> >>>>> > --- a/ofproto/ofproto-dpif-xlate.c > >> >>>>> > +++ b/ofproto/ofproto-dpif-xlate.c > >> >>>>> > @@ -981,9 +981,25 @@ static struct ofproto_dpif * > >> >>>>> > xlate_lookup_ofproto_(const struct dpif_backer *backer, const > >> struct > >> >>>>> > flow *flow, > >> >>>>> > ofp_port_t *ofp_in_port, const struct > xport > >> >>>>> > **xportp) > >> >>>>> > { > >> >>>>> > - const struct xport *xport; > >> >>>>> > + const struct xport *xport = NULL; > >> >>>>> > + > >> >>>>> > + /* If 'flow->recirc_id' is set, uses the OFPP_LOCAL port of > >> the > >> >>>>> > + * 'xbridge' that owns the bond. Using 'OFPP_LOCAL' > should be > >> >>>>> > fine, > >> >>>>> > + * since the lookup only matches 'recirc_id' and > 'dp_hash'. */ > >> >>>>> > + if (flow->recirc_id) { > >> >>>>> > + struct bond *bond = > >> bond_find_by_recirc_id(flow->recirc_id); > >> >>>>> > + > >> >>>>> > + if (bond) { > >> >>>>> > + struct xlate_cfg *xcfg = ovsrcu_get(struct > xlate_cfg > >> *, > >> >>>>> > &xcfgp); > >> >>>>> > + struct ofproto_dpif *ofproto = > bond_get_ofproto(bond); > >> >>>>> > > >> >>>>> > - *xportp = xport = xlate_lookup_xport(backer, flow); > >> >>>>> > + bond_unref(bond); > >> >>>>> > + *xportp = xport = get_ofp_port(xbridge_lookup(xcfg, > >> >>>>> > ofproto), > >> >>>>> > + OFPP_LOCAL); > >> >>>>> > + } > >> >>>>> > + } else { > >> >>>>> > + *xportp = xport = xlate_lookup_xport(backer, flow); > >> >>>>> > + } > >> >>>>> > > >> >>>>> > if (xport) { > >> >>>>> > if (ofp_in_port) { > >> >>>>> > diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at > >> >>>>> > index baa942f..5e4e245 100644 > >> >>>>> > --- a/tests/ofproto-dpif.at > >> >>>>> > +++ b/tests/ofproto-dpif.at > >> >>>>> > @@ -144,6 +144,65 @@ AT_CHECK([ovs-appctl dpif/dump-flows br1 > >> |grep tcp > >> >>>>> > > br1_flows.txt]) > >> >>>>> > AT_CHECK([test `grep in_port.4 br1_flows.txt |wc -l` -gt 24]) > >> >>>>> > AT_CHECK([test `grep in_port.5 br1_flows.txt |wc -l` -gt 24]) > >> >>>>> > AT_CHECK([test `grep in_port.6 br1_flows.txt |wc -l` -gt 24]) > >> >>>>> > + > >> >>>>> > +OVS_VSWITCHD_STOP() > >> >>>>> > +AT_CLEANUP > >> >>>>> > + > >> >>>>> > +# Makes sure recirculation does not change the way packet is > >> handled. > >> >>>>> > +AT_SETUP([ofproto-dpif, balance-tcp bonding, different recirc > >> flow ]) > >> >>>>> > +OVS_VSWITCHD_START( > >> >>>>> > + [add-bond br0 bond0 p1 p2 bond_mode=balance-tcp lacp=active \ > >> >>>>> > + other-config:lacp-time=fast > >> >>>>> > other-config:bond-rebalance-interval=0 -- \ > >> >>>>> > + set interface p1 type=dummy > >> >>>>> > options:pstream=punix:$OVS_RUNDIR/p1.sock ofport_request=1 -- \ > >> >>>>> > + set interface p2 type=dummy > >> >>>>> > options:pstream=punix:$OVS_RUNDIR/p2.sock ofport_request=2 -- \ > >> >>>>> > + add-br br1 -- \ > >> >>>>> > + set bridge br1 other-config:hwaddr=aa:66:aa:66:00:00 -- \ > >> >>>>> > + set bridge br1 datapath-type=dummy > >> other-config:datapath-id=1234 \ > >> >>>>> > + fail-mode=standalone -- \ > >> >>>>> > + add-bond br1 bond1 p3 p4 bond_mode=balance-tcp lacp=active \ > >> >>>>> > + other-config:lacp-time=fast > >> >>>>> > other-config:bond-rebalance-interval=0 -- \ > >> >>>>> > + set interface p3 type=dummy > >> options:stream=unix:$OVS_RUNDIR/p1.sock > >> >>>>> > ofport_request=3 -- \ > >> >>>>> > + set interface p4 type=dummy > >> options:stream=unix:$OVS_RUNDIR/p2.sock > >> >>>>> > ofport_request=4 -- \ > >> >>>>> > + add-port br1 br1- -- set interface br1- type=patch > >> options:peer=br1+ > >> >>>>> > ofport_request=100 -- \ > >> >>>>> > + set port br1- tag=1 -- \ > >> >>>>> > + add-br br-int -- \ > >> >>>>> > + set bridge br-int other-config:hwaddr=aa:77:aa:77:00:00 -- \ > >> >>>>> > + set bridge br-int datapath-type=dummy > >> other-config:datapath-id=1235 > >> >>>>> > \ > >> >>>>> > + fail-mode=secure -- \ > >> >>>>> > + add-port br-int br1+ -- set interface br1+ type=patch > >> >>>>> > options:peer=br1- ofport_request=101 -- \ > >> >>>>> > + add-port br-int p5 -- set interface p5 ofport_request=5 > >> type=dummy > >> >>>>> > +]) > >> >>>>> > +AT_CHECK([ovs-appctl netdev-dummy/set-admin-state up], 0, [OK > >> >>>>> > +]) > >> >>>>> > + > >> >>>>> > +# The dl_vlan flow should not be ever matched, > >> >>>>> > +# since recirculation should not change the flow handling. > >> >>>>> > +AT_DATA([flows.txt], [dnl > >> >>>>> > +table=0 priority=1 in_port=5 actions=output(101) > >> >>>>> > +table=0 priority=2 in_port=5 dl_vlan=1 actions=drop > >> >>>>> > +]) > >> >>>>> > +AT_CHECK([ovs-ofctl add-flows br-int flows.txt]) > >> >>>>> > + > >> >>>>> > +# Sends a packet to trigger recirculation. > >> >>>>> > +# Should generate recirc_id(0x12d),dp_hash(0xc1261ba2/0xff). > >> >>>>> > +AT_CHECK([ovs-appctl netdev-dummy/receive p5 > >> >>>>> > > >> > "in_port(5),eth(src=50:54:00:00:00:05,dst=50:54:00:00:01:00),eth_type(0x0800),ipv4(src=10.0.0.2,dst=10.0.0.1)"]) > >> >>>>> > + > >> >>>>> > +# Collects flow stats. > >> >>>>> > +AT_CHECK([ovs-appctl revalidator/purge], [0]) > >> >>>>> > + > >> >>>>> > +# Checks the flow stats in br1, should only be one flow with > >> non-zero > >> >>>>> > +# 'n_packets' from internal table. > >> >>>>> > +AT_CHECK([ovs-appctl bridge/dump-flows br1 | ofctl_strip | > grep -- > >> >>>>> > "n_packets" | grep -- "table_id" | sed -e > >> 's/output:[[0-9]]\+/output/'], > >> >>>>> > [0], [dnl > >> >>>>> > +table_id=254, n_packets=1, n_bytes=64, > >> >>>>> > priority=20,recirc_id=0x12d,dp_hash=0xa2/0xff,actions=output > >> >>>>> > +]) > >> >>>>> > + > >> >>>>> > +# Checks the flow stats in br-int, should be only one match. > >> >>>>> > +AT_CHECK([ovs-ofctl dump-flows br-int | ofctl_strip | sort], > [0], > >> [dnl > >> >>>>> > + n_packets=1, n_bytes=60, priority=1,in_port=5 > actions=output:101 > >> >>>>> > + priority=2,in_port=5,dl_vlan=1 actions=drop > >> >>>>> > +NXST_FLOW reply: > >> >>>>> > +]) > >> >>>>> > + > >> >>>>> > OVS_VSWITCHD_STOP() > >> >>>>> > AT_CLEANUP > >> >>>>> > > >> >>>>> > -- > >> >>>>> > 1.7.9.5 > >> >>>>> > > >> >>> _______________________________________________ > >> >>> dev mailing list > >> >>> dev@openvswitch.org > >> >>> http://openvswitch.org/mailman/listinfo/dev > >> > _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev