On 9 May 2016 at 14:51, Darrell Ball <dlu...@gmail.com> wrote: > I have some initial clarifications first before review > > On Fri, May 6, 2016 at 9:21 AM, Gurucharan Shetty <g...@ovn.org> wrote: > >> Currently we can connect routers via "peer"ing. This limits >> the number of routers that can be connected with each other >> directly to 2. >> > > > This sounds like you are saying that cannot have a topology like > > R1-------R2 > \ / > \ / > \ / > \ / > R3 > > which is common. > Did I read the above comment correctly ? >
The above drawn topology will start getting complex with more gateways. (In case of k8s each machine is a l3 gateway.). I agree that my usage of word "directly" may be confusing. I meant routers connected to each other in the same subnet. > > > >> >> One of the design goals for L3 Gateway is to be able to >> have multiple gateways (each with their own router) >> connected to a distributed router via a switch. >> > > Its not ideal to have a switch required to connect routers - it somewhat > defeats the advantages > of having routers in the first place. > I do not have much experience with real world networking setups. There are probably reasons why your statement is true. From my perspective , all I see is that a switch is used to connect multiple endpoints in the same subnet and If I want multiple routers connected to each other in the same subnet, I need to use a switch. VMware NSX uses something similar in customer deployments, so I believe it is not out of ordinary to do something like this. > This is usually only done if there is existing switching equipment than > must be traversed > But in the case, we are just dealing with software where we have total > flexibility. > From OpenStack perspective, each tenant gets a router and multiple switches with different subnets. The idea is that the OpenStack network plugin, can at best split this single tenant router into 2 with a switch in between on a subnet that neutron does not use. Taking the same logic forward for k8s, I can support multiple gateways. Connecting multiple routers to each other without a switch makes it a pain to remember the interconnections and create multiple subnets (which may simply not be available for a networking backend for internal use). > > >> >> With the above goal in mind, this commit gives the general >> ability to connect multiple routers via a switch. >> >> Signed-off-by: Gurucharan Shetty <g...@ovn.org> >> --- >> This needs the following 2 commits under review to >> first go in. >> 1. ofproto-dpif: Rename "recurse" to "indentation". >> 2. ofproto-dpif: Do not count resubmit to later tables against limit. >> --- >> ovn/controller/lflow.c | 19 ++-- >> ovn/northd/ovn-northd.c | 56 ++++++++++- >> ovn/ovn-nb.xml | 7 -- >> tests/ovn.at | 236 >> +++++++++++++++++++++++++++++++++++++++++++++++ >> 4 files changed, 299 insertions(+), 19 deletions(-) >> >> diff --git a/ovn/controller/lflow.c b/ovn/controller/lflow.c >> index 96b7c66..efc427d 100644 >> --- a/ovn/controller/lflow.c >> +++ b/ovn/controller/lflow.c >> @@ -215,15 +215,15 @@ add_logical_flows(struct controller_ctx *ctx, const >> struct lport_index *lports, >> if (is_switch(ldp)) { >> /* For a logical switch datapath, local_datapaths tells us >> if there >> * are any local ports for this datapath. If not, we can >> skip >> - * processing logical flows if the flow belongs to egress >> pipeline >> - * or if that logical switch datapath is not patched to any >> logical >> - * router. >> + * processing logical flows if that logical switch datapath >> is not >> + * patched to any logical router. >> * >> - * Otherwise, we still need the ingress pipeline because >> even if >> - * there are no local ports, we still may need to execute >> the ingress >> - * pipeline after a packet leaves a logical router. Further >> - * optimization is possible, but not based on what we know >> with >> - * local_datapaths right now. >> + * Otherwise, we still need both ingress and egress pipeline >> + * because even if there are no local ports, we still may >> need to >> + * execute the ingress pipeline after a packet leaves a >> logical >> + * router and we need to do egress pipeline for a switch that >> + * is connected to only routers. Further optimization is >> possible, >> + * but not based on what we know with local_datapaths right >> now. >> * >> * A better approach would be a kind of "flood fill" >> algorithm: >> * >> @@ -242,9 +242,6 @@ add_logical_flows(struct controller_ctx *ctx, const >> struct lport_index *lports, >> */ >> >> if (!get_local_datapath(local_datapaths, ldp->tunnel_key)) { >> - if (!ingress) { >> - continue; >> - } >> > > This is removing a previous change that was done for some optimization for > avoiding > unnecessary flow creation. > I am not sure about the process here, but maybe this should be tracked as > a separate > patch ? > The above change is needed for this patch to work. The optimization above assumes that a switch always has atleast one real physical endpoint. With this change, since a switch can only be connected to routers, we need to remove the optimization. The optimization itself will need more careful consideration and with more information provided to ovn-controller, it can ideally be improved, but I would ideally not want l3 gateway work delayed because of it. > > > > >> if (!get_patched_datapath(patched_datapaths, >> ldp->tunnel_key)) { >> continue; >> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c >> index 9e03606..3a29770 100644 >> --- a/ovn/northd/ovn-northd.c >> +++ b/ovn/northd/ovn-northd.c >> @@ -2110,7 +2110,13 @@ build_lrouter_flows(struct hmap *datapaths, struct >> hmap *ports, >> free(actions); >> free(match); >> } >> - } else if (op->od->n_router_ports) { >> + } else if (op->od->n_router_ports && strcmp(op->nbs->type, >> "router")) { >> + /* This is a logical switch port that backs a VM or a >> container. >> + * Extract its addresses. For each of the address, go >> through all >> + * the router ports attached to the switch (to which this >> port >> + * connects) and if the address in question is reachable >> from the >> + * router port, add an ARP entry in that router's pipeline. >> */ >> + >> for (size_t i = 0; i < op->nbs->n_addresses; i++) { >> struct lport_addresses laddrs; >> if (!extract_lport_addresses(op->nbs->addresses[i], >> &laddrs, >> @@ -2158,8 +2164,56 @@ build_lrouter_flows(struct hmap *datapaths, struct >> hmap *ports, >> >> free(laddrs.ipv4_addrs); >> } >> + } else if (!strcmp(op->nbs->type, "router")) { >> + /* This is a logical switch port that connects to a router. >> */ >> + >> + /* The peer of this switch port is the router port for which >> + * we need to add logical flows such that it can resolve >> + * ARP entries for all the other router ports connected to >> + * the switch in question. */ >> + >> + const char *peer_name = smap_get(&op->nbs->options, >> + "router-port"); >> + if (!peer_name) { >> + continue; >> + } >> + >> + struct ovn_port *peer = ovn_port_find(ports, peer_name); >> + if (!peer || !peer->nbr || !peer->ip) { >> + continue; >> + } >> + >> + for (size_t j = 0; j < op->od->n_router_ports; j++) { >> + const char *router_port_name = smap_get( >> + >> &op->od->router_ports[j]->nbs->options, >> + "router-port"); >> + struct ovn_port *router_port = ovn_port_find(ports, >> + >> router_port_name); >> + if (!router_port || !router_port->nbr || >> !router_port->ip) { >> + continue; >> + } >> + >> + /* Skip the router port under consideration. */ >> + if (router_port == peer) { >> + continue; >> + } >> + >> + if (!router_port->ip) { >> + continue; >> + } >> + char *match = xasprintf("outport == %s && reg0 == >> "IP_FMT, >> + peer->json_key, >> + IP_ARGS(router_port->ip)); >> + char *actions = xasprintf("eth.dst = "ETH_ADDR_FMT"; >> next;", >> + >> ETH_ADDR_ARGS(router_port->mac)); >> + ovn_lflow_add(lflows, peer->od, S_ROUTER_IN_ARP_RESOLVE, >> + 101, match, actions); >> + free(actions); >> + free(match); >> + } >> } >> } >> + >> HMAP_FOR_EACH (od, key_node, datapaths) { >> if (!od->nbr) { >> continue; >> diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml >> index c01455d..d7fd595 100644 >> --- a/ovn/ovn-nb.xml >> +++ b/ovn/ovn-nb.xml >> @@ -154,13 +154,6 @@ >> These options apply when <ref column="type"/> is >> <code>router</code>. >> </p> >> >> - <p> >> - If a given logical switch has multiple <code>router</code> >> ports, the >> - <ref table="Logical_Router_Port"/> rows that they reference >> must be >> - all on the same <ref table="Logical_Router"/> (for different >> - subnets). >> - </p> >> - >> <column name="options" key="router-port"> >> Required. The <ref column="name"/> of the <ref >> table="Logical_Router_Port"/> to which this logical switch >> port is >> diff --git a/tests/ovn.at b/tests/ovn.at >> index b9d7ada..6961be0 100644 >> --- a/tests/ovn.at >> +++ b/tests/ovn.at >> @@ -2545,3 +2545,239 @@ OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >> >> AT_CLEANUP >> >> +AT_SETUP([ovn -- 2 HVs, 3 LRs connected via LS, static routes]) >> +AT_KEYWORDS([ovnstaticroutes]) >> +AT_SKIP_IF([test $HAVE_PYTHON = no]) >> +ovn_start >> + >> +# Logical network: >> +# Three LRs - R1, R2 and R3 that are connected to each other via LS >> "join" >> +# in 20.0.0.0/24 network. R1 has switchess foo (192.168.1.0/24) >> +# connected to it. R2 has alice (172.16.1.0/24) and R3 has bob ( >> 10.32.1.0/24) >> +# connected to it. >> + >> +ovn-nbctl create Logical_Router name=R1 >> +ovn-nbctl create Logical_Router name=R2 >> +ovn-nbctl create Logical_Router name=R3 >> + >> +ovn-nbctl lswitch-add foo >> +ovn-nbctl lswitch-add alice >> +ovn-nbctl lswitch-add bob >> +ovn-nbctl lswitch-add join >> + >> +# Connect foo to R1 >> +ovn-nbctl -- --id=@lrp create Logical_Router_port name=foo \ >> +network=192.168.1.1/24 mac=\"00:00:01:01:02:03\" -- add Logical_Router >> R1 \ >> +ports @lrp -- lport-add foo rp-foo >> + >> +ovn-nbctl set Logical_port rp-foo type=router options:router-port=foo \ >> +addresses=\"00:00:01:01:02:03\" >> + >> +# Connect alice to R2 >> +ovn-nbctl -- --id=@lrp create Logical_Router_port name=alice \ >> +network=172.16.1.1/24 mac=\"00:00:02:01:02:03\" -- add Logical_Router >> R2 \ >> +ports @lrp -- lport-add alice rp-alice >> + >> +ovn-nbctl set Logical_port rp-alice type=router >> options:router-port=alice \ >> +addresses=\"00:00:02:01:02:03\" >> + >> +# Connect bob to R3 >> +ovn-nbctl -- --id=@lrp create Logical_Router_port name=bob \ >> +network=10.32.1.1/24 mac=\"00:00:03:01:02:03\" -- add Logical_Router R3 >> \ >> +ports @lrp -- lport-add bob rp-bob >> + >> +ovn-nbctl set Logical_port rp-bob type=router options:router-port=bob \ >> +addresses=\"00:00:03:01:02:03\" >> + >> +# Connect R1 to join >> +ovn-nbctl -- --id=@lrp create Logical_Router_port name=R1_join \ >> +network=20.0.0.1/24 mac=\"00:00:04:01:02:03\" -- add Logical_Router R1 \ >> +ports @lrp -- lport-add join r1-join >> + >> +ovn-nbctl set Logical_port r1-join type=router >> options:router-port=R1_join \ >> +addresses='"00:00:04:01:02:03"' >> + >> +# Connect R2 to join >> +ovn-nbctl -- --id=@lrp create Logical_Router_port name=R2_join \ >> +network=20.0.0.2/24 mac=\"00:00:04:01:02:04\" -- add Logical_Router R2 \ >> +ports @lrp -- lport-add join r2-join >> + >> +ovn-nbctl set Logical_port r2-join type=router >> options:router-port=R2_join \ >> +addresses='"00:00:04:01:02:04"' >> + >> + >> +# Connect R3 to join >> +ovn-nbctl -- --id=@lrp create Logical_Router_port name=R3_join \ >> +network=20.0.0.3/24 mac=\"00:00:04:01:02:05\" -- add Logical_Router R3 \ >> +ports @lrp -- lport-add join r3-join >> + >> +ovn-nbctl set Logical_port r3-join type=router >> options:router-port=R3_join \ >> +addresses='"00:00:04:01:02:05"' >> + >> +#install static routes >> +ovn-nbctl -- --id=@lrt create Logical_Router_Static_Route \ >> +ip_prefix=172.16.1.0/24 nexthop=20.0.0.2 -- add Logical_Router \ >> +R1 static_routes @lrt >> + >> +ovn-nbctl -- --id=@lrt create Logical_Router_Static_Route \ >> +ip_prefix=10.32.1.0/24 nexthop=20.0.0.3 -- add Logical_Router \ >> +R1 static_routes @lrt >> + >> +ovn-nbctl -- --id=@lrt create Logical_Router_Static_Route \ >> +ip_prefix=192.168.1.0/24 nexthop=20.0.0.1 -- add Logical_Router \ >> +R2 static_routes @lrt >> + >> +ovn-nbctl -- --id=@lrt create Logical_Router_Static_Route \ >> +ip_prefix=10.32.1.0/24 nexthop=20.0.0.3 -- add Logical_Router \ >> +R2 static_routes @lrt >> + >> +ovn-nbctl -- --id=@lrt create Logical_Router_Static_Route \ >> +ip_prefix=192.168.1.0/24 nexthop=20.0.0.1 -- add Logical_Router \ >> +R3 static_routes @lrt >> + >> +ovn-nbctl -- --id=@lrt create Logical_Router_Static_Route \ >> +ip_prefix=172.16.1.0/24 nexthop=20.0.0.2 -- add Logical_Router \ >> +R3 static_routes @lrt >> + >> +# Create logical port foo1 in foo >> +ovn-nbctl lport-add foo foo1 \ >> +-- lport-set-addresses foo1 "f0:00:00:01:02:03 192.168.1.2" >> + >> +# Create logical port alice1 in alice >> +ovn-nbctl lport-add alice alice1 \ >> +-- lport-set-addresses alice1 "f0:00:00:01:02:04 172.16.1.2" >> + >> +# Create logical port bob1 in bob >> +ovn-nbctl lport-add bob bob1 \ >> +-- lport-set-addresses bob1 "f0:00:00:01:02:05 10.32.1.2" >> + >> +# Create two hypervisor and create OVS ports corresponding to logical >> ports. >> +net_add n1 >> + >> +sim_add hv1 >> +as hv1 >> +ovs-vsctl add-br br-phys >> +ovn_attach n1 br-phys 192.168.0.1 >> +ovs-vsctl -- add-port br-int hv1-vif1 -- \ >> + set interface hv1-vif1 external-ids:iface-id=foo1 \ >> + options:tx_pcap=hv1/vif1-tx.pcap \ >> + options:rxq_pcap=hv1/vif1-rx.pcap \ >> + ofport-request=1 >> + >> +ovs-vsctl -- add-port br-int hv1-vif2 -- \ >> + set interface hv1-vif2 external-ids:iface-id=alice1 \ >> + options:tx_pcap=hv1/vif2-tx.pcap \ >> + options:rxq_pcap=hv1/vif2-rx.pcap \ >> + ofport-request=2 >> + >> +sim_add hv2 >> +as hv2 >> +ovs-vsctl add-br br-phys >> +ovn_attach n1 br-phys 192.168.0.2 >> +ovs-vsctl -- add-port br-int hv2-vif1 -- \ >> + set interface hv2-vif1 external-ids:iface-id=bob1 \ >> + options:tx_pcap=hv2/vif1-tx.pcap \ >> + options:rxq_pcap=hv2/vif1-rx.pcap \ >> + ofport-request=1 >> + >> + >> +# Pre-populate the hypervisors' ARP tables so that we don't lose any >> +# packets for ARP resolution (native tunneling doesn't queue packets >> +# for ARP resolution). >> +ovn_populate_arp >> + >> +# Allow some time for ovn-northd and ovn-controller to catch up. >> +# XXX This should be more systematic. >> +sleep 1 >> + >> +ip_to_hex() { >> + printf "%02x%02x%02x%02x" "$@" >> +} >> +trim_zeros() { >> + sed 's/\(00\)\{1,\}$//' >> +} >> + >> +# Send ip packets between foo1 and alice1 >> +src_mac="f00000010203" >> +dst_mac="000001010203" >> +src_ip=`ip_to_hex 192 168 1 2` >> +dst_ip=`ip_to_hex 172 16 1 2` >> >> +packet=${dst_mac}${src_mac}08004500001c0000000040110000${src_ip}${dst_ip}0035111100080000 >> +as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet >> +as hv1 ovs-appctl ofproto/trace br-int in_port=1 $packet >> + >> +# Send ip packets between foo1 and bob1 >> +src_mac="f00000010203" >> +dst_mac="000001010203" >> +src_ip=`ip_to_hex 192 168 1 2` >> +dst_ip=`ip_to_hex 10 32 1 2` >> >> +packet=${dst_mac}${src_mac}08004500001c0000000040110000${src_ip}${dst_ip}0035111100080000 >> +as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet >> + >> +echo "---------NB dump-----" >> +ovn-nbctl show >> +echo "---------------------" >> +ovn-nbctl list logical_router >> +echo "---------------------" >> +ovn-nbctl list logical_router_port >> +echo "---------------------" >> + >> +echo "---------SB dump-----" >> +ovn-sbctl list datapath_binding >> +echo "---------------------" >> +ovn-sbctl list port_binding >> +echo "---------------------" >> +ovn-sbctl dump-flows >> +echo "---------------------" >> + >> +echo "------ hv1 dump ----------" >> +as hv1 ovs-ofctl show br-int >> +as hv1 ovs-ofctl dump-flows br-int >> +echo "------ hv2 dump ----------" >> +as hv2 ovs-ofctl show br-int >> +as hv2 ovs-ofctl dump-flows br-int >> +echo "----------------------------" >> + >> +# Packet to Expect at bob1 >> +src_mac="000003010203" >> +dst_mac="f00000010205" >> +src_ip=`ip_to_hex 192 168 1 2` >> +dst_ip=`ip_to_hex 10 32 1 2` >> >> +expected=${dst_mac}${src_mac}08004500001c000000003e110200${src_ip}${dst_ip}0035111100080000 >> + >> +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv2/vif1-tx.pcap | >> trim_zeros > received.packets >> +echo $expected | trim_zeros > expout >> +AT_CHECK([cat received.packets], [0], [expout]) >> + >> +# Packet to Expect at alice1 >> +src_mac="000002010203" >> +dst_mac="f00000010204" >> +src_ip=`ip_to_hex 192 168 1 2` >> +dst_ip=`ip_to_hex 172 16 1 2` >> >> +expected=${dst_mac}${src_mac}08004500001c000000003e110200${src_ip}${dst_ip}0035111100080000 >> + >> +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv1/vif2-tx.pcap | >> trim_zeros > received1.packets >> +echo $expected | trim_zeros > expout >> +AT_CHECK([cat received1.packets], [0], [expout]) >> + >> +for sim in hv1 hv2; do >> + as $sim >> + OVS_APP_EXIT_AND_WAIT([ovn-controller]) >> + OVS_APP_EXIT_AND_WAIT([ovs-vswitchd]) >> + OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >> +done >> + >> +as ovn-sb >> +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >> + >> +as ovn-nb >> +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >> + >> +as northd >> +OVS_APP_EXIT_AND_WAIT([ovn-northd]) >> + >> +as main >> +OVS_APP_EXIT_AND_WAIT([ovs-vswitchd]) >> +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >> + >> +AT_CLEANUP >> -- >> 1.7.9.5 >> >> _______________________________________________ >> dev mailing list >> dev@openvswitch.org >> http://openvswitch.org/mailman/listinfo/dev >> > > _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev