Correcting a typo below: "Also, supporting transit LS should NOT prevent other optimizations."
On Tue, May 24, 2016 at 3:17 PM, Darrell Ball <dlu...@gmail.com> wrote:

> On Tue, May 24, 2016 at 7:41 AM, Guru Shetty <g...@ovn.org> wrote:
>
>> On 21 May 2016 at 11:48, Darrell Ball <dlu...@gmail.com> wrote:
>>
>>> I made some modifications to the code in Patches 1 and 2 to remove the
>>> Transit LS requirements.
>>
>> These are the reasons why a LS needs to be able to connect to multiple
>> routers.
>>
>> 1. I think it should be left to the upstream user how they want to
>> connect their DRs with GRs. On a 1000-node k8s cluster, using peering
>> would mean that I need to add 1000 DR router ports and manage 1000
>> subnets. If I use a LS in between, I need to add only one router port
>> for the DR and manage only one subnet.
>
> I realize part of the topology is not controllable for k8s, and for this
> case specifically.
>
> This is a case where 1000 HVs each have their own "GR" connected to a DR
> for east-west support.
>
> There is a tradeoff between distributing 1000 DR ports and the extra
> static-route flows for the one DR datapath on each HV, versus one DR
> port, 1000 Transit LS ports in total, a Transit LS datapath required on
> all 1000 HVs, distributing the Transit LS datapath flows to all 1000
> HVs, and an extra ARP flow for each Transit LS peer port. That's 1000
> extra ARP flows on each HV.
>
> For subnet management, I don't see many issues either way; /31 subnet
> management is trivial and easy to automate (a rough sketch is included
> after the change summary below).
>
> For this specific k8s case, it's not clear whether using a Transit LS is
> worse overall once you factor in the data-packet pipeline, the number of
> flows, the number of datapaths, and the extra complexity. In most cases,
> avoiding a Transit LS would be better.
>
> Corrected typo here:
> Also, supporting transit LS should NOT prevent other optimizations.
>
>> 2. The ability to connect multiple routers to a switch is needed on the
>> north side of the GR, as we will need to connect multiple GRs to a
>> switch to be able to access the physical network for ARP resolution.
>> This is for both north-south as well as east-west.
>
> This is not a Transit LS OVN case.
>
>> 3. A Transit LS is needed for the final patch in this series to work
>> (i.e., actual DNAT and SNAT). The final patch needs the packet to enter
>> the ingress pipeline of a router. The current implementation cannot
>> handle peering, as packets enter the egress pipeline of the router. To
>> support peering, it will need further enhancements.
>
> The dependency of the final patch on Transit LS usage/topology is
> something that I wanted to make clear with this exchange, especially for
> folks not part of the discussion last week.
>
>>> In summary:
>>>
>>> I removed all changes to lflow.c, thereby reinstating the previous
>>> optimization.
>>>
>>> I modified the ovn-northd.c changes to remove the Transit LS special
>>> aspects and the additional ARP flows.
>>>
>>> I left the other code changes in Patches 1 and 2 as they were.
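To make the /31 automation point above concrete, here is a rough sketch (not
part of these patches) of how per-HV DR<->GR /31 peer links could be
generated. The helper name, the DR/GR${i} router names, the 20.0.0.0/16
range, and the MAC scheme are all assumptions for illustration only:

add_dr_gr_link() {
    i=$1                              # hypervisor index, 0..N-1
    n=$((2 * i + 2))                  # even offset, so .n/.n+1 form a /31 pair
    o3=$((n / 256))
    o4=$((n % 256))
    dr_ip=20.0.${o3}.${o4}
    gr_ip=20.0.${o3}.$((o4 + 1))
    suffix=`printf "%02x:%02x" ${o3} ${o4}`

    # One /31 port on the distributed router "DR"...
    dr_port=`ovn-nbctl -- --id=@lrp create Logical_Router_port name=DR_GR${i} \
        network=${dr_ip}/31 mac=\"f0:00:20:01:${suffix}\" \
        -- add Logical_Router DR ports @lrp`
    # ...and its peer on the per-HV gateway router "GR${i}".
    gr_port=`ovn-nbctl -- --id=@lrp create Logical_Router_port name=GR${i}_DR \
        network=${gr_ip}/31 mac=\"f0:00:20:02:${suffix}\" \
        -- add Logical_Router GR${i} ports @lrp`

    # Peer the two router ports directly; no Transit LS in between.
    ovn-nbctl set logical_router_port ${dr_port} peer="GR${i}_DR"
    ovn-nbctl set logical_router_port ${gr_port} peer="DR_GR${i}"
}

# For a 1000-HV cluster:
# for i in `seq 0 999`; do add_dr_gr_link $i; done

The address allocation itself is mechanical; the real cost being weighed
above is the 1000 DR ports and the per-HV flows, not the subnet bookkeeping.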
>>> >>> The overall resulting diff to support both patches 1 and 2 is reduced in >>> ovn-northd.c >>> and becomes: >>> >>> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c >>> index b271f7f..2e2236b 100644 >>> --- a/ovn/northd/ovn-northd.c >>> +++ b/ovn/northd/ovn-northd.c >>> @@ -702,11 +702,25 @@ ovn_port_update_sbrec(const struct ovn_port *op) >>> { >>> sbrec_port_binding_set_datapath(op->sb, op->od->sb); >>> if (op->nbr) { >>> - sbrec_port_binding_set_type(op->sb, "patch"); >>> + /* If the router is for l3 gateway, it resides on a chassis >>> + * and its port type is "gateway". */ >>> + const char *chassis = smap_get(&op->od->nbr->options, >>> "chassis"); >>> + >>> + if (chassis && op->peer && op->peer->od && op->peer->od->nbs){ >>> + sbrec_port_binding_set_type(op->sb, "gateway"); >>> + } else { >>> + sbrec_port_binding_set_type(op->sb, "patch"); >>> + } >>> >>> const char *peer = op->peer ? op->peer->key : "<error>"; >>> - const struct smap ids = SMAP_CONST1(&ids, "peer", peer); >>> - sbrec_port_binding_set_options(op->sb, &ids); >>> + struct smap new; >>> + smap_init(&new); >>> + smap_add(&new, "peer", peer); >>> + if (chassis) { >>> + smap_add(&new, "gateway-chassis", chassis); >>> + } >>> + sbrec_port_binding_set_options(op->sb, &new); >>> + smap_destroy(&new); >>> >>> sbrec_port_binding_set_parent_port(op->sb, NULL); >>> sbrec_port_binding_set_tag(op->sb, NULL, 0); >>> @@ -716,15 +730,31 @@ ovn_port_update_sbrec(const struct ovn_port *op) >>> sbrec_port_binding_set_type(op->sb, op->nbs->type); >>> sbrec_port_binding_set_options(op->sb, &op->nbs->options); >>> } else { >>> - sbrec_port_binding_set_type(op->sb, "patch"); >>> + const char *chassis = NULL; >>> + if (op->peer && op->peer->od && op->peer->od->nbr) { >>> + chassis = smap_get(&op->peer->od->nbr->options, >>> "chassis"); >>> + } >>> + /* A switch port connected to a gateway router is also of >>> + * type "gateway". */ >>> + if (chassis) { >>> + sbrec_port_binding_set_type(op->sb, "gateway"); >>> + } else { >>> + sbrec_port_binding_set_type(op->sb, "patch"); >>> + } >>> >>> const char *router_port = smap_get(&op->nbs->options, >>> "router-port"); >>> if (!router_port) { >>> router_port = "<error>"; >>> } >>> - const struct smap ids = SMAP_CONST1(&ids, "peer", >>> router_port); >>> - sbrec_port_binding_set_options(op->sb, &ids); >>> + struct smap new; >>> + smap_init(&new); >>> + smap_add(&new, "peer", router_port); >>> + if (chassis) { >>> + smap_add(&new, "gateway-chassis", chassis); >>> + } >>> + sbrec_port_binding_set_options(op->sb, &new); >>> + smap_destroy(&new); >>> } >>> sbrec_port_binding_set_parent_port(op->sb, >>> op->nbs->parent_name); >>> sbrec_port_binding_set_tag(op->sb, op->nbs->tag, >>> op->nbs->n_tag); >>> >>> >>> I added a new test to demonstrate direct DR<->GR connectivity. >>> >>> >>> AT_SETUP([ovn -- 2 HVs, 3 LRs, 1 DR directly connected to 2 gateway >>> routers >>> ]) >>> AT_KEYWORDS([ovndirectlyconnectedrouters]) >>> AT_SKIP_IF([test $HAVE_PYTHON = no]) >>> ovn_start >>> >>> # Logical network: >>> # Three LRs - R1, R2 and R3 that are connected to each other directly >>> # in 20.0.0.2/31 and 21.0.0.2/31 networks. R1 has switch foo ( >>> 192.168.1.0/24 >>> ) >>> # connected to it. R2 has alice (172.16.1.0/24) and R3 has bob ( >>> 10.32.1.0/24 >>> ) >>> # connected to it. 
>>> >>> ovn-nbctl create Logical_Router name=R1 >>> ovn-nbctl create Logical_Router name=R2 options:chassis="hv2" >>> ovn-nbctl create Logical_Router name=R3 options:chassis="hv2" >>> >>> ovn-nbctl lswitch-add foo >>> ovn-nbctl lswitch-add alice >>> ovn-nbctl lswitch-add bob >>> >>> # Connect foo to R1 >>> ovn-nbctl -- --id=@lrp create Logical_Router_port name=foo \ >>> network=192.168.1.1/24 mac=\"00:00:01:01:02:03\" -- add Logical_Router >>> R1 \ >>> ports @lrp -- lport-add foo rp-foo >>> >>> ovn-nbctl set Logical_port rp-foo type=router options:router-port=foo \ >>> addresses=\"00:00:01:01:02:03\" >>> >>> # Connect alice to R2 >>> ovn-nbctl -- --id=@lrp create Logical_Router_port name=alice \ >>> network=172.16.1.1/24 mac=\"00:00:02:01:02:03\" -- add Logical_Router >>> R2 \ >>> ports @lrp -- lport-add alice rp-alice >>> >>> ovn-nbctl set Logical_port rp-alice type=router >>> options:router-port=alice \ >>> addresses=\"00:00:02:01:02:03\" >>> >>> # Connect bob to R3 >>> ovn-nbctl -- --id=@lrp create Logical_Router_port name=bob \ >>> network=10.32.1.1/24 mac=\"00:00:03:01:02:03\" -- add Logical_Router R3 >>> \ >>> ports @lrp -- lport-add bob rp-bob >>> >>> ovn-nbctl set Logical_port rp-bob type=router options:router-port=bob \ >>> addresses=\"00:00:03:01:02:03\" >>> >>> # Interconnect R1 and R2 >>> lrp1_uuid_2_R2=`ovn-nbctl -- --id=@lrp create Logical_Router_port >>> name=R1_R2 \ >>> network="20.0.0.2/31" mac=\"00:00:00:02:03:04\" \ >>> -- add Logical_Router R1 ports @lrp` >>> >>> lrp2_uuid_2_R1=`ovn-nbctl -- --id=@lrp create Logical_Router_port >>> name=R2_R1 \ >>> network="20.0.0.3/31" mac=\"00:00:00:02:03:05\" \ >>> -- add Logical_Router R2 ports @lrp` >>> >>> ovn-nbctl set logical_router_port $lrp1_uuid_2_R2 peer="R2_R1" >>> ovn-nbctl set logical_router_port $lrp2_uuid_2_R1 peer="R1_R2" >>> >>> # Interconnect R1 and R3 >>> lrp1_uuid_2_R3=`ovn-nbctl -- --id=@lrp create Logical_Router_port >>> name=R1_R3 \ >>> network="21.0.0.2/31" mac=\"00:00:21:02:03:04\" \ >>> -- add Logical_Router R1 ports @lrp` >>> >>> lrp3_uuid_2_R1=`ovn-nbctl -- --id=@lrp create Logical_Router_port >>> name=R3_R1 \ >>> network="21.0.0.3/31" mac=\"00:00:21:02:03:05\" \ >>> -- add Logical_Router R3 ports @lrp` >>> >>> ovn-nbctl set logical_router_port $lrp1_uuid_2_R3 peer="R3_R1" >>> ovn-nbctl set logical_router_port $lrp3_uuid_2_R1 peer="R1_R3" >>> >>> #install static route in R1 to get to alice >>> ovn-nbctl -- --id=@lrt create Logical_Router_Static_Route \ >>> ip_prefix=172.16.1.0/24 nexthop=20.0.0.3 -- add Logical_Router \ >>> R1 static_routes @lrt >>> >>> #install static route in R1 to get to bob >>> ovn-nbctl -- --id=@lrt create Logical_Router_Static_Route \ >>> ip_prefix=10.32.1.0/24 nexthop=21.0.0.3 -- add Logical_Router \ >>> R1 static_routes @lrt >>> >>> #install static route in R2 to get to foo >>> ovn-nbctl -- --id=@lrt create Logical_Router_Static_Route \ >>> ip_prefix=192.168.1.0/24 nexthop=20.0.0.2 -- add Logical_Router \ >>> R2 static_routes @lrt >>> >>> # Create terminal logical ports >>> # Create logical port foo1 in foo >>> ovn-nbctl lport-add foo foo1 \ >>> -- lport-set-addresses foo1 "f0:00:00:01:02:03 192.168.1.2" >>> >>> # Create logical port alice1 in alice >>> ovn-nbctl lport-add alice alice1 \ >>> -- lport-set-addresses alice1 "f0:00:00:01:02:04 172.16.1.2" >>> >>> # Create logical port bob1 in bob >>> ovn-nbctl lport-add bob bob1 \ >>> -- lport-set-addresses bob1 "f0:00:00:01:02:05 10.32.1.2" >>> >>> # Create two hypervisor and create OVS ports corresponding to logical >>> ports. 
>>> net_add n1 >>> >>> sim_add hv1 >>> as hv1 >>> ovs-vsctl add-br br-phys >>> ovn_attach n1 br-phys 192.168.0.1 >>> ovs-vsctl -- add-port br-int hv1-vif1 -- \ >>> set interface hv1-vif1 external-ids:iface-id=foo1 \ >>> options:tx_pcap=hv1/vif1-tx.pcap \ >>> options:rxq_pcap=hv1/vif1-rx.pcap \ >>> ofport-request=1 >>> >>> sim_add hv2 >>> as hv2 >>> ovs-vsctl add-br br-phys >>> ovn_attach n1 br-phys 192.168.0.2 >>> ovs-vsctl -- add-port br-int hv2-vif1 -- \ >>> set interface hv2-vif1 external-ids:iface-id=bob1 \ >>> options:tx_pcap=hv2/vif1-tx.pcap \ >>> options:rxq_pcap=hv2/vif1-rx.pcap \ >>> ofport-request=1 >>> >>> ovs-vsctl -- add-port br-int hv2-vif2 -- \ >>> set interface hv2-vif2 external-ids:iface-id=alice1 \ >>> options:tx_pcap=hv2/vif2-tx.pcap \ >>> options:rxq_pcap=hv2/vif2-rx.pcap \ >>> ofport-request=2 >>> >>> # Pre-populate the hypervisors' ARP tables so that we don't lose any >>> # packets for ARP resolution (native tunneling doesn't queue packets >>> # for ARP resolution). >>> ovn_populate_arp >>> >>> # Allow some time for ovn-northd and ovn-controller to catch up. >>> # XXX This should be more systematic. >>> sleep 1 >>> >>> ip_to_hex() { >>> printf "%02x%02x%02x%02x" "$@" >>> } >>> trim_zeros() { >>> sed 's/\(00\)\{1,\}$//' >>> } >>> >>> # Send ip packets between foo1 and alice1 >>> src_mac="f00000010203" >>> dst_mac="000001010203" >>> src_ip=`ip_to_hex 192 168 1 2` >>> dst_ip=`ip_to_hex 172 16 1 2` >>> >>> packet=${dst_mac}${src_mac}08004500001c0000000040110000${src_ip}${dst_ip}0035111100080000 >>> as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet >>> as hv1 ovs-appctl ofproto/trace br-int in_port=1 $packet >>> >>> # Send ip packets between foo1 and bob1 >>> src_mac="f00000010203" >>> dst_mac="000001010203" >>> src_ip=`ip_to_hex 192 168 1 2` >>> dst_ip=`ip_to_hex 10 32 1 2` >>> >>> packet=${dst_mac}${src_mac}08004500001c0000000040110000${src_ip}${dst_ip}0035111100080000 >>> as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet >>> >>> # Send ip packets from alice1 to foo1 >>> src_mac="f00000010204" >>> dst_mac="000002010203" >>> src_ip=`ip_to_hex 172 16 1 2` >>> dst_ip=`ip_to_hex 192 168 1 2` >>> >>> packet=${dst_mac}${src_mac}08004500001c0000000040110000${src_ip}${dst_ip}0035111100080000 >>> as hv2 ovs-appctl netdev-dummy/receive hv2-vif2 $packet >>> >>> echo "---------NB dump-----" >>> ovn-nbctl show >>> echo "---------------------" >>> ovn-nbctl list logical_router >>> echo "---------------------" >>> ovn-nbctl list logical_router_port >>> echo "---------------------" >>> >>> echo "---------SB dump-----" >>> ovn-sbctl list datapath_binding >>> echo "---------------------" >>> ovn-sbctl list port_binding >>> echo "---------------------" >>> #ovn-sbctl dump-flows >>> echo "---------------------" >>> >>> echo "------ hv1 dump ----------" >>> as hv1 ovs-vsctl show >>> as hv1 ovs-ofctl show br-int >>> as hv1 ovs-ofctl dump-flows br-int >>> echo "------ hv2 dump ----------" >>> as hv2 ovs-vsctl show >>> as hv2 ovs-ofctl show br-int >>> as hv2 ovs-ofctl dump-flows br-int >>> echo "----------------------------" >>> >>> # Packet to Expect at alice1 >>> src_mac="000002010203" >>> dst_mac="f00000010204" >>> src_ip=`ip_to_hex 192 168 1 2` >>> dst_ip=`ip_to_hex 172 16 1 2` >>> >>> expected=${dst_mac}${src_mac}08004500001c000000003e110200${src_ip}${dst_ip}0035111100080000 >>> >>> $PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv2/vif2-tx.pcap | >>> trim_zeros > >>> received.packets >>> echo $expected | trim_zeros > expout >>> AT_CHECK([cat received.packets], [0], [expout]) >>> 
>>> # Packet to Expect at bob1 >>> src_mac="000003010203" >>> dst_mac="f00000010205" >>> src_ip=`ip_to_hex 192 168 1 2` >>> dst_ip=`ip_to_hex 10 32 1 2` >>> >>> expected=${dst_mac}${src_mac}08004500001c000000003e110200${src_ip}${dst_ip}0035111100080000 >>> >>> $PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv2/vif1-tx.pcap | >>> trim_zeros > >>> received1.packets >>> echo $expected | trim_zeros > expout >>> AT_CHECK([cat received1.packets], [0], [expout]) >>> >>> # Packet to Expect at foo1 >>> src_mac="000001010203" >>> dst_mac="f00000010203" >>> src_ip=`ip_to_hex 172 16 1 2` >>> dst_ip=`ip_to_hex 192 168 1 2` >>> >>> expected=${dst_mac}${src_mac}08004500001c000000003e110200${src_ip}${dst_ip}0035111100080000 >>> >>> $PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv1/vif1-tx.pcap | >>> trim_zeros > >>> received2.packets >>> echo $expected | trim_zeros > expout >>> AT_CHECK([cat received2.packets], [0], [expout]) >>> >>> for sim in hv1 hv2; do >>> as $sim >>> OVS_APP_EXIT_AND_WAIT([ovn-controller]) >>> OVS_APP_EXIT_AND_WAIT([ovs-vswitchd]) >>> OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >>> done >>> >>> as ovn-sb >>> OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >>> >>> as ovn-nb >>> OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >>> >>> as northd >>> OVS_APP_EXIT_AND_WAIT([ovn-northd]) >>> >>> as main >>> OVS_APP_EXIT_AND_WAIT([ovs-vswitchd]) >>> OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >>> >>> AT_CLEANUP >>> >>> On Thu, May 19, 2016 at 1:02 PM, Gurucharan Shetty <g...@ovn.org> wrote: >>> >>> > Currently OVN has distributed switches and routers. When a packet >>> > exits a container or a VM, the entire lifecycle of the packet >>> > through multiple switches and routers are calculated in source >>> > chassis itself. When the destination endpoint resides on a different >>> > chassis, the packet is sent to the other chassis and it only goes >>> > through the egress pipeline of that chassis once and eventually to >>> > the real destination. >>> > >>> > When the packet returns back, the same thing happens. The return >>> > packet leaves the VM/container on the chassis where it resides. >>> > The packet goes through all the switches and routers in the logical >>> > pipleline on that chassis and then sent to the eventual destination >>> > over the tunnel. >>> > >>> > The above makes the logical pipeline very flexible and easy. But, >>> > creates a problem for cases where you need to add stateful services >>> > (via conntrack) on switches and routers. >>> > >>> > For l3 gateways, we plan to leverage DNAT and SNAT functionality >>> > and we want to apply DNAT and SNAT rules on a router. So we ideally >>> need >>> > the packet to go through that router in both directions in the same >>> > chassis. To achieve this, this commit introduces a new gateway router >>> > which is >>> > static and can be connected to your distributed router via a switch. >>> > >>> > To make minimal changes in OVN's logical pipeline, this commit >>> > tries to make the switch port connected to a l3 gateway router look >>> like >>> > a container/VM endpoint for every other chassis except the chassis >>> > on which the l3 gateway router resides. On the chassis where the >>> > gateway router resides, the connection looks just like a patch port. >>> > >>> > This is achieved by the doing the following: >>> > Introduces a new type of port_binding record called 'gateway'. >>> > On the chassis where the gateway router resides, this port behaves just >>> > like the port of type 'patch'. 
The ovn-controller on that chassis >>> > populates the "chassis" column for this record as an indication for >>> > other ovn-controllers of its physical location. Other ovn-controllers >>> > treat this port as they would treat a VM/Container port on a different >>> > chassis. >>> > >>> > Signed-off-by: Gurucharan Shetty <g...@ovn.org> >>> > --- >>> > ovn/controller/binding.c | 3 +- >>> > ovn/controller/ovn-controller.c | 5 +- >>> > ovn/controller/patch.c | 29 ++++++- >>> > ovn/controller/patch.h | 3 +- >>> > ovn/northd/ovn-northd.c | 42 +++++++-- >>> > ovn/ovn-nb.ovsschema | 9 +- >>> > ovn/ovn-nb.xml | 15 ++++ >>> > ovn/ovn-sb.xml | 35 +++++++- >>> > tests/ovn.at | 184 >>> > ++++++++++++++++++++++++++++++++++++++++ >>> > 9 files changed, 309 insertions(+), 16 deletions(-) >>> > >>> > diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c >>> > index a0d8b96..e5e55b1 100644 >>> > --- a/ovn/controller/binding.c >>> > +++ b/ovn/controller/binding.c >>> > @@ -200,7 +200,8 @@ binding_run(struct controller_ctx *ctx, const >>> struct >>> > ovsrec_bridge *br_int, >>> > } >>> > sbrec_port_binding_set_chassis(binding_rec, >>> chassis_rec); >>> > } >>> > - } else if (chassis_rec && binding_rec->chassis == >>> chassis_rec) { >>> > + } else if (chassis_rec && binding_rec->chassis == chassis_rec >>> > + && strcmp(binding_rec->type, "gateway")) { >>> > if (ctx->ovnsb_idl_txn) { >>> > VLOG_INFO("Releasing lport %s from this chassis.", >>> > binding_rec->logical_port); >>> > diff --git a/ovn/controller/ovn-controller.c >>> > b/ovn/controller/ovn-controller.c >>> > index 511b184..bc4c24f 100644 >>> > --- a/ovn/controller/ovn-controller.c >>> > +++ b/ovn/controller/ovn-controller.c >>> > @@ -364,8 +364,9 @@ main(int argc, char *argv[]) >>> > &local_datapaths); >>> > } >>> > >>> > - if (br_int) { >>> > - patch_run(&ctx, br_int, &local_datapaths, >>> &patched_datapaths); >>> > + if (br_int && chassis_id) { >>> > + patch_run(&ctx, br_int, chassis_id, &local_datapaths, >>> > + &patched_datapaths); >>> > >>> > struct lport_index lports; >>> > struct mcgroup_index mcgroups; >>> > diff --git a/ovn/controller/patch.c b/ovn/controller/patch.c >>> > index 4808146..e8abe30 100644 >>> > --- a/ovn/controller/patch.c >>> > +++ b/ovn/controller/patch.c >>> > @@ -267,12 +267,28 @@ add_patched_datapath(struct hmap >>> *patched_datapaths, >>> > static void >>> > add_logical_patch_ports(struct controller_ctx *ctx, >>> > const struct ovsrec_bridge *br_int, >>> > + const char *local_chassis_id, >>> > struct shash *existing_ports, >>> > struct hmap *patched_datapaths) >>> > { >>> > + const struct sbrec_chassis *chassis_rec; >>> > + chassis_rec = get_chassis(ctx->ovnsb_idl, local_chassis_id); >>> > + if (!chassis_rec) { >>> > + return; >>> > + } >>> > + >>> > const struct sbrec_port_binding *binding; >>> > SBREC_PORT_BINDING_FOR_EACH (binding, ctx->ovnsb_idl) { >>> > - if (!strcmp(binding->type, "patch")) { >>> > + bool local_port = false; >>> > + if (!strcmp(binding->type, "gateway")) { >>> > + const char *chassis = smap_get(&binding->options, >>> > + "gateway-chassis"); >>> > + if (!strcmp(local_chassis_id, chassis)) { >>> > + local_port = true; >>> > + } >>> > + } >>> > + >>> > + if (!strcmp(binding->type, "patch") || local_port) { >>> > const char *local = binding->logical_port; >>> > const char *peer = smap_get(&binding->options, "peer"); >>> > if (!peer) { >>> > @@ -287,13 +303,19 @@ add_logical_patch_ports(struct controller_ctx >>> *ctx, >>> > free(dst_name); >>> > free(src_name); >>> > 
add_patched_datapath(patched_datapaths, binding); >>> > + if (local_port) { >>> > + if (binding->chassis != chassis_rec && >>> > ctx->ovnsb_idl_txn) { >>> > + sbrec_port_binding_set_chassis(binding, >>> chassis_rec); >>> > + } >>> > + } >>> > } >>> > } >>> > } >>> > >>> > void >>> > patch_run(struct controller_ctx *ctx, const struct ovsrec_bridge >>> *br_int, >>> > - struct hmap *local_datapaths, struct hmap >>> *patched_datapaths) >>> > + const char *chassis_id, struct hmap *local_datapaths, >>> > + struct hmap *patched_datapaths) >>> > { >>> > if (!ctx->ovs_idl_txn) { >>> > return; >>> > @@ -313,7 +335,8 @@ patch_run(struct controller_ctx *ctx, const struct >>> > ovsrec_bridge *br_int, >>> > * 'existing_ports' any patch ports that do exist in the database >>> and >>> > * should be there. */ >>> > add_bridge_mappings(ctx, br_int, &existing_ports, >>> local_datapaths); >>> > - add_logical_patch_ports(ctx, br_int, &existing_ports, >>> > patched_datapaths); >>> > + add_logical_patch_ports(ctx, br_int, chassis_id, &existing_ports, >>> > + patched_datapaths); >>> > >>> > /* Now 'existing_ports' only still contains patch ports that >>> exist in >>> > the >>> > * database but shouldn't. Delete them from the database. */ >>> > diff --git a/ovn/controller/patch.h b/ovn/controller/patch.h >>> > index d5d842e..7920a48 100644 >>> > --- a/ovn/controller/patch.h >>> > +++ b/ovn/controller/patch.h >>> > @@ -27,6 +27,7 @@ struct hmap; >>> > struct ovsrec_bridge; >>> > >>> > void patch_run(struct controller_ctx *, const struct ovsrec_bridge >>> > *br_int, >>> > - struct hmap *local_datapaths, struct hmap >>> > *patched_datapaths); >>> > + const char *chassis_id, struct hmap *local_datapaths, >>> > + struct hmap *patched_datapaths); >>> > >>> > #endif /* ovn/patch.h */ >>> > diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c >>> > index f469e89..7852d83 100644 >>> > --- a/ovn/northd/ovn-northd.c >>> > +++ b/ovn/northd/ovn-northd.c >>> > @@ -690,11 +690,24 @@ ovn_port_update_sbrec(const struct ovn_port *op) >>> > { >>> > sbrec_port_binding_set_datapath(op->sb, op->od->sb); >>> > if (op->nbr) { >>> > - sbrec_port_binding_set_type(op->sb, "patch"); >>> > + /* If the router is for l3 gateway, it resides on a chassis >>> > + * and its port type is "gateway". */ >>> > + const char *chassis = smap_get(&op->od->nbr->options, >>> "chassis"); >>> > + if (chassis) { >>> > + sbrec_port_binding_set_type(op->sb, "gateway"); >>> > + } else { >>> > + sbrec_port_binding_set_type(op->sb, "patch"); >>> > + } >>> > >>> > const char *peer = op->peer ? 
op->peer->key : "<error>"; >>> > - const struct smap ids = SMAP_CONST1(&ids, "peer", peer); >>> > - sbrec_port_binding_set_options(op->sb, &ids); >>> > + struct smap new; >>> > + smap_init(&new); >>> > + smap_add(&new, "peer", peer); >>> > + if (chassis) { >>> > + smap_add(&new, "gateway-chassis", chassis); >>> > + } >>> > + sbrec_port_binding_set_options(op->sb, &new); >>> > + smap_destroy(&new); >>> > >>> > sbrec_port_binding_set_parent_port(op->sb, NULL); >>> > sbrec_port_binding_set_tag(op->sb, NULL, 0); >>> > @@ -704,15 +717,32 @@ ovn_port_update_sbrec(const struct ovn_port *op) >>> > sbrec_port_binding_set_type(op->sb, op->nbs->type); >>> > sbrec_port_binding_set_options(op->sb, &op->nbs->options); >>> > } else { >>> > - sbrec_port_binding_set_type(op->sb, "patch"); >>> > + const char *chassis = NULL; >>> > + if (op->peer && op->peer->od && op->peer->od->nbr) { >>> > + chassis = smap_get(&op->peer->od->nbr->options, >>> > "chassis"); >>> > + } >>> > + >>> > + /* A switch port connected to a gateway router is also of >>> > + * type "gateway". */ >>> > + if (chassis) { >>> > + sbrec_port_binding_set_type(op->sb, "gateway"); >>> > + } else { >>> > + sbrec_port_binding_set_type(op->sb, "patch"); >>> > + } >>> > >>> > const char *router_port = smap_get(&op->nbs->options, >>> > "router-port"); >>> > if (!router_port) { >>> > router_port = "<error>"; >>> > } >>> > - const struct smap ids = SMAP_CONST1(&ids, "peer", >>> > router_port); >>> > - sbrec_port_binding_set_options(op->sb, &ids); >>> > + struct smap new; >>> > + smap_init(&new); >>> > + smap_add(&new, "peer", router_port); >>> > + if (chassis) { >>> > + smap_add(&new, "gateway-chassis", chassis); >>> > + } >>> > + sbrec_port_binding_set_options(op->sb, &new); >>> > + smap_destroy(&new); >>> > } >>> > sbrec_port_binding_set_parent_port(op->sb, >>> op->nbs->parent_name); >>> > sbrec_port_binding_set_tag(op->sb, op->nbs->tag, >>> op->nbs->n_tag); >>> > diff --git a/ovn/ovn-nb.ovsschema b/ovn/ovn-nb.ovsschema >>> > index 8163f6a..fa21b30 100644 >>> > --- a/ovn/ovn-nb.ovsschema >>> > +++ b/ovn/ovn-nb.ovsschema >>> > @@ -1,7 +1,7 @@ >>> > { >>> > "name": "OVN_Northbound", >>> > - "version": "2.1.1", >>> > - "cksum": "2615511875 5108", >>> > + "version": "2.1.2", >>> > + "cksum": "429668869 5325", >>> > "tables": { >>> > "Logical_Switch": { >>> > "columns": { >>> > @@ -78,6 +78,11 @@ >>> > "max": "unlimited"}}, >>> > "default_gw": {"type": {"key": "string", "min": 0, >>> "max": >>> > 1}}, >>> > "enabled": {"type": {"key": "boolean", "min": 0, >>> "max": >>> > 1}}, >>> > + "options": { >>> > + "type": {"key": "string", >>> > + "value": "string", >>> > + "min": 0, >>> > + "max": "unlimited"}}, >>> > "external_ids": { >>> > "type": {"key": "string", "value": "string", >>> > "min": 0, "max": "unlimited"}}}, >>> > diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml >>> > index d7fd595..d239499 100644 >>> > --- a/ovn/ovn-nb.xml >>> > +++ b/ovn/ovn-nb.xml >>> > @@ -630,6 +630,21 @@ >>> > column is set to <code>false</code>, the router is disabled. A >>> > disabled >>> > router has all ingress and egress traffic dropped. >>> > </column> >>> > + >>> > + <group title="Options"> >>> > + <p> >>> > + Additional options for the logical router. >>> > + </p> >>> > + >>> > + <column name="options" key="chassis"> >>> > + If set, indicates that the logical router in question is >>> > + non-distributed and resides in the set chassis. 
The same >>> > + value is also used by <code>ovn-controller</code> to >>> > + uniquely identify the chassis in the OVN deployment and >>> > + comes from <code>external_ids:system-id</code> in the >>> > + <code>Open_vSwitch</code> table of Open_vSwitch database. >>> > + </column> >>> > + </group> >>> > >>> > <group title="Common Columns"> >>> > <column name="external_ids"> >>> > diff --git a/ovn/ovn-sb.xml b/ovn/ovn-sb.xml >>> > index efd2f9a..741228c 100644 >>> > --- a/ovn/ovn-sb.xml >>> > +++ b/ovn/ovn-sb.xml >>> > @@ -1220,7 +1220,12 @@ tcp.flags = RST; >>> > which >>> <code>ovn-controller</code>/<code>ovn-controller-vtep</code> >>> > in >>> > turn finds out by monitoring the local hypervisor's Open_vSwitch >>> > database, which identifies logical ports via the conventions >>> > described >>> > - in <code>IntegrationGuide.md</code>. >>> > + in <code>IntegrationGuide.md</code>. (The exceptions are for >>> > + <code>Port_Binding</code> records of <code>type</code> >>> 'gateway', >>> > + whose locations are identified by <code>ovn-northd</code> via >>> > + the <code>options:gateway-chassis</code> column in this table. >>> > + <code>ovn-controller</code> is still responsible to populate the >>> > + <code>chassis</code> column.) >>> > </p> >>> > >>> > <p> >>> > @@ -1298,6 +1303,14 @@ tcp.flags = RST; >>> > a logical router to a logical switch or to another logical >>> > router. >>> > </dd> >>> > >>> > + <dt><code>gateway</code></dt> >>> > + <dd> >>> > + One of a pair of logical ports that act as if connected >>> by a >>> > patch >>> > + cable across multiple chassis. Useful for connecting a >>> > logical >>> > + switch with a gateway router (which is only resident on a >>> > + particular chassis). >>> > + </dd> >>> > + >>> > <dt><code>localnet</code></dt> >>> > <dd> >>> > A connection to a locally accessible network from each >>> > @@ -1336,6 +1349,26 @@ tcp.flags = RST; >>> > </column> >>> > </group> >>> > >>> > + <group title="Gateway Options"> >>> > + <p> >>> > + These options apply to logical ports with <ref >>> column="type"/> of >>> > + <code>gateway</code>. >>> > + </p> >>> > + >>> > + <column name="options" key="peer"> >>> > + The <ref column="logical_port"/> in the <ref >>> > table="Port_Binding"/> >>> > + record for the other side of the 'gateway' port. The named >>> <ref >>> > + column="logical_port"/> must specify this <ref >>> > column="logical_port"/> >>> > + in its own <code>peer</code> option. That is, the two >>> 'gateway' >>> > + logical ports must have reversed <ref column="logical_port"/> >>> and >>> > + <code>peer</code> values. >>> > + </column> >>> > + >>> > + <column name="options" key="gateway-chassis"> >>> > + The <code>chassis</code> in which the port resides. >>> > + </column> >>> > + </group> >>> > + >>> > <group title="Localnet Options"> >>> > <p> >>> > These options apply to logical ports with <ref >>> column="type"/> of >>> > diff --git a/tests/ovn.at b/tests/ovn.at >>> > index a827b71..9d93064 100644 >>> > --- a/tests/ovn.at >>> > +++ b/tests/ovn.at >>> > @@ -2848,3 +2848,187 @@ OVS_APP_EXIT_AND_WAIT([ovs-vswitchd]) >>> > OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >>> > >>> > AT_CLEANUP >>> > + >>> > + >>> > +AT_SETUP([ovn -- 2 HVs, 2 LRs connected via LS, gateway router]) >>> > +AT_KEYWORDS([ovngatewayrouter]) >>> > +AT_SKIP_IF([test $HAVE_PYTHON = no]) >>> > +ovn_start >>> > + >>> > +# Logical network: >>> > +# Two LRs - R1 and R2 that are connected to each other via LS "join" >>> > +# in 20.0.0.0/24 network. 
R1 has switchess foo (192.168.1.0/24) >>> > +# connected to it. R2 has alice (172.16.1.0/24) connected to it. >>> > +# R2 is a gateway router. >>> > + >>> > + >>> > + >>> > +# Create two hypervisor and create OVS ports corresponding to logical >>> > ports. >>> > +net_add n1 >>> > + >>> > +sim_add hv1 >>> > +as hv1 >>> > +ovs-vsctl add-br br-phys >>> > +ovn_attach n1 br-phys 192.168.0.1 >>> > +ovs-vsctl -- add-port br-int hv1-vif1 -- \ >>> > + set interface hv1-vif1 external-ids:iface-id=foo1 \ >>> > + options:tx_pcap=hv1/vif1-tx.pcap \ >>> > + options:rxq_pcap=hv1/vif1-rx.pcap \ >>> > + ofport-request=1 >>> > + >>> > + >>> > +sim_add hv2 >>> > +as hv2 >>> > +ovs-vsctl add-br br-phys >>> > +ovn_attach n1 br-phys 192.168.0.2 >>> > +ovs-vsctl -- add-port br-int hv2-vif1 -- \ >>> > + set interface hv2-vif1 external-ids:iface-id=alice1 \ >>> > + options:tx_pcap=hv2/vif1-tx.pcap \ >>> > + options:rxq_pcap=hv2/vif1-rx.pcap \ >>> > + ofport-request=1 >>> > + >>> > +# Pre-populate the hypervisors' ARP tables so that we don't lose any >>> > +# packets for ARP resolution (native tunneling doesn't queue packets >>> > +# for ARP resolution). >>> > +ovn_populate_arp >>> > + >>> > +ovn-nbctl create Logical_Router name=R1 >>> > +ovn-nbctl create Logical_Router name=R2 options:chassis="hv2" >>> > + >>> > +ovn-nbctl lswitch-add foo >>> > +ovn-nbctl lswitch-add alice >>> > +ovn-nbctl lswitch-add join >>> > + >>> > +# Connect foo to R1 >>> > +ovn-nbctl -- --id=@lrp create Logical_Router_port name=foo \ >>> > +network=192.168.1.1/24 mac=\"00:00:01:01:02:03\" -- add >>> Logical_Router >>> > R1 \ >>> > +ports @lrp -- lport-add foo rp-foo >>> > + >>> > +ovn-nbctl set Logical_port rp-foo type=router options:router-port=foo >>> \ >>> > +addresses=\"00:00:01:01:02:03\" >>> > + >>> > +# Connect alice to R2 >>> > +ovn-nbctl -- --id=@lrp create Logical_Router_port name=alice \ >>> > +network=172.16.1.1/24 mac=\"00:00:02:01:02:03\" -- add >>> Logical_Router R2 >>> > \ >>> > +ports @lrp -- lport-add alice rp-alice >>> > + >>> > +ovn-nbctl set Logical_port rp-alice type=router >>> options:router-port=alice >>> > \ >>> > +addresses=\"00:00:02:01:02:03\" >>> > + >>> > + >>> > +# Connect R1 to join >>> > +ovn-nbctl -- --id=@lrp create Logical_Router_port name=R1_join \ >>> > +network=20.0.0.1/24 mac=\"00:00:04:01:02:03\" -- add Logical_Router >>> R1 \ >>> > +ports @lrp -- lport-add join r1-join >>> > + >>> > +ovn-nbctl set Logical_port r1-join type=router >>> > options:router-port=R1_join \ >>> > +addresses='"00:00:04:01:02:03"' >>> > + >>> > +# Connect R2 to join >>> > +ovn-nbctl -- --id=@lrp create Logical_Router_port name=R2_join \ >>> > +network=20.0.0.2/24 mac=\"00:00:04:01:02:04\" -- add Logical_Router >>> R2 \ >>> > +ports @lrp -- lport-add join r2-join >>> > + >>> > +ovn-nbctl set Logical_port r2-join type=router >>> > options:router-port=R2_join \ >>> > +addresses='"00:00:04:01:02:04"' >>> > + >>> > + >>> > +#install static routes >>> > +ovn-nbctl -- --id=@lrt create Logical_Router_Static_Route \ >>> > +ip_prefix=172.16.1.0/24 nexthop=20.0.0.2 -- add Logical_Router \ >>> > +R1 static_routes @lrt >>> > + >>> > +ovn-nbctl -- --id=@lrt create Logical_Router_Static_Route \ >>> > +ip_prefix=192.168.1.0/24 nexthop=20.0.0.1 -- add Logical_Router \ >>> > +R2 static_routes @lrt >>> > + >>> > +# Create logical port foo1 in foo >>> > +ovn-nbctl lport-add foo foo1 \ >>> > +-- lport-set-addresses foo1 "f0:00:00:01:02:03 192.168.1.2" >>> > + >>> > +# Create logical port alice1 in alice >>> > +ovn-nbctl lport-add alice alice1 \ 
>>> > +-- lport-set-addresses alice1 "f0:00:00:01:02:04 172.16.1.2" >>> > + >>> > + >>> > +# Allow some time for ovn-northd and ovn-controller to catch up. >>> > +# XXX This should be more systematic. >>> > +sleep 2 >>> > + >>> > +ip_to_hex() { >>> > + printf "%02x%02x%02x%02x" "$@" >>> > +} >>> > +trim_zeros() { >>> > + sed 's/\(00\)\{1,\}$//' >>> > +} >>> > + >>> > +# Send ip packets between foo1 and alice1 >>> > +src_mac="f00000010203" >>> > +dst_mac="000001010203" >>> > +src_ip=`ip_to_hex 192 168 1 2` >>> > +dst_ip=`ip_to_hex 172 16 1 2` >>> > >>> > >>> +packet=${dst_mac}${src_mac}08004500001c0000000040110000${src_ip}${dst_ip}0035111100080000 >>> > + >>> > +echo "---------NB dump-----" >>> > +ovn-nbctl show >>> > +echo "---------------------" >>> > +ovn-nbctl list logical_router >>> > +echo "---------------------" >>> > +ovn-nbctl list logical_router_port >>> > +echo "---------------------" >>> > + >>> > +echo "---------SB dump-----" >>> > +ovn-sbctl list datapath_binding >>> > +echo "---------------------" >>> > +ovn-sbctl list port_binding >>> > +echo "---------------------" >>> > +ovn-sbctl dump-flows >>> > +echo "---------------------" >>> > +ovn-sbctl list chassis >>> > +ovn-sbctl list encap >>> > +echo "---------------------" >>> > + >>> > +echo "------ hv1 dump ----------" >>> > +as hv1 ovs-ofctl show br-int >>> > +as hv1 ovs-ofctl dump-flows br-int >>> > +echo "------ hv2 dump ----------" >>> > +as hv2 ovs-ofctl show br-int >>> > +as hv2 ovs-ofctl dump-flows br-int >>> > +echo "----------------------------" >>> > + >>> > +# Packet to Expect at alice1 >>> > +src_mac="000002010203" >>> > +dst_mac="f00000010204" >>> > +src_ip=`ip_to_hex 192 168 1 2` >>> > +dst_ip=`ip_to_hex 172 16 1 2` >>> > >>> > >>> +expected=${dst_mac}${src_mac}08004500001c000000003e110200${src_ip}${dst_ip}0035111100080000 >>> > + >>> > + >>> > +as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet >>> > +as hv1 ovs-appctl ofproto/trace br-int in_port=1 $packet >>> > + >>> > +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv2/vif1-tx.pcap | >>> > trim_zeros > received1.packets >>> > +echo $expected | trim_zeros > expout >>> > +AT_CHECK([cat received1.packets], [0], [expout]) >>> > + >>> > +for sim in hv1 hv2; do >>> > + as $sim >>> > + OVS_APP_EXIT_AND_WAIT([ovn-controller]) >>> > + OVS_APP_EXIT_AND_WAIT([ovs-vswitchd]) >>> > + OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >>> > +done >>> > + >>> > +as ovn-sb >>> > +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >>> > + >>> > +as ovn-nb >>> > +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >>> > + >>> > +as northd >>> > +OVS_APP_EXIT_AND_WAIT([ovn-northd]) >>> > + >>> > +as main >>> > +OVS_APP_EXIT_AND_WAIT([ovs-vswitchd]) >>> > +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) >>> > + >>> > +AT_CLEANUP >>> > -- >>> > 1.9.1 >>> > >>> > _______________________________________________ >>> > dev mailing list >>> > dev@openvswitch.org >>> > http://openvswitch.org/mailman/listinfo/dev >>> > >>> _______________________________________________ >>> dev mailing list >>> dev@openvswitch.org >>> http://openvswitch.org/mailman/listinfo/dev >>> >> >> > _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev