Load-balancers in gateway routers let us load-balance north-south traffic.
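As a quick illustration of the intended usage (lifted from the system test added below; the router name R2, the chassis name hv1, and the addresses are just the test's examples), one might configure a VIP on a gateway router roughly like this:

    # Gateway router: load balancing only applies to routers bound to a
    # chassis via options:chassis.
    ovn-nbctl create Logical_Router name=R2 options:chassis=hv1

    # A load balancer with one VIP mapped to two backends, plus a second
    # VIP that also matches on an L4 port.
    uuid=`ovn-nbctl create load_balancer vips:30.0.0.1="192.168.1.2,192.168.2.2"`
    ovn-nbctl set load_balancer $uuid vips:'"30.0.0.2:8000"'='"192.168.1.2:80,192.168.2.2:80"'

    # Attach the load balancer to the gateway router.
    ovn-nbctl set logical_router R2 load_balancer=$uuid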
This commit adds a new table called "DEFRAG" in the logical router pipeline to defragment packets and to track them. Once the packet is tracked, new connections get a group id as an action. The group in turn chooses a DNAT action. Established connections go through the DNAT table for a regular DNAT. Signed-off-by: Gurucharan Shetty <g...@ovn.org> --- ovn/northd/ovn-northd.8.xml | 63 +++++++++++++++---- ovn/northd/ovn-northd.c | 150 +++++++++++++++++++++++++++++++++++++++++--- ovn/ovn-nb.ovsschema | 9 ++- ovn/ovn-nb.xml | 5 ++ tests/system-ovn.at | 144 ++++++++++++++++++++++++++++++++++++++++++ 5 files changed, 347 insertions(+), 24 deletions(-) diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml index 3448370..d2388b6 100644 --- a/ovn/northd/ovn-northd.8.xml +++ b/ovn/northd/ovn-northd.8.xml @@ -259,7 +259,7 @@ in ingress table <code>LB</code> and <code>Stateful</code>. It contains a priority-0 flow that simply moves traffic to the next table. If load balancing rules with virtual IP addresses (and ports) are configured in - <code>OVN_Northbound</code> database for a logical datapath, a + <code>OVN_Northbound</code> database for a logical switch datapath, a priority-100 flow is added for each configured virtual IP address <var>VIP</var> with a match <code>ip && ip4.dst == <var>VIP</var> </code> that sets an action <code>reg0[0] = 1; next;</code> to act as a @@ -379,7 +379,7 @@ <ul> <li> - For all the configured load balancing rules in + For all the configured load balancing rules for a switch in <code>OVN_Northbound</code> database that includes a L4 port <var>PORT</var> of protocol <var>P</var> and IPv4 address <var>VIP</var>, a priority-120 flow that matches on @@ -390,7 +390,7 @@ optional port numbers) to load balance to. </li> <li> - For all the configured load balancing rules in + For all the configured load balancing rules for a switch in <code>OVN_Northbound</code> database that includes just an IP address <var>VIP</var> to match on, a priority-110 flow that matches on <code>ct.new && ip && ip4.dst == <var>VIP</var></code> @@ -880,8 +880,9 @@ output; <li> <p> These flows reply to ARP requests for the virtual IP addresses - configured in the router for DNAT. For a configured DNAT IP address - <var>A</var>, for each router port <var>P</var> with Ethernet + configured in the router for DNAT or load balancing. For a + configured DNAT IP address or a load balancer VIP <var>A</var>, + for each router port <var>P</var> with Ethernet address <var>E</var>, a priority-90 flow matches <code>inport == <var>P</var> && arp.op == 1 && arp.tpa == <var>A</var></code> (ARP request) @@ -1063,12 +1064,26 @@ icmp4 { <li> Next table. A priority-0 flows match all packets that aren't already - handled and uses actions <code>next;</code> to feed them to the ingress - table for routing. + handled and uses actions <code>next;</code> to feed them to the next + table. </li> </ul> - <h3>Ingress Table 2: UNSNAT</h3> + <h3>Ingress Table 2: DEFRAG</h3> + + <p> + This is to send packets to connection tracker for tracking and + defragmentation. It contains a priority-0 flow that simply moves traffic + to the next table. 
If load balancing rules with virtual IP addresses + (and ports) are configured in <code>OVN_Northbound</code> database for a + Gateway router, a priority-100 flow is added for each configured virtual + IP address <var>VIP</var> with a match <code>ip && + ip4.dst == <var>VIP</var></code> that sets an action + <code>ct_next;</code> to send IP packets to the connection tracker for + packet de-fragmentation and tracking before sending it to the next table. + </p> + + <h3>Ingress Table 3: UNSNAT</h3> <p> This is for already established connections' reverse traffic. @@ -1094,7 +1109,7 @@ icmp4 { </li> </ul> - <h3>Ingress Table 3: DNAT</h3> + <h3>Ingress Table 4: DNAT</h3> <p> Packets enter the pipeline with destination IP address that needs to @@ -1104,6 +1119,28 @@ icmp4 { <ul> <li> <p> + For all the configured load balancing rules for Gateway router in + <code>OVN_Northbound</code> database that includes a L4 port + <var>PORT</var> of protocol <var>P</var> and IPv4 address + <var>VIP</var>, a priority-120 flow that matches on + <code>ct.new && ip && ip4.dst == <var>VIP</var> + && <var>P</var> && <var>P</var>.dst == <var>PORT + </var></code> with an action of <code>ct_lb(<var>args</var>)</code>, + where <var>args</var> contains comma separated IPv4 addresses (and + optional port numbers) to load balance to. + </p> + + <p> + For all the configured load balancing rules for Gateway router in + <code>OVN_Northbound</code> database that includes just an IP address + <var>VIP</var> to match on, a priority-110 flow that matches on + <code>ct.new && ip && ip4.dst == + <var>VIP</var></code> with an action of + <code>ct_lb(<var>args</var>)</code>, where <var>args</var> contains + comma separated IPv4 addresses. + </p> + + <p> For each configuration in the OVN Northbound database, that asks to change the destination IP address of a packet from <var>A</var> to <var>B</var>, a priority-100 flow matches <code>ip && @@ -1123,7 +1160,7 @@ icmp4 { </li> </ul> - <h3>Ingress Table 4: IP Routing</h3> + <h3>Ingress Table 5: IP Routing</h3> <p> A packet that arrives at this table is an IP packet that should be @@ -1134,7 +1171,7 @@ icmp4 { packet's final destination, unchanged) and advances to the next table for ARP resolution. It also sets <code>reg1</code> (or <code>xxreg1</code>) to the IP address owned by the selected router - port (Table 6 will generate ARP request, if needed, with + port (Table 7 will generate ARP request, if needed, with <code>reg0</code> as the target protocol address and <code>reg1</code> as the source protocol address). </p> @@ -1215,7 +1252,7 @@ next; </li> </ul> - <h3>Ingress Table 5: ARP/ND Resolution</h3> + <h3>Ingress Table 6: ARP/ND Resolution</h3> <p> Any packet that reaches this table is an IP packet whose next-hop @@ -1297,7 +1334,7 @@ next; </li> </ul> - <h3>Ingress Table 6: ARP Request</h3> + <h3>Ingress Table 7: ARP Request</h3> <p> In the common case where the Ethernet destination has been resolved, this diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c index 0874a9c..ccb49c4 100644 --- a/ovn/northd/ovn-northd.c +++ b/ovn/northd/ovn-northd.c @@ -121,11 +121,12 @@ enum ovn_stage { /* Logical router ingress stages. 
*/ \ PIPELINE_STAGE(ROUTER, IN, ADMISSION, 0, "lr_in_admission") \ PIPELINE_STAGE(ROUTER, IN, IP_INPUT, 1, "lr_in_ip_input") \ - PIPELINE_STAGE(ROUTER, IN, UNSNAT, 2, "lr_in_unsnat") \ - PIPELINE_STAGE(ROUTER, IN, DNAT, 3, "lr_in_dnat") \ - PIPELINE_STAGE(ROUTER, IN, IP_ROUTING, 4, "lr_in_ip_routing") \ - PIPELINE_STAGE(ROUTER, IN, ARP_RESOLVE, 5, "lr_in_arp_resolve") \ - PIPELINE_STAGE(ROUTER, IN, ARP_REQUEST, 6, "lr_in_arp_request") \ + PIPELINE_STAGE(ROUTER, IN, DEFRAG, 2, "lr_in_defrag") \ + PIPELINE_STAGE(ROUTER, IN, UNSNAT, 3, "lr_in_unsnat") \ + PIPELINE_STAGE(ROUTER, IN, DNAT, 4, "lr_in_dnat") \ + PIPELINE_STAGE(ROUTER, IN, IP_ROUTING, 5, "lr_in_ip_routing") \ + PIPELINE_STAGE(ROUTER, IN, ARP_RESOLVE, 6, "lr_in_arp_resolve") \ + PIPELINE_STAGE(ROUTER, IN, ARP_REQUEST, 7, "lr_in_arp_request") \ \ /* Logical router egress stages. */ \ PIPELINE_STAGE(ROUTER, OUT, SNAT, 0, "lr_out_snat") \ @@ -3267,6 +3268,66 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports, ds_cstr(&match), ds_cstr(&actions)); } + /* A set to hold all load-balancer vips that need ARP responses. */ + struct sset all_ips = SSET_INITIALIZER(&all_ips); + + for (int i = 0; i < op->od->nbr->n_load_balancer; i++) { + struct nbrec_load_balancer *lb = op->od->nbr->load_balancer[i]; + struct smap *vips = &lb->vips; + struct smap_node *node; + + SMAP_FOR_EACH (node, vips) { + /* node->key contains IP:port or just IP. */ + char *ip_address = NULL; + uint16_t port; + + ip_address_and_port_from_lb_key(node->key, &ip_address, &port); + if (!ip_address) { + continue; + } + + if (!sset_contains(&all_ips, ip_address)) { + sset_add(&all_ips, ip_address); + } + + free(ip_address); + } + } + + const char *ip_address; + SSET_FOR_EACH(ip_address, &all_ips) { + ovs_be32 ip; + if (!ip_parse(ip_address, &ip) || !ip) { + continue; + } + + ds_clear(&match); + ds_put_format(&match, + "inport == %s && arp.tpa == "IP_FMT" && arp.op == 1", + op->json_key, IP_ARGS(ip)); + + ds_clear(&actions); + ds_put_format(&actions, + "eth.dst = eth.src; " + "eth.src = %s; " + "arp.op = 2; /* ARP reply */ " + "arp.tha = arp.sha; " + "arp.sha = %s; " + "arp.tpa = arp.spa; " + "arp.spa = "IP_FMT"; " + "outport = %s; " + "flags.loopback = 1; " + "output;", + op->lrp_networks.ea_s, + op->lrp_networks.ea_s, + IP_ARGS(ip), + op->json_key); + ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_INPUT, 90, + ds_cstr(&match), ds_cstr(&actions)); + } + + sset_destroy(&all_ips); + ovs_be32 *snat_ips = xmalloc(sizeof *snat_ips * op->od->nbr->n_nat); size_t n_snat_ips = 0; for (int i = 0; i < op->od->nbr->n_nat; i++) { @@ -3421,22 +3482,90 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports, } } - /* NAT in Gateway routers. */ + /* NAT, Defrag and load balancing in Gateway routers. */ HMAP_FOR_EACH (od, key_node, datapaths) { if (!od->nbr) { continue; } /* Packets are allowed by default. */ + ovn_lflow_add(lflows, od, S_ROUTER_IN_DEFRAG, 0, "1", "next;"); ovn_lflow_add(lflows, od, S_ROUTER_IN_UNSNAT, 0, "1", "next;"); ovn_lflow_add(lflows, od, S_ROUTER_OUT_SNAT, 0, "1", "next;"); ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 0, "1", "next;"); - /* NAT rules are only valid on Gateway routers. */ + /* NAT rules, packet defrag and load balancing are only valid on + * Gateway routers. */ if (!smap_get(&od->nbr->options, "chassis")) { continue; } + /* A set to hold all ips that need defragmentation and tracking. 
*/ + struct sset all_ips = SSET_INITIALIZER(&all_ips); + + for (int i = 0; i < od->nbr->n_load_balancer; i++) { + struct nbrec_load_balancer *lb = od->nbr->load_balancer[i]; + struct smap *vips = &lb->vips; + struct smap_node *node; + + SMAP_FOR_EACH (node, vips) { + uint16_t port = 0; + + /* node->key contains IP:port or just IP. */ + char *ip_address = NULL; + ip_address_and_port_from_lb_key(node->key, &ip_address, &port); + if (!ip_address) { + continue; + } + + if (!sset_contains(&all_ips, ip_address)) { + sset_add(&all_ips, ip_address); + } + + /* Higher priority rules are added in DNAT table to match on + * ct.new which in-turn have group id as an action for load + * balancing. */ + ds_clear(&actions); + ds_put_format(&actions, "ct_lb(%s);", node->value); + + ds_clear(&match); + ds_put_format(&match, "ct.new && ip && ip4.dst == %s", + ip_address); + free(ip_address); + + if (port) { + if (lb->protocol && !strcmp(lb->protocol, "udp")) { + ds_put_format(&match, "&& udp && udp.dst == %d", port); + } else { + ds_put_format(&match, "&& tcp && tcp.dst == %d", port); + } + ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, + 120, ds_cstr(&match), ds_cstr(&actions)); + } else { + ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, + 110, ds_cstr(&match), ds_cstr(&actions)); + } + } + } + + /* If there are any load balancing rules, we should send the + * packet to conntrack for defragmentation and tracking. This helps + * with two things. + * + * 1. With tracking, we can send only new connections to pick a + * DNAT ip address from a group. + * 2. If there are L4 ports in load balancing rules, we need the + * defragmentation to match on L4 ports. */ + const char *ip_address; + SSET_FOR_EACH(ip_address, &all_ips) { + ds_clear(&match); + ds_put_format(&match, "ip && ip4.dst == %s", ip_address); + ovn_lflow_add(lflows, od, S_ROUTER_IN_DEFRAG, + 100, ds_cstr(&match), "ct_next; next;"); + } + + sset_destroy(&all_ips); + for (int i = 0; i < od->nbr->n_nat; i++) { const struct nbrec_nat *nat; @@ -3531,7 +3660,7 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports, } /* Re-circulate every packet through the DNAT zone. - * This helps with two things. + * This helps with three things. * * 1. Any packet that needs to be unDNATed in the reverse * direction gets unDNATed. Ideally this could be done in @@ -3540,7 +3669,10 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports, * ip address being external IP address for IP routing, * we can do it here, saving a future re-circulation. * - * 2. Any packet that was sent through SNAT zone in the + * 2. Established load-balanced connections automatically get + * DNATed. + * + * 3. Any packet that was sent through SNAT zone in the * previous table automatically gets re-circulated to get * back the new destination IP address that is needed for * routing in the openflow pipeline. 
*/ diff --git a/ovn/ovn-nb.ovsschema b/ovn/ovn-nb.ovsschema index 456ae98..80c2f2f 100644 --- a/ovn/ovn-nb.ovsschema +++ b/ovn/ovn-nb.ovsschema @@ -1,7 +1,7 @@ { "name": "OVN_Northbound", - "version": "5.3.1", - "cksum": "1921908091 9353", + "version": "5.3.2", + "cksum": "189899446 9689", "tables": { "NB_Global": { "columns": { @@ -137,6 +137,11 @@ "refType": "strong"}, "min": 0, "max": "unlimited"}}, + "load_balancer": {"type": {"key": {"type": "uuid", + "refTable": "Load_Balancer", + "refType": "strong"}, + "min": 0, + "max": "unlimited"}}, "options": { "type": {"key": "string", "value": "string", diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml index 5719e74..6e56bd0 100644 --- a/ovn/ovn-nb.xml +++ b/ovn/ovn-nb.xml @@ -841,6 +841,11 @@ Gateway routers. </column> + <column name="load_balancer"> + Load balance a virtual ipv4 address to a set of logical port ipv4 + addresses. Load balancer rules only work on the Gateway routers. + </column> + <group title="Options"> <p> Additional options for the logical router. diff --git a/tests/system-ovn.at b/tests/system-ovn.at index e267384..0f02e52 100755 --- a/tests/system-ovn.at +++ b/tests/system-ovn.at @@ -523,3 +523,147 @@ OVS_APP_EXIT_AND_WAIT([ovn-northd]) as OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d"]) AT_CLEANUP + +AT_SETUP([ovn -- load balancing in gateway router]) +AT_KEYWORDS([ovnlb]) + +CHECK_CONNTRACK() +CHECK_CONNTRACK_NAT() +ovn_start +OVS_TRAFFIC_VSWITCHD_START() +ADD_BR([br-int]) + +# Set external-ids in br-int needed for ovn-controller +ovs-vsctl \ + -- set Open_vSwitch . external-ids:system-id=hv1 \ + -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \ + -- set Open_vSwitch . external-ids:ovn-encap-type=geneve \ + -- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \ + -- set bridge br-int fail-mode=secure other-config:disable-in-band=true + +# Start ovn-controller +start_daemon ovn-controller + +# Logical network: +# Two LRs - R1 and R2 that are connected to each other via LS "join" +# in 20.0.0.0/24 network. R1 has switchess foo (192.168.1.0/24) and +# bar (192.168.2.0/24) connected to it. R2 has alice (172.16.1.0/24) connected +# to it. R2 is a gateway router on which we add load-balancing rules. 
+# +# foo -- R1 -- join - R2 -- alice +# | +# bar ---- + +ovn-nbctl create Logical_Router name=R1 +ovn-nbctl create Logical_Router name=R2 options:chassis=hv1 + +ovn-nbctl ls-add foo +ovn-nbctl ls-add bar +ovn-nbctl ls-add alice +ovn-nbctl ls-add join + +# Connect foo to R1 +ovn-nbctl lrp-add R1 foo 00:00:01:01:02:03 192.168.1.1/24 +ovn-nbctl lsp-add foo rp-foo -- set Logical_Switch_Port rp-foo \ + type=router options:router-port=foo addresses=\"00:00:01:01:02:03\" + +# Connect bar to R1 +ovn-nbctl lrp-add R1 bar 00:00:01:01:02:04 192.168.2.1/24 +ovn-nbctl lsp-add bar rp-bar -- set Logical_Switch_Port rp-bar \ + type=router options:router-port=bar addresses=\"00:00:01:01:02:04\" + +# Connect alice to R2 +ovn-nbctl lrp-add R2 alice 00:00:02:01:02:03 172.16.1.1/24 +ovn-nbctl lsp-add alice rp-alice -- set Logical_Switch_Port rp-alice \ + type=router options:router-port=alice addresses=\"00:00:02:01:02:03\" + +# Connect R1 to join +ovn-nbctl lrp-add R1 R1_join 00:00:04:01:02:03 20.0.0.1/24 +ovn-nbctl lsp-add join r1-join -- set Logical_Switch_Port r1-join \ + type=router options:router-port=R1_join addresses='"00:00:04:01:02:03"' + +# Connect R2 to join +ovn-nbctl lrp-add R2 R2_join 00:00:04:01:02:04 20.0.0.2/24 +ovn-nbctl lsp-add join r2-join -- set Logical_Switch_Port r2-join \ + type=router options:router-port=R2_join addresses='"00:00:04:01:02:04"' + +# Static routes. +ovn-nbctl lr-route-add R1 172.16.1.0/24 20.0.0.2 +ovn-nbctl lr-route-add R2 192.168.0.0/16 20.0.0.1 + +# Logical port 'foo1' in switch 'foo'. +ADD_NAMESPACES(foo1) +ADD_VETH(foo1, foo1, br-int, "192.168.1.2/24", "f0:00:00:01:02:03", \ + "192.168.1.1") +ovn-nbctl lsp-add foo foo1 \ +-- lsp-set-addresses foo1 "f0:00:00:01:02:03 192.168.1.2" + +# Logical port 'alice1' in switch 'alice'. +ADD_NAMESPACES(alice1) +ADD_VETH(alice1, alice1, br-int, "172.16.1.2/24", "f0:00:00:01:02:04", \ + "172.16.1.1") +ovn-nbctl lsp-add alice alice1 \ +-- lsp-set-addresses alice1 "f0:00:00:01:02:04 172.16.1.2" + +# Logical port 'bar1' in switch 'bar'. +ADD_NAMESPACES(bar1) +ADD_VETH(bar1, bar1, br-int, "192.168.2.2/24", "f0:00:00:01:02:05", \ +"192.168.2.1") +ovn-nbctl lsp-add bar bar1 \ +-- lsp-set-addresses bar1 "f0:00:00:01:02:05 192.168.2.2" + +# Config OVN load-balancer with a VIP. +uuid=`ovn-nbctl create load_balancer vips:30.0.0.1="192.168.1.2,192.168.2.2"` +ovn-nbctl set logical_router R2 load_balancer=$uuid + +# Config OVN load-balancer with another VIP (this time with ports). +ovn-nbctl set load_balancer $uuid vips:'"30.0.0.2:8000"'='"192.168.1.2:80,192.168.2.2:80"' + +# Wait for ovn-controller to catch up. +OVS_WAIT_UNTIL([ovs-ofctl -O OpenFlow13 dump-groups br-int | grep ct\(]) + +# Start webservers in 'foo1', 'bar1'. +NETNS_DAEMONIZE([foo1], [[$PYTHON $srcdir/test-l7.py]], [http1.pid]) +NETNS_DAEMONIZE([bar1], [[$PYTHON $srcdir/test-l7.py]], [http2.pid]) + +dnl Should work with the virtual IP address through NAT +for i in `seq 1 20`; do + echo Request $i + NS_CHECK_EXEC([alice1], [wget 30.0.0.1 -t 5 -T 1 --retry-connrefused -v -o wget$i.log]) +done + +dnl Each server should have at least one connection. 
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(30.0.0.1) | +sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl +tcp,orig=(src=172.16.1.2,dst=30.0.0.1,sport=<cleared>,dport=<cleared>),reply=(src=192.168.1.2,dst=172.16.1.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>) +tcp,orig=(src=172.16.1.2,dst=30.0.0.1,sport=<cleared>,dport=<cleared>),reply=(src=192.168.2.2,dst=172.16.1.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>) +]) + +dnl Test load-balancing that includes L4 ports in NAT. +for i in `seq 1 20`; do + echo Request $i + NS_CHECK_EXEC([alice1], [wget 30.0.0.2:8000 -t 5 -T 1 --retry-connrefused -v -o wget$i.log]) +done + +dnl Each server should have at least one connection. +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(30.0.0.2) | +sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl +tcp,orig=(src=172.16.1.2,dst=30.0.0.2,sport=<cleared>,dport=<cleared>),reply=(src=192.168.1.2,dst=172.16.1.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>) +tcp,orig=(src=172.16.1.2,dst=30.0.0.2,sport=<cleared>,dport=<cleared>),reply=(src=192.168.2.2,dst=172.16.1.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>) +]) + +OVS_APP_EXIT_AND_WAIT([ovn-controller]) + +as ovn-sb +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) + +as ovn-nb +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) + +as northd +OVS_APP_EXIT_AND_WAIT([ovn-northd]) + +as +OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d +/connection dropped.*/d"]) +AT_CLEANUP -- 1.9.1 _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev