On 21 June 2016 at 10:29, Flaviof <fla...@flaviof.com> wrote:

> On Tue, Jun 21, 2016 at 10:46 AM, Guru Shetty <g...@ovn.org> wrote:
>
> > On 20 June 2016 at 19:36, Flaviof <fla...@flaviof.com> wrote:
> >
> >> On Mon, Jun 13, 2016 at 6:45 AM, Gurucharan Shetty <g...@ovn.org> wrote:
> >>
> >> > For traffic from physical space to virtual space we need DNAT.
> >> > The DNAT happens in the gateway router and reaches the logical
> >> > port. The return traffic should be unDNATed.
> >> >
> >> > Traffic originating in virtual space heading to physical space
> >> > should be SNATed. The return traffic is unSNATted.
> >> >
> >> > East-west traffic with the public destination IP address needs
> >> > a DNAT. This traffic is punted to the l3 gateway where DNAT
> >> > takes place. This traffic is also SNATed and eventually loops back to
> >> > its destination. The SNAT is needed because we need the reverse traffic
> >> > to go back to the l3 gateway and not short-circuit directly to the
> >> > source.
> >> >
> >> > This commit introduces 4 new logical actions.
> >> > 1. ct_snat: To send the packet through SNAT zone to unSNAT packets.
> >> > 2. ct_snat(IP): To SNAT to the provided IP address.
> >> > 3. ct_dnat: To send the packet through DNAT zone to unDNAT packets.
> >> > 4. ct_dnat(IP): To DNAT to the provided IP.
> >> >
> >> > This commit only provides the ability to do IP based NAT. This will
> >> > eventually be enhanced to do PORT based NAT too.
> >> >
> >> > Command hints:
> >> >
> >> > Consider a distributed router "R1" that has switch foo (192.168.1.0/24)
> >> > with a lport foo1 (192.168.1.2) and bar (192.168.2.0/24) with lport bar1
> >> > (192.168.2.2) connected to it. You connect "R1" to
> >> > a gateway router "R2" via a switch "join" in (20.0.0.0/24) network.
> >> >
> >> > R2 has a switch "alice" (172.16.1.0/24) connected to it (to simulate
> >> > external network).
> >> >
> >> > case 1: Add pure DNAT (north-south)
> >> >
> >> > Add a DNAT rule in R2:
> >> > ovn-nbctl -- --id=@nat create nat type="dnat" logical_ip=192.168.1.2 \
> >> > external_ip=30.0.0.2 -- add logical_router R2 nat @nat
> >> >
> >> > Now alice1 should be able to ping 192.168.1.2 via 30.0.0.2.
> >> >
> >> > case 2: Add pure SNAT (south-north)
> >> >
> >> > Add a SNAT rule in R2:
> >> >
> >> > ovn-nbctl -- --id=@nat create nat type="snat" logical_ip=192.168.2.2 \
> >> > external_ip=30.0.0.1 -- add logical_router R2 nat @nat
> >> >
> >> > (You need a static route in R1 to send packets destined to outside
> >> > world to go through R2. The logical_ip can be a subnet.)
> >> >
> >> > When bar1 pings alice1, alice1 receives traffic from 30.0.0.1
> >> >
> >> > case 3: SNAT and DNAT (east-west traffic)
> >> >
> >> > When bar1 pings 30.0.0.2, the traffic jumps to the gateway router
> >> > and loops back to foo1 with a source ip address of 30.0.0.1
> >> >
> >>
> >> So, is 30.0.0.0/x network an external network that R2 has a port too?
> >
> > The example above does not have that. In the above example 30.0.0.0/x is
> > being treated as virtual address. But in a real setup (non-simulated), you
> > are right. R2 will be connected to a 30.0.0.0/x network and will have a
> > port in it. It will also have a static route (0.0.0.0/0) or a
> > default_gateway to point to the physical router IP address as its next hop.
> > (I have not tested it as I do not have a real setup at hand, but based on
> > the simulation, it should ideally work.)
> >
> >> What is the next hop that R2 would use to reach a destination beyond
> >> that subnet?
> >
> > Answered above.
>
> Ack!
>
> >> I think this may be clear when a test is added to ovn.at, which uses foo,
> >> bar, join, alice
> >
> > The unit tests do not have the ability to do conntrack NAT right now. I
> > think we should add one once Daniele introduces NAT to userspace conntrack.
> > But the unit test "ovn -- 2 HVs, 2 LRs connected via LS, gateway router"
> > does something very similar (it has foo - R1 - join - R2 - alice).
>
> Right, I saw that test and it makes perfect sense. Adding the 'bar' logical
> switch, net 30.0.0.x and the nat rules are the few lines that it currently
> does not have.
>
> >> Based on the code and my little test setup, there seems to be a high cost
> >> for DNAT entries in that an ARP response rule will be added per DNAT x all
> >> router ports.
> >
> > The intention was to add only on the router where DNAT entry is defined
> > and not on all router ports of all routers. Is it not true? (If so, this is
> > a bug.) The for loop which adds this entry only looks at that datapath's
> > NAT entries.
> >
> > On the gateway router itself, there would be typically two DNAT entries.
> > One of them connected to internal network (for east-west) and another one
> > at external port (facing physical router).
>
> Understood.
>
> >> In the example used by the commit message, ingress table 1 of
> >> the logical router will have arp response entries for inports alice and
> >> R2_join.
> >
> > Right. That is because as explained above, I need to do DNAT for both
> > east-west as well as north-south. (It is very possible that I did not
> > understand your concern)
>
> Nah, you set me straight. If there were multiple internal subnets I imagine
> we will need a DNAT rule for each, since the response needs to be slightly
> different for each router port. Not an issue, just an observation.
>
> >> Table 3: do we really intend to apply the actions 'inport = ""; ct_dnat;'
> >> to all ip packets that do not have an explicit dnat mapping?
> >
> > Yes. This is a little tricky. I have tried to explain the rationale in a
> > comment above. The general idea is that in a gateway router, there will be
> > at least one DNAT or SNAT entry. Otherwise, why have a gateway router?
> > Also, a re-circulation is considered to be very expensive. What we want
> > is to minimize re-circulations. With the code above, we have a minimum of
> > one re-circulation no matter what and a maximum of two re-circulations. I
> > have tried different ways to optimize it. There was a possibility of 3
> > re-circulations as a worst case if I did not force the minimum one
> > re-circulation. Probably there is a different way to optimize it (that I
> > haven't thought about).
>
> Thanks for the clarification. I don't know enough about the implications of
> calling the ct_dnat action, but I imagine that is just noise and, like you
> point out, this is only in the gateway router and saves on recirculations.
>
> >> SNAT: do we need ARP reply rules for the SNAT addresses, similar to the
> >> ones added for DNAT?
> >
> > I don't think we need ARP reply rules for SNAT entries. What is the use
> > case?
>
> This is likely a moot point on my part. It is just that in my example, the
> gateway router did not have a port in the 30.0.0.x network. So it was not
> obvious to me that if it did, it would have the ARP response rule for its
> own address, which is masking the internal ips for foo and bar. Sorry for
> not understanding that before making the noise. :)
>
> >> SNAT: looking at the openflow table I see no mention of the address
> >> added to support SNAT. Is that because that is all handled by the
> >> connection tracker and there is nothing to be done via openflow? Or
> >> maybe part of another patchset?
> >
> > We do add SNAT specific rules. Search for S_ROUTER_IN_UNSNAT
> > and S_ROUTER_OUT_SNAT.
>
> Ack, I missed that in the egress datapath. *facepalm*
>
> >> Thanks,
> >>
> >> -- flaviof
> >>
> >> > Signed-off-by: Gurucharan Shetty <g...@ovn.org>
>
> Acked-by: Flavio Fernandes <fla...@flaviof.com>

Thank you for taking a look. I applied this.
We will fix issues that come up in real world testing.
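
For readers following the quoted patch, the egress SNAT flow priority it computes (`count_1bits(ntohl(mask)) + 1`, so that a `logical_ip` with a longer mask wins) can be illustrated with a small standalone sketch. This is not the OVS code: `popcount32`, `prefix_to_mask` and `snat_flow_priority` are hypothetical helper names used only here, and masks are assumed to already be in host byte order.

```c
#include <stdint.h>

/* Hypothetical stand-in for OVS's count_1bits(): number of set bits
 * in a 32-bit value. */
static unsigned int
popcount32(uint32_t x)
{
    unsigned int n = 0;

    while (x) {
        x &= x - 1;     /* Clear the lowest set bit. */
        n++;
    }
    return n;
}

/* Hypothetical helper: host-byte-order netmask for a prefix length
 * in the range 0..32. */
static uint32_t
prefix_to_mask(unsigned int plen)
{
    return plen ? ~(uint32_t) 0 << (32 - plen) : 0;
}

/* Priority of the egress SNAT flow for a logical_ip with the given
 * host-byte-order mask.  The "+ 1" keeps every NAT flow above the
 * priority-0 default "next;" flow. */
static unsigned int
snat_flow_priority(uint32_t mask_host_order)
{
    return popcount32(mask_host_order) + 1;
}
```

Under these assumptions, a single address such as 192.168.2.2 (mask /32) gets priority 33, while a subnet such as 192.168.2.0/24 gets priority 25, so the more specific rule is matched first.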
> > > > > > > > --- > >> > ovn/lib/actions.c | 83 ++++++++++++++++++++ > >> > ovn/northd/ovn-northd.8.xml | 131 ++++++++++++++++++++++++++++--- > >> > ovn/northd/ovn-northd.c | 187 > >> > ++++++++++++++++++++++++++++++++++++++++++-- > >> > ovn/ovn-nb.ovsschema | 19 ++++- > >> > ovn/ovn-nb.xml | 65 +++++++++++++-- > >> > ovn/ovn-sb.xml | 41 ++++++++++ > >> > ovn/utilities/ovn-nbctl.c | 5 ++ > >> > tests/ovn.at | 17 ++++ > >> > 8 files changed, 524 insertions(+), 24 deletions(-) > >> > > >> > diff --git a/ovn/lib/actions.c b/ovn/lib/actions.c > >> > index 5f0bf19..4a486a0 100644 > >> > --- a/ovn/lib/actions.c > >> > +++ b/ovn/lib/actions.c > >> > @@ -442,6 +442,85 @@ emit_ct(struct action_context *ctx, bool > >> recirc_next, > >> > bool commit) > >> > add_prerequisite(ctx, "ip"); > >> > } > >> > > >> > +static void > >> > +parse_ct_nat(struct action_context *ctx, bool snat) > >> > +{ > >> > + const size_t ct_offset = ctx->ofpacts->size; > >> > + ofpbuf_pull(ctx->ofpacts, ct_offset); > >> > + > >> > + struct ofpact_conntrack *ct = ofpact_put_CT(ctx->ofpacts); > >> > + > >> > + if (ctx->ap->cur_ltable < ctx->ap->n_tables) { > >> > + ct->recirc_table = ctx->ap->first_ptable + > ctx->ap->cur_ltable > >> + > >> > 1; > >> > + } else { > >> > + action_error(ctx, > >> > + "\"ct_[sd]nat\" action not allowed in last > >> table."); > >> > + return; > >> > + } > >> > + > >> > + if (snat) { > >> > + ct->zone_src.field = mf_from_id(MFF_LOG_SNAT_ZONE); > >> > + } else { > >> > + ct->zone_src.field = mf_from_id(MFF_LOG_DNAT_ZONE); > >> > + } > >> > + ct->zone_src.ofs = 0; > >> > + ct->zone_src.n_bits = 16; > >> > + ct->flags = 0; > >> > + ct->alg = 0; > >> > + > >> > + add_prerequisite(ctx, "ip"); > >> > + > >> > + struct ofpact_nat *nat; > >> > + size_t nat_offset; > >> > + nat_offset = ctx->ofpacts->size; > >> > + ofpbuf_pull(ctx->ofpacts, nat_offset); > >> > + > >> > + nat = ofpact_put_NAT(ctx->ofpacts); > >> > + nat->flags = 0; > >> > + nat->range_af = AF_UNSPEC; > >> > + > >> > 
+ int commit = 0; > >> > + if (lexer_match(ctx->lexer, LEX_T_LPAREN)) { > >> > + ovs_be32 ip; > >> > + if (ctx->lexer->token.type == LEX_T_INTEGER > >> > + && ctx->lexer->token.format == LEX_F_IPV4) { > >> > + ip = ctx->lexer->token.value.ipv4; > >> > + } else { > >> > + action_syntax_error(ctx, "invalid ip"); > >> > + return; > >> > + } > >> > + > >> > + nat->range_af = AF_INET; > >> > + nat->range.addr.ipv4.min = ip; > >> > + if (snat) { > >> > + nat->flags |= NX_NAT_F_SRC; > >> > + } else { > >> > + nat->flags |= NX_NAT_F_DST; > >> > + } > >> > + commit = NX_CT_F_COMMIT; > >> > + lexer_get(ctx->lexer); > >> > + if (!lexer_match(ctx->lexer, LEX_T_RPAREN)) { > >> > + action_syntax_error(ctx, "expecting `)'"); > >> > + return; > >> > + } > >> > + } > >> > + > >> > + ctx->ofpacts->header = ofpbuf_push_uninit(ctx->ofpacts, > >> nat_offset); > >> > + ct = ctx->ofpacts->header; > >> > + ct->flags |= commit; > >> > + > >> > + /* XXX: For performance reasons, we try to prevent additional > >> > + * recirculations. So far, ct_snat which is used in a gateway > >> router > >> > + * does not need a recirculation. ct_snat(IP) does need a > >> > recirculation. > >> > + * Should we consider a method to let the actions specify > whether a > >> > action > >> > + * needs recirculation if there more use cases?. 
*/ > >> > + if (!commit && snat) { > >> > + ct->recirc_table = NX_CT_RECIRC_NONE; > >> > + } > >> > + ofpact_finish(ctx->ofpacts, &ct->ofpact); > >> > + ofpbuf_push_uninit(ctx->ofpacts, ct_offset); > >> > +} > >> > + > >> > static bool > >> > parse_action(struct action_context *ctx) > >> > { > >> > @@ -469,6 +548,10 @@ parse_action(struct action_context *ctx) > >> > emit_ct(ctx, true, false); > >> > } else if (lexer_match_id(ctx->lexer, "ct_commit")) { > >> > emit_ct(ctx, false, true); > >> > + } else if (lexer_match_id(ctx->lexer, "ct_dnat")) { > >> > + parse_ct_nat(ctx, false); > >> > + } else if (lexer_match_id(ctx->lexer, "ct_snat")) { > >> > + parse_ct_nat(ctx, true); > >> > } else if (lexer_match_id(ctx->lexer, "arp")) { > >> > parse_arp_action(ctx); > >> > } else if (lexer_match_id(ctx->lexer, "get_arp")) { > >> > diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml > >> > index 1983812..c237604 100644 > >> > --- a/ovn/northd/ovn-northd.8.xml > >> > +++ b/ovn/northd/ovn-northd.8.xml > >> > @@ -517,11 +517,40 @@ next; > >> > > >> > <li> > >> > <p> > >> > - Reply to ARP requests. These flows reply to ARP requests > for > >> > the > >> > - router's own IP address. For each router port <var>P</var> > >> > that owns > >> > - IP address <var>A</var> and Ethernet address <var>E</var>, > a > >> > - priority-90 flow matches <code>inport == <var>P</var> > >> && > >> > - arp.op == 1 && arp.tpa == <var>A</var></code> (ARP > >> > request) > >> > + Reply to ARP requests. > >> > + </p> > >> > + > >> > + <p> > >> > + These flows reply to ARP requests for the router's own IP > >> > address. 
> >> > + For each router port <var>P</var> that owns IP address > >> > <var>A</var> > >> > + and Ethernet address <var>E</var>, a priority-90 flow > matches > >> > + <code>inport == <var>P</var> && arp.op == 1 > >> && > >> > + arp.tpa == <var>A</var></code> (ARP request) with the > >> following > >> > + actions: > >> > + </p> > >> > + > >> > + <pre> > >> > +eth.dst = eth.src; > >> > +eth.src = <var>E</var>; > >> > +arp.op = 2; /* ARP reply. */ > >> > +arp.tha = arp.sha; > >> > +arp.sha = <var>E</var>; > >> > +arp.tpa = arp.spa; > >> > +arp.spa = <var>A</var>; > >> > +outport = <var>P</var>; > >> > +inport = ""; /* Allow sending out inport. */ > >> > +output; > >> > + </pre> > >> > + </li> > >> > + > >> > + <li> > >> > + <p> > >> > + These flows reply to ARP requests for the virtual IP > >> addresses > >> > + configured in the router for DNAT. For a configured DNAT IP > >> > address > >> > + <var>A</var>, for each router port <var>P</var> with > Ethernet > >> > + address <var>E</var>, a priority-90 flow matches > >> > + <code>inport == <var>P</var> && arp.op == 1 > >> && > >> > + arp.tpa == <var>A</var></code> (ARP request) > >> > with the following actions: > >> > </p> > >> > > >> > @@ -663,7 +692,62 @@ icmp4 { > >> > </li> > >> > </ul> > >> > > >> > - <h3>Ingress Table 2: IP Routing</h3> > >> > + <h3>Ingress Table 2: UNSNAT</h3> > >> > + > >> > + <p> > >> > + This is for already established connections' reverse traffic. > >> > + i.e., SNAT has already been done in egress pipeline and now the > >> > + packet has entered the ingress pipeline as part of a reply. It > >> is > >> > + unSNATted here. 
> >> > + </p> > >> > + > >> > + <ul> > >> > + <li> > >> > + <p> > >> > + For each configuration in the OVN Northbound database, that > >> asks > >> > + to change the source IP address of a packet from > >> <var>A</var> to > >> > + <var>B</var>, a priority-100 flow matches <code>ip > && > >> > + ip4.dst == <var>B</var></code> with an action > >> > + <code>ct_snat; next;</code>. > >> > + </p> > >> > + > >> > + <p> > >> > + A priority-0 logical flow with match <code>1</code> has > >> actions > >> > + <code>next;</code>. > >> > + </p> > >> > + </li> > >> > + </ul> > >> > + > >> > + <h3>Ingress Table 3: DNAT</h3> > >> > + > >> > + <p> > >> > + Packets enter the pipeline with destination IP address that > >> needs to > >> > + be DNATted from a virtual IP address to a real IP address. > >> Packets > >> > + in the reverse direction needs to be unDNATed. > >> > + </p> > >> > + <ul> > >> > + <li> > >> > + <p> > >> > + For each configuration in the OVN Northbound database, that > >> asks > >> > + to change the destination IP address of a packet from > >> > <var>A</var> to > >> > + <var>B</var>, a priority-100 flow matches <code>ip > && > >> > + ip4.dst == <var>A</var></code> with an action <code>inport > = > >> ""; > >> > + ct_dnat(<var>B</var>);</code>. > >> > + </p> > >> > + > >> > + <p> > >> > + For all IP packets of a Gateway router, a priority-50 flow > >> with > >> > an > >> > + action <code>inport = ""; ct_dnat;</code>. > >> > + </p> > >> > + > >> > + <p> > >> > + A priority-0 logical flow with match <code>1</code> has > >> actions > >> > + <code>next;</code>. > >> > + </p> > >> > + </li> > >> > + </ul> > >> > + > >> > + <h3>Ingress Table 4: IP Routing</h3> > >> > > >> > <p> > >> > A packet that arrives at this table is an IP packet that should > >> be > >> > routed > >> > @@ -672,7 +756,7 @@ icmp4 { > >> > <code>ip4.dst</code>, the packet's final destination, > unchanged) > >> and > >> > advances to the next table for ARP resolution. 
It also sets > >> > <code>reg1</code> to the IP address owned by the selected > router > >> > port > >> > - (which is used later in table 4 as the IP source address for an > >> ARP > >> > + (which is used later in table 6 as the IP source address for an > >> ARP > >> > request, if needed). > >> > </p> > >> > > >> > @@ -743,7 +827,7 @@ icmp4 { > >> > </li> > >> > </ul> > >> > > >> > - <h3>Ingress Table 3: ARP Resolution</h3> > >> > + <h3>Ingress Table 5: ARP Resolution</h3> > >> > > >> > <p> > >> > Any packet that reaches this table is an IP packet whose > >> next-hop IP > >> > @@ -798,7 +882,7 @@ icmp4 { > >> > </li> > >> > </ul> > >> > > >> > - <h3>Ingress Table 4: ARP Request</h3> > >> > + <h3>Ingress Table 6: ARP Request</h3> > >> > > >> > <p> > >> > In the common case where the Ethernet destination has been > >> > resolved, this > >> > @@ -823,7 +907,7 @@ arp { > >> > </pre> > >> > > >> > <p> > >> > - (Ingress table 2 initialized <code>reg1</code> with the IP > >> > address > >> > + (Ingress table 4 initialized <code>reg1</code> with the IP > >> > address > >> > owned by <code>outport</code>.) > >> > </p> > >> > > >> > @@ -838,7 +922,32 @@ arp { > >> > </li> > >> > </ul> > >> > > >> > - <h3>Egress Table 0: Delivery</h3> > >> > + <h3>Egress Table 0: SNAT</h3> > >> > + > >> > + <p> > >> > + Packets that are configured to be SNATed get their source IP > >> address > >> > + changed based on the configuration in the OVN Northbound > >> database. > >> > + </p> > >> > + <ul> > >> > + <li> > >> > + <p> > >> > + For each configuration in the OVN Northbound database, that > >> asks > >> > + to change the source IP address of a packet from an IP > >> address > >> > of > >> > + <var>A</var> or to change the source IP address of a packet > >> that > >> > + belongs to network <var>A</var> to <var>B</var>, a flow > >> matches > >> > + <code>ip && ip4.src == <var>A</var></code> with an > >> > action > >> > + <code>ct_snat(<var>B</var>);</code>. 
The priority of the > >> flow > >> > + is calculated based on the mask of <var>A</var>, with > matches > >> > + having larger masks getting higher priorities. > >> > + </p> > >> > + <p> > >> > + A priority-0 logical flow with match <code>1</code> has > >> actions > >> > + <code>next;</code>. > >> > + </p> > >> > + </li> > >> > + </ul> > >> > + > >> > + <h3>Egress Table 1: Delivery</h3> > >> > > >> > <p> > >> > Packets that reach this table are ready for delivery. It > >> contains > >> > diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c > >> > index cac0148..4683780 100644 > >> > --- a/ovn/northd/ovn-northd.c > >> > +++ b/ovn/northd/ovn-northd.c > >> > @@ -105,12 +105,15 @@ enum ovn_stage { > >> > /* Logical router ingress stages. */ > \ > >> > PIPELINE_STAGE(ROUTER, IN, ADMISSION, 0, "lr_in_admission") > \ > >> > PIPELINE_STAGE(ROUTER, IN, IP_INPUT, 1, "lr_in_ip_input") > \ > >> > - PIPELINE_STAGE(ROUTER, IN, IP_ROUTING, 2, "lr_in_ip_routing") > \ > >> > - PIPELINE_STAGE(ROUTER, IN, ARP_RESOLVE, 3, > "lr_in_arp_resolve") \ > >> > - PIPELINE_STAGE(ROUTER, IN, ARP_REQUEST, 4, > "lr_in_arp_request") \ > >> > + PIPELINE_STAGE(ROUTER, IN, UNSNAT, 2, "lr_in_unsnat") > \ > >> > + PIPELINE_STAGE(ROUTER, IN, DNAT, 3, "lr_in_dnat") > \ > >> > + PIPELINE_STAGE(ROUTER, IN, IP_ROUTING, 4, "lr_in_ip_routing") > \ > >> > + PIPELINE_STAGE(ROUTER, IN, ARP_RESOLVE, 5, > "lr_in_arp_resolve") \ > >> > + PIPELINE_STAGE(ROUTER, IN, ARP_REQUEST, 6, > "lr_in_arp_request") \ > >> > > \ > >> > /* Logical router egress stages. 
*/ > \ > >> > - PIPELINE_STAGE(ROUTER, OUT, DELIVERY, 0, "lr_out_delivery") > >> > + PIPELINE_STAGE(ROUTER, OUT, SNAT, 0, "lr_out_snat") > \ > >> > + PIPELINE_STAGE(ROUTER, OUT, DELIVERY, 1, "lr_out_delivery") > >> > > >> > #define PIPELINE_STAGE(DP_TYPE, PIPELINE, STAGE, TABLE, NAME) \ > >> > S_##DP_TYPE##_##PIPELINE##_##STAGE \ > >> > @@ -1998,6 +2001,51 @@ build_lrouter_flows(struct hmap *datapaths, > >> struct > >> > hmap *ports, > >> > free(match); > >> > free(actions); > >> > > >> > + /* ARP handling for external IP addresses. > >> > + * > >> > + * DNAT IP addresses are external IP addresses that need ARP > >> > + * handling. */ > >> > + for (int i = 0; i < op->od->nbr->n_nat; i++) { > >> > + const struct nbrec_nat *nat; > >> > + > >> > + nat = op->od->nbr->nat[i]; > >> > + > >> > + if(!strcmp(nat->type, "snat")) { > >> > + continue; > >> > + } > >> > + > >> > + ovs_be32 ip; > >> > + if (!ip_parse(nat->external_ip, &ip) || !ip) { > >> > + static struct vlog_rate_limit rl = > >> > VLOG_RATE_LIMIT_INIT(5, 1); > >> > + VLOG_WARN_RL(&rl, "bad ip address %s in dnat > >> > configuration " > >> > + "for router %s", nat->external_ip, > >> op->key); > >> > + continue; > >> > + } > >> > + > >> > + match = xasprintf( > >> > + "inport == %s && arp.tpa == "IP_FMT" && arp.op == 1", > >> > + op->json_key, IP_ARGS(ip)); > >> > + actions = xasprintf( > >> > + "eth.dst = eth.src; " > >> > + "eth.src = "ETH_ADDR_FMT"; " > >> > + "arp.op = 2; /* ARP reply */ " > >> > + "arp.tha = arp.sha; " > >> > + "arp.sha = "ETH_ADDR_FMT"; " > >> > + "arp.tpa = arp.spa; " > >> > + "arp.spa = "IP_FMT"; " > >> > + "outport = %s; " > >> > + "inport = \"\"; /* Allow sending out inport. 
*/ " > >> > + "output;", > >> > + ETH_ADDR_ARGS(op->mac), > >> > + ETH_ADDR_ARGS(op->mac), > >> > + IP_ARGS(ip), > >> > + op->json_key); > >> > + ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_INPUT, 90, > >> > + match, actions); > >> > + free(match); > >> > + free(actions); > >> > + } > >> > + > >> > /* Drop IP traffic to this router. */ > >> > match = xasprintf("ip4.dst == "IP_FMT, IP_ARGS(op->ip)); > >> > ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_INPUT, 60, > >> > @@ -2005,6 +2053,135 @@ build_lrouter_flows(struct hmap *datapaths, > >> struct > >> > hmap *ports, > >> > free(match); > >> > } > >> > > >> > + /* NAT in Gateway routers. */ > >> > + HMAP_FOR_EACH (od, key_node, datapaths) { > >> > + if (!od->nbr) { > >> > + continue; > >> > + } > >> > + > >> > + /* Packets are allowed by default. */ > >> > + ovn_lflow_add(lflows, od, S_ROUTER_IN_UNSNAT, 0, "1", > "next;"); > >> > + ovn_lflow_add(lflows, od, S_ROUTER_OUT_SNAT, 0, "1", > "next;"); > >> > + ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 0, "1", "next;"); > >> > + > >> > + /* NAT rules are only valid on Gateway routers. */ > >> > + if (!smap_get(&od->nbr->options, "chassis")) { > >> > + continue; > >> > + } > >> > + > >> > + for (int i = 0; i < od->nbr->n_nat; i++) { > >> > + const struct nbrec_nat *nat; > >> > + > >> > + nat = od->nbr->nat[i]; > >> > + > >> > + ovs_be32 ip, mask; > >> > + > >> > + char *error = ip_parse_masked(nat->external_ip, &ip, > >> &mask); > >> > + if (error || mask != OVS_BE32_MAX) { > >> > + static struct vlog_rate_limit rl = > >> > VLOG_RATE_LIMIT_INIT(5, 1); > >> > + VLOG_WARN_RL(&rl, "bad external ip %s for nat", > >> > + nat->external_ip); > >> > + free(error); > >> > + continue; > >> > + } > >> > + > >> > + /* Check the validity of nat->logical_ip. 'logical_ip' > can > >> > + * be a subnet when the type is "snat". 
*/ > >> > + error = ip_parse_masked(nat->logical_ip, &ip, &mask); > >> > + if (!strcmp(nat->type, "snat")) { > >> > + if (error) { > >> > + static struct vlog_rate_limit rl = > >> > + VLOG_RATE_LIMIT_INIT(5, 1); > >> > + VLOG_WARN_RL(&rl, "bad ip network or ip %s for > >> snat " > >> > + "in router "UUID_FMT"", > >> > + nat->logical_ip, > UUID_ARGS(&od->key)); > >> > + free(error); > >> > + continue; > >> > + } > >> > + } else { > >> > + if (error || mask != OVS_BE32_MAX) { > >> > + static struct vlog_rate_limit rl = > >> > + VLOG_RATE_LIMIT_INIT(5, 1); > >> > + VLOG_WARN_RL(&rl, "bad ip %s for dnat in router " > >> > + ""UUID_FMT"", nat->logical_ip, > >> > UUID_ARGS(&od->key)); > >> > + free(error); > >> > + continue; > >> > + } > >> > + } > >> > + > >> > + > >> > + char *match, *actions; > >> > + > >> > + /* Ingress UNSNAT table: It is for already established > >> > connections' > >> > + * reverse traffic. i.e., SNAT has already been done in > >> egress > >> > + * pipeline and now the packet has entered the ingress > >> > pipeline as > >> > + * part of a reply. We undo the SNAT here. > >> > + * > >> > + * Undoing SNAT has to happen before DNAT processing. > >> This is > >> > + * because when the packet was DNATed in ingress > pipeline, > >> it > >> > did > >> > + * not know about the possibility of eventual additional > >> SNAT > >> > in > >> > + * egress pipeline. */ > >> > + if (!strcmp(nat->type, "snat") > >> > + || !strcmp(nat->type, "dnat_and_snat")) { > >> > + match = xasprintf("ip && ip4.dst == %s", > >> > nat->external_ip); > >> > + ovn_lflow_add(lflows, od, S_ROUTER_IN_UNSNAT, 100, > >> > + match, "ct_snat; next;"); > >> > + free(match); > >> > + } > >> > + > >> > + /* Ingress DNAT table: Packets enter the pipeline with > >> > destination > >> > + * IP address that needs to be DNATted from a external IP > >> > address > >> > + * to a logical IP address. 
*/ > >> > + if (!strcmp(nat->type, "dnat") > >> > + || !strcmp(nat->type, "dnat_and_snat")) { > >> > + /* Packet when it goes from the initiator to > >> destination. > >> > + * We need to zero the inport because the router can > >> > + * send the packet back through the same interface. > */ > >> > + match = xasprintf("ip && ip4.dst == %s", > >> > nat->external_ip); > >> > + actions = xasprintf("inport = \"\"; ct_dnat(%s);", > >> > + nat->logical_ip); > >> > + ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 100, > >> > + match, actions); > >> > + free(match); > >> > + free(actions); > >> > + } > >> > + > >> > + /* Egress SNAT table: Packets enter the egress pipeline > >> with > >> > + * source ip address that needs to be SNATted to a > >> external ip > >> > + * address. */ > >> > + if (!strcmp(nat->type, "snat") > >> > + || !strcmp(nat->type, "dnat_and_snat")) { > >> > + match = xasprintf("ip && ip4.src == %s", > >> nat->logical_ip); > >> > + actions = xasprintf("ct_snat(%s);", > nat->external_ip); > >> > + > >> > + /* The priority here is calculated such that the > >> > + * nat->logical_ip with the longest mask gets a > higher > >> > + * priority. */ > >> > + ovn_lflow_add(lflows, od, S_ROUTER_OUT_SNAT, > >> > + count_1bits(ntohl(mask)) + 1, match, > >> > actions); > >> > + free(match); > >> > + free(actions); > >> > + } > >> > + } > >> > + > >> > + /* Re-circulate every packet through the DNAT zone. > >> > + * This helps with two things. > >> > + * > >> > + * 1. Any packet that needs to be unDNATed in the reverse > >> > + * direction gets unDNATed. Ideally this could be done in > >> > + * the egress pipeline. But since the gateway router > >> > + * does not have any feature that depends on the source > >> > + * ip address being external IP address for IP routing, > >> > + * we can do it here, saving a future re-circulation. > >> > + * > >> > + * 2. 
Any packet that was sent through SNAT zone in the > >> > + * previous table automatically gets re-circulated to get > >> > + * back the new destination IP address that is needed for > >> > + * routing in the openflow pipeline. */ > >> > + ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 50, > >> > + "ip", "inport = \"\"; ct_dnat;"); > >> > + } > >> > + > >> > /* Logical router ingress table 2: IP Routing. > >> > * > >> > * A packet that arrives at this table is an IP packet that > should > >> be > >> > @@ -2205,7 +2382,7 @@ build_lrouter_flows(struct hmap *datapaths, > struct > >> > hmap *ports, > >> > ovn_lflow_add(lflows, od, S_ROUTER_IN_ARP_REQUEST, 0, "1", > >> > "output;"); > >> > } > >> > > >> > - /* Logical router egress table 0: Delivery (priority 100). > >> > + /* Logical router egress table 1: Delivery (priority 100). > >> > * > >> > * Priority 100 rules deliver packets to enabled logical ports. > */ > >> > HMAP_FOR_EACH (op, key_node, ports) { > >> > diff --git a/ovn/ovn-nb.ovsschema b/ovn/ovn-nb.ovsschema > >> > index fa21b30..ac6ca14 100644 > >> > --- a/ovn/ovn-nb.ovsschema > >> > +++ b/ovn/ovn-nb.ovsschema > >> > @@ -1,7 +1,7 @@ > >> > { > >> > "name": "OVN_Northbound", > >> > - "version": "2.1.2", > >> > - "cksum": "429668869 5325", > >> > + "version": "2.1.3", > >> > + "cksum": "3631923697 6121", > >> > "tables": { > >> > "Logical_Switch": { > >> > "columns": { > >> > @@ -78,6 +78,11 @@ > >> > "max": "unlimited"}}, > >> > "default_gw": {"type": {"key": "string", "min": 0, > >> "max": > >> > 1}}, > >> > "enabled": {"type": {"key": "boolean", "min": 0, > "max": > >> > 1}}, > >> > + "nat": {"type": {"key": {"type": "uuid", > >> > + "refTable": "NAT", > >> > + "refType": "strong"}, > >> > + "min": 0, > >> > + "max": "unlimited"}}, > >> > "options": { > >> > "type": {"key": "string", > >> > "value": "string", > >> > @@ -104,6 +109,16 @@ > >> > "ip_prefix": {"type": "string"}, > >> > "nexthop": {"type": "string"}, > >> > "output_port": {"type": {"key": 
"string", "min": 0, > >> > "max": 1}}}, > >> > + "isRoot": false}, > >> > + "NAT": { > >> > + "columns": { > >> > + "external_ip": {"type": "string"}, > >> > + "logical_ip": {"type": "string"}, > >> > + "type": {"type": {"key": {"type": "string", > >> > + "enum": ["set", ["dnat", > >> > + "snat", > >> > + > >> > "dnat_and_snat" > >> > + > ]]}}}}, > >> > "isRoot": false} > >> > } > >> > } > >> > diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml > >> > index 130b63b..36d1158 100644 > >> > --- a/ovn/ovn-nb.xml > >> > +++ b/ovn/ovn-nb.xml > >> > @@ -631,18 +631,31 @@ > >> > router has all ingress and egress traffic dropped. > >> > </column> > >> > > >> > + <column name="nat"> > >> > + One or more NAT rules for the router. NAT rules only work on > the > >> > + Gateway routers. > >> > + </column> > >> > + > >> > <group title="Options"> > >> > <p> > >> > Additional options for the logical router. > >> > </p> > >> > > >> > <column name="options" key="chassis"> > >> > - If set, indicates that the logical router in question is > >> > - a Gateway router (which is centralized) and resides in the > set > >> > - chassis. The same value is also used by > >> > <code>ovn-controller</code> > >> > - to uniquely identify the chassis in the OVN deployment and > >> > - comes from <code>external_ids:system-id</code> in the > >> > - <code>Open_vSwitch</code> table of Open_vSwitch database. > >> > + <p> > >> > + If set, indicates that the logical router in question is a > >> > Gateway > >> > + router (which is centralized) and resides in the set > chassis. > >> > The > >> > + same value is also used by <code>ovn-controller</code> to > >> > + uniquely identify the chassis in the OVN deployment and > >> > + comes from <code>external_ids:system-id</code> in the > >> > + <code>Open_vSwitch</code> table of Open_vSwitch database. 
> >> > + </p> > >> > + > >> > + <p> > >> > + The Gateway router can only be connected to a distributed > >> router > >> > + via a switch if SNAT and DNAT are to be configured in the > >> > Gateway > >> > + router. > >> > + </p> > >> > </column> > >> > </group> > >> > > >> > @@ -765,4 +778,44 @@ > >> > </column> > >> > </table> > >> > > >> > + <table name="NAT" title="NAT rules for a Gateway router."> > >> > + <p> > >> > + Each record represents a NAT rule in a Gateway router. > >> > + </p> > >> > + > >> > + <column name="type"> > >> > + <p>Type of the NAT rule.</p> > >> > + <ul> > >> > + <li> > >> > + When <ref column="type"/> is <code>dnat</code>, the > >> externally > >> > + visible IP address <ref column="external_ip"/> is DNATted > to > >> > the IP > >> > + address <ref column="logical_ip"/> in the logical space. > >> > + </li> > >> > + <li> > >> > + When <ref column="type"/> is <code>snat</code>, IP packets > >> > + with their source IP address that either matches the IP > >> address > >> > + in <ref column="logical_ip"/> or is in the network provided > >> by > >> > + <ref column="logical_ip"/> is SNATed into the IP address in > >> > + <ref column="external_ip"/>. > >> > + </li> > >> > + <li> > >> > + When <ref column="type"/> is <code>dnat_and_snat</code>, > the > >> > + externally visible IP address <ref column="external_ip"/> > is > >> > + DNATted to the IP address <ref column="logical_ip"/> in the > >> > + logical space. In addition, IP packets with the source IP > >> > + address that matches <ref column="logical_ip"/> is SNATed > >> into > >> > + the IP address in <ref column="external_ip"/>. > >> > + </li> > >> > + </ul> > >> > + </column> > >> > + > >> > + <column name="external_ip"> > >> > + An IPv4 address. > >> > + </column> > >> > + > >> > + <column name="logical_ip"> > >> > + An IPv4 network (e.g 192.168.1.0/24) or an IPv4 address. 
> >> > + </column> > >> > + </table> > >> > + > >> > </database> > >> > diff --git a/ovn/ovn-sb.xml b/ovn/ovn-sb.xml > >> > index 1231b4e..5665871 100644 > >> > --- a/ovn/ovn-sb.xml > >> > +++ b/ovn/ovn-sb.xml > >> > @@ -951,6 +951,47 @@ > >> > </p> > >> > </dd> > >> > > >> > + <dt><code>ct_dnat;</code></dt> > >> > + <dt><code>ct_dnat(<var>IP</var>);</code></dt> > >> > + <dd> > >> > + <p> > >> > + <code>ct_dnat</code> sends the packet through the DNAT > >> zone in > >> > + connection tracking table to unDNAT any packet that was > >> > DNATed in > >> > + the opposite direction. The packet is then automatically > >> > sent to > >> > + to the next tables as if followed by <code>next;</code> > >> > action. > >> > + The next tables will see the changes in the packet caused > >> by > >> > + the connection tracker. > >> > + </p> > >> > + <p> > >> > + <code>ct_dnat(<var>IP</var>)</code> sends the packet > >> through > >> > the > >> > + DNAT zone to change the destination IP address of the > >> packet > >> > to > >> > + the one provided inside the parenthesis and commits the > >> > connection. > >> > + The packet is then automatically sent to the next tables > >> as if > >> > + followed by <code>next;</code> action. The next tables > >> will > >> > see > >> > + the changes in the packet caused by the connection > tracker. > >> > + </p> > >> > + </dd> > >> > + > >> > + <dt><code>ct_snat;</code></dt> > >> > + <dt><code>ct_snat(<var>IP</var>);</code></dt> > >> > + <dd> > >> > + <p> > >> > + <code>ct_snat</code> sends the packet through the SNAT > >> zone to > >> > + unSNAT any packet that was SNATed in the opposite > >> direction. > >> > If > >> > + the packet needs to be sent to the next tables, then it > >> > should be > >> > + followed by a <code>next;</code> action. The next tables > >> > will not > >> > + see the changes in the packet caused by the connection > >> > tracker. 
> >> > + </p> > >> > + <p> > >> > + <code>ct_snat(<var>IP</var>)</code> sends the packet > >> through > >> > the > >> > + SNAT zone to change the source IP address of the packet > to > >> > + the one provided inside the parenthesis and commits the > >> > connection. > >> > + The packet is then automatically sent to the next tables > >> as if > >> > + followed by <code>next;</code> action. The next tables > >> will > >> > see the > >> > + changes in the packet caused by the connection tracker. > >> > + </p> > >> > + </dd> > >> > + > >> > <dt><code>arp { <var>action</var>; </code>...<code> > >> };</code></dt> > >> > <dd> > >> > <p> > >> > diff --git a/ovn/utilities/ovn-nbctl.c b/ovn/utilities/ovn-nbctl.c > >> > index 321040e..b821307 100644 > >> > --- a/ovn/utilities/ovn-nbctl.c > >> > +++ b/ovn/utilities/ovn-nbctl.c > >> > @@ -1449,6 +1449,11 @@ static const struct ctl_table_class tables[] = > { > >> > NULL}, > >> > {NULL, NULL, NULL}}}, > >> > > >> > + {&nbrec_table_nat, > >> > + {{&nbrec_table_nat, NULL, > >> > + NULL}, > >> > + {NULL, NULL, NULL}}}, > >> > + > >> > {NULL, {{NULL, NULL, NULL}, {NULL, NULL, NULL}}} > >> > }; > >> > > >> > diff --git a/tests/ovn.at b/tests/ovn.at > >> > index 633cf35..19d5c73 100644 > >> > --- a/tests/ovn.at > >> > +++ b/tests/ovn.at > >> > @@ -507,6 +507,23 @@ ip.ttl => Syntax error at end of input expecting > >> `--'. > >> > ct_next; => actions=ct(table=27,zone=NXM_NX_REG5[0..15]), prereqs=ip > >> > ct_commit; => actions=ct(commit,zone=NXM_NX_REG5[0..15]), prereqs=ip > >> > > >> > +# dnat > >> > +ct_dnat; => actions=ct(table=27,zone=NXM_NX_REG3[0..15],nat), > >> prereqs=ip > >> > +ct_dnat(192.168.1.2); => > >> > > >> > actions=ct(commit,table=27,zone=NXM_NX_REG3[0..15],nat(dst=192.168.1.2)), > >> > prereqs=ip > >> > +ct_dnat(192.168.1.2, 192.168.1.3); => Syntax error at `,' expecting > >> `)'. > >> > +ct_dnat(foo); => Syntax error at `foo' invalid ip. > >> > +ct_dnat(foo, bar); => Syntax error at `foo' invalid ip. 
> >> > +ct_dnat(); => Syntax error at `)' invalid ip. > >> > + > >> > +# snat > >> > +ct_snat; => actions=ct(zone=NXM_NX_REG4[0..15],nat), prereqs=ip > >> > +ct_snat(192.168.1.2); => > >> > > >> > actions=ct(commit,table=27,zone=NXM_NX_REG4[0..15],nat(src=192.168.1.2)), > >> > prereqs=ip > >> > +ct_snat(192.168.1.2, 192.168.1.3); => Syntax error at `,' expecting > >> `)'. > >> > +ct_snat(foo); => Syntax error at `foo' invalid ip. > >> > +ct_snat(foo, bar); => Syntax error at `foo' invalid ip. > >> > +ct_snat(); => Syntax error at `)' invalid ip. > >> > + > >> > + > >> > # arp > >> > arp { eth.dst = ff:ff:ff:ff:ff:ff; output; }; => > >> > > >> > actions=controller(userdata=00.00.00.00.00.00.00.00.00.19.00.10.80.00.06.06.ff.ff.ff.ff.ff.ff.00.00.ff.ff.00.10.00.00.23.20.00.0e.ff.f8.40.00.00.00), > >> > prereqs=ip4 > >> > > >> > -- > >> > 1.9.1 > >> > > >> > > >> _______________________________________________ > >> dev mailing list > >> dev@openvswitch.org > >> http://openvswitch.org/mailman/listinfo/dev
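For readers following the NAT table semantics quoted from ovn-nb.xml in the patch, here is a rough sketch of how the three rule types (dnat, snat, dnat_and_snat) could rewrite a packet's addresses in the forward direction. It is only an illustration in Python, not code from the patch: NatRule and apply_nat are hypothetical names, the sketch ignores connection tracking and the reverse (unDNAT/unSNAT) path, and it assumes that DNAT only makes sense when logical_ip is a single address.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class NatRule:
    """Hypothetical mirror of one row of the NAT table in the patch."""
    type: str         # "dnat", "snat" or "dnat_and_snat"
    logical_ip: str   # IPv4 address or IPv4 network, e.g. "192.168.1.0/24"
    external_ip: str  # IPv4 address


def apply_nat(rule, src, dst):
    """Return (src, dst) after applying one rule in the forward direction.

    Assumption: this models only the address rewrite described in the
    ovn-nb.xml text, not conntrack state or the return traffic.
    """
    # A bare address parses as a /32 network, so both forms are handled.
    logical = ipaddress.ip_network(rule.logical_ip, strict=False)

    # DNAT: traffic addressed to the external IP is redirected to the
    # logical IP. Assumed to apply only when logical_ip is one address.
    if (rule.type in ("dnat", "dnat_and_snat")
            and dst == rule.external_ip
            and logical.num_addresses == 1):
        dst = str(logical.network_address)

    # SNAT: traffic whose source equals logical_ip, or falls inside the
    # network given by logical_ip, leaves with the external IP as source.
    if (rule.type in ("snat", "dnat_and_snat")
            and ipaddress.ip_address(src) in logical):
        src = rule.external_ip

    return src, dst


# Addresses taken from the command hints in the commit message.
dnat = NatRule("dnat", "192.168.1.2", "30.0.0.2")
snat = NatRule("snat", "192.168.2.0/24", "30.0.0.1")
print(apply_nat(dnat, "172.16.1.2", "30.0.0.2"))
print(apply_nat(snat, "192.168.2.2", "172.16.1.2"))
```

With the addresses from the command hints, the first call corresponds to alice1 reaching foo1 through 30.0.0.2, and the second to bar1's traffic appearing to come from 30.0.0.1.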