On 21 June 2016 at 10:29, Flaviof <fla...@flaviof.com> wrote:

> On Tue, Jun 21, 2016 at 10:46 AM, Guru Shetty <g...@ovn.org> wrote:
>
> >
> >
> > On 20 June 2016 at 19:36, Flaviof <fla...@flaviof.com> wrote:
> >
> >> On Mon, Jun 13, 2016 at 6:45 AM, Gurucharan Shetty <g...@ovn.org> wrote:
> >>
> >> > For traffic from physical space to virtual space we need DNAT.
> >> > The DNAT happens in the gateway router and reaches the logical
> >> > port. The return traffic should be unDNATed.
> >> >
> >> > Traffic originating in virtual space heading to physical space
> >> > should be SNATed. The return traffic is unSNATted.
> >> >
> >> > East-west traffic with the public destination IP address needs
> >> > a DNAT. This traffic is punted to the l3 gateway where DNAT
> >> > takes place. This traffic is also SNATed and eventually loops back to
> >> > its destination. The SNAT is needed because we need the reverse traffic
> >> > to go back to the l3 gateway and not short-circuit directly to the
> >> > source.
> >> >
> >> > This commit introduces 4 new logical actions.
> >> > 1. ct_snat: To send the packet through SNAT zone to unSNAT packets.
> >> > 2. ct_snat(IP): To SNAT to the provided IP address.
> >> > 3. ct_dnat: To send the packet through DNAT zone to unDNAT packets.
> >> > 4. ct_dnat(IP): To DNAT to the provided IP.
> >> >
> >> > This commit only provides the ability to do IP based NAT. This will
> >> > eventually be enhanced to do PORT based NAT too.
> >> >
> >> > Command hints:
> >> >
> >> > Consider a distributed router "R1" that has switch foo (192.168.1.0/24)
> >> > with a lport foo1 (192.168.1.2) and bar (192.168.2.0/24) with lport bar1
> >> > (192.168.2.2) connected to it. You connect "R1" to
> >> > a gateway router "R2" via a switch "join" in (20.0.0.0/24) network.
> >> >
> >> > R2 has a switch "alice" (172.16.1.0/24) connected to it (to simulate
> >> > external network).
> >> >
> >> > case1 : Add pure DNAT (north-south)
> >> >
> >> > Add a DNAT rule in R2:
> >> > ovn-nbctl -- --id=@nat create nat type="dnat" logical_ip=192.168.1.2 \
> >> > external_ip=30.0.0.2 -- add logical_router R2 nat @nat
> >> >
> >> > Now alice1 should be able to ping 192.168.1.2 via 30.0.0.2.
> >> >
> >> > case2 : Add pure SNAT (south-north)
> >> >
> >> > Add a SNAT rule in R2:
> >> >
> >> > ovn-nbctl -- --id=@nat create nat type="snat" logical_ip=192.168.2.2 \
> >> > external_ip=30.0.0.1 -- add logical_router R2 nat @nat
> >> >
> >> > (You need a static route in R1 to send packets destined to outside
> >> > world to go through R2. The logical_ip can be a subnet.)
> >> >
> >> > When bar1 pings alice1, alice1 receives traffic from 30.0.0.1
> >> >
> >> > case3 : SNAT and DNAT (east-west traffic)
> >> >
> >> > When bar1 pings 30.0.0.2, the traffic jumps to the gateway router
> >> > and loops back to foo1 with a source ip address of 30.0.0.1
> >> >
> >> >
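To make the three cases concrete, here is a minimal C sketch of the address
rewriting that R2 performs with the two rules above. It is only an
illustration: struct pkt and gateway_translate() are hypothetical names,
172.16.1.2 is an assumed address for alice1, and conntrack state (and
therefore the un-NAT of reply traffic) is not modeled.

```c
#include <string.h>

/* Hypothetical packet model: only the two IPv4 addresses matter here. */
struct pkt {
    const char *src;
    const char *dst;
};

/* The two NAT rules from the commit message example. */
static const char *const DNAT_EXTERNAL = "30.0.0.2";
static const char *const DNAT_LOGICAL  = "192.168.1.2";   /* foo1 */
static const char *const SNAT_LOGICAL  = "192.168.2.2";   /* bar1 */
static const char *const SNAT_EXTERNAL = "30.0.0.1";

/* Simplified gateway router: DNAT in the ingress pipeline, then SNAT in
 * the egress pipeline.  Reply-direction unDNAT/unSNAT is not modeled. */
static struct pkt
gateway_translate(struct pkt p)
{
    if (!strcmp(p.dst, DNAT_EXTERNAL)) {
        p.dst = DNAT_LOGICAL;       /* like ct_dnat(192.168.1.2) */
    }
    if (!strcmp(p.src, SNAT_LOGICAL)) {
        p.src = SNAT_EXTERNAL;      /* like ct_snat(30.0.0.1) */
    }
    return p;
}
```

Feeding it bar1 -> 30.0.0.2 (case3) rewrites both addresses, which is why
foo1 sees the source as 30.0.0.1 and replies via the gateway router.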
> >> So, is the 30.0.0.0/x network an external network that R2 has a port on too?
> >>
> >
> > The example above does not have that. In the above example 30.0.0.0/x is
> > being treated as a virtual address. But in a real setup (non-simulated), you
> > are right. R2 will be connected to a 30.0.0.0/x network and will have a
> > port in it. It will also have a static route (0.0.0.0/0) or a
> > default_gateway to point to the physical router IP address as its next hop.
> > (I have not tested it as I do not have a real setup at hand, but based on
> > the simulation, it should ideally work.)
> >
> >
> >> What is the next hop that R2 would use to reach a destination beyond
> >> that subnet?
> >>
> > Answered above.
> >
>
> Ack!
>
>
> >
> >>
> >> I think this may be clear when a test is added to ovn.at, which uses foo,
> >> bar, join, alice
> >>
> > The unit tests do not have the ability to do conntrack NAT right now. I
> > think we should add one once Daniele introduces NAT to userspace conntrack.
> > But the unit test "ovn -- 2 HVs, 2 LRs connected via LS, gateway router"
> > does something very similar (it has foo - R1 - join - R2 - alice).
> >
>
> Right, I saw that test and it makes perfect sense. Adding the 'bar' logical
> switch, net 30.0.0.x and the nat rules are the few lines that it currently
> does not have.
>
>
> >
> >>
> >> Based on the code and my little test setup, there seems to be a high cost
> >> for DNAT entries in that an ARP response rule will be added per DNAT x all
> >> router ports.
> >
> > The intention was to add only on the router where DNAT entry is defined
> > and not on all router ports of all routers. Is it not true? (If so, this is
> > a bug.) The for loop which adds this entry only looks at that datapath's
> > NAT entries.
> >
> > On the gateway router itself, there would be typically two DNAT entries.
> > One of them connected to internal network (for east-west) and another one
> > at external port (facing physical router).
> >
> >
> Understood.
>
>
> >
> >
> >> In the example used by the commit message, ingress table 1 of
> >> the logical router will have arp response entries for inports alice and
> >> R2_join.
> >>
> > Right. That is because as explained above, I need to do DNAT for both
> > east-west as well as north-south. (It is very possible that I did not
> > understand your concern)
> >
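For a rough idea of the flow count being discussed, the sketch below mirrors
the loop in the patch: each non-"snat" NAT entry of a router yields one
ARP-responder flow per port of that same router. struct nat_entry and both
helper names are hypothetical simplifications (the real record is struct
nbrec_nat, and the patch parses the address with ip_parse() before
formatting it rather than passing the string through).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, reduced NAT record. */
struct nat_entry {
    const char *type;          /* "dnat", "snat" or "dnat_and_snat" */
    const char *external_ip;
};

/* Formats the ARP-responder match for one (port, NAT entry) pair.
 * Returns 0 on success, -1 if the entry is skipped ("snat") or the
 * buffer is too small. */
static int
arp_match_for_nat(const char *json_port, const struct nat_entry *nat,
                  char *buf, size_t size)
{
    if (!strcmp(nat->type, "snat")) {
        return -1;
    }
    int n = snprintf(buf, size,
                     "inport == %s && arp.tpa == %s && arp.op == 1",
                     json_port, nat->external_ip);
    return (n < 0 || (size_t) n >= size) ? -1 : 0;
}

/* Number of ARP-responder flows added for a single router. */
static size_t
count_arp_flows(size_t n_ports, const struct nat_entry *nats, size_t n_nats)
{
    size_t flows = 0;
    for (size_t i = 0; i < n_nats; i++) {
        if (strcmp(nats[i].type, "snat")) {
            flows += n_ports;
        }
    }
    return flows;
}
```

With the commit-message setup (R2 has two ports, one DNAT and one SNAT
entry) this gives two flows, one for each inport, matching what was
observed in ingress table 1.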
>
> Nah, you set me straight. If there were multiple internal subnets I imagine
> we will need a DNAT
> rule for each, since the response needs to be slightly different for each
> router port. Not an issue, just an observation.
>
>
> >
> >
> >>
> >>
> >> Table 3: do we really intend to apply the actions 'inport = ""; ct_dnat;'
> >> to all ip packets that do not have an explicit dnat mapping?
> >>
> > Yes. This is a little tricky. I have tried to explain the rationale in a
> > comment above. The general idea is that in a gateway router, there will be
> > at least one DNAT or SNAT entry. Otherwise, why have a gateway router? Also,
> > a re-circulation is considered to be very expensive. What we want is to
> > minimize re-circulations. With the code above, we have a minimum of
> > one re-circulation no matter what and a maximum of two re-circulations. I
> > have tried different ways to optimize it. There was a possibility of 3
> > re-circulations as a worst case if I did not force the minimum one
> > re-circulation. Probably there is a different way to optimize it (that I
> > haven't thought about).
> >
> >
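The one-to-two recirculation bound can be read off parse_ct_nat() in the
patch: only a bare ct_snat sets recirc_table to NX_CT_RECIRC_NONE, while
ct_snat(IP), ct_dnat and ct_dnat(IP) all recirculate. A reduced model of
that rule, with hypothetical enum and function names and no OVS
dependencies, might look like this:

```c
#include <stdbool.h>

enum recirc { RECIRC_NONE, RECIRC_NEXT_TABLE };

/* has_ip is true for ct_snat(IP) / ct_dnat(IP), false for the bare forms.
 * In the patch, commit is only set when an IP is given, and
 * "!commit && snat" is the one case that skips recirculation. */
static enum recirc
ct_nat_recirc(bool snat, bool has_ip)
{
    return (snat && !has_ip) ? RECIRC_NONE : RECIRC_NEXT_TABLE;
}

/* Recirculations for one packet crossing a gateway router, assuming the
 * priority-50 "ip" flow in the DNAT table always applies. */
static int
gateway_recirculations(bool unsnat_rule_matches, bool egress_snat_rule_matches)
{
    int n = 0;
    if (unsnat_rule_matches) {
        n += ct_nat_recirc(true, false) == RECIRC_NEXT_TABLE;  /* ct_snat */
    }
    n += ct_nat_recirc(false, false) == RECIRC_NEXT_TABLE;     /* ct_dnat */
    if (egress_snat_rule_matches) {
        n += ct_nat_recirc(true, true) == RECIRC_NEXT_TABLE;   /* ct_snat(IP) */
    }
    return n;
}
```

Under these assumptions the result is 1 when no egress SNAT rule matches
and 2 when one does, regardless of whether the UNSNAT table matched.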
> >
> Thanks for the clarification. I don't know enough about the implications of
> calling
> the ct_dnat action, but I imagine that is just noise and -- like you point
> out -- this is only in the
> gateway router and saves on recirculations.
>
>
>
> >
> >
> >>
> >> SNAT: do we need ARP reply rules for the SNAT addresses, similar to the
> >> ones added for DNAT?
> >>
> > I don't think we need ARP reply rules for SNAT entries. What is the use
> > case?
> >
>
> This is likely a moot point on my part. It is just that in my example, the
> gateway router did not have a port in the 30.0.0.x network. So it was not
> obvious to me that if it did, it would have the ARP response rule for its
> own address, which is masking the internal ips for foo and bar. Sorry for
> not understanding that before making the noise. :)
>
>
> >
> >>
> >> SNAT: looking at the openflow table I see no mention of the address added
> >> to support SNAT. Is that because that is all handled by connect_tracker
> >> and there is nothing to be done via openflow? Or maybe part of another
> >> patchset?
> >>
> >
> > We do add SNAT specific rules. Search for S_ROUTER_IN_UNSNAT
> > and S_ROUTER_OUT_SNAT.
> >
> >
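One detail of the S_ROUTER_OUT_SNAT rules worth spelling out is the
priority, which the patch computes as count_1bits(ntohl(mask)) + 1 so that
a more specific logical_ip wins over a covering subnet. A standalone sketch
of that arithmetic follows; prefix_to_mask() and popcount32() are
hypothetical stand-ins written for this example, not OVS functions, and the
mask is kept in host byte order for simplicity.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: host-order netmask for a prefix length 0..32. */
static uint32_t
prefix_to_mask(unsigned int prefix_len)
{
    assert(prefix_len <= 32);
    return prefix_len ? (uint32_t) (UINT32_MAX << (32 - prefix_len)) : 0;
}

/* Portable stand-in for OVS's count_1bits(). */
static unsigned int
popcount32(uint32_t x)
{
    unsigned int n = 0;
    for (; x; x &= x - 1) {
        n++;
    }
    return n;
}

/* Same formula as the patch, applied to a host-order mask. */
static unsigned int
snat_flow_priority(uint32_t host_mask)
{
    return popcount32(host_mask) + 1;
}
```

So a rule for 192.168.2.2 (a /32) gets priority 33 and a rule for
192.168.2.0/24 gets priority 25, both above the priority-0 "next;" flow.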
>
> Ack, I missed that in the egress datapath. *facepalm*
>
>
>
> >
> >> Thanks,
> >>
> >> -- flaviof
> >>
> >>
> >>
> >>
> >> > Signed-off-by: Gurucharan Shetty <g...@ovn.org>
> >>
> >
>
> Acked-by: Flavio Fernandes <fla...@flaviof.com>
>
Thank you for taking a look. I applied this. We will fix issues that come
up in real world testing.

>
>
>
>
>
> >> > ---
> >> >  ovn/lib/actions.c           |  83 ++++++++++++++++++++
> >> >  ovn/northd/ovn-northd.8.xml | 131 ++++++++++++++++++++++++++++---
> >> >  ovn/northd/ovn-northd.c     | 187 ++++++++++++++++++++++++++++++++++++++++++--
> >> >  ovn/ovn-nb.ovsschema        |  19 ++++-
> >> >  ovn/ovn-nb.xml              |  65 +++++++++++++--
> >> >  ovn/ovn-sb.xml              |  41 ++++++++++
> >> >  ovn/utilities/ovn-nbctl.c   |   5 ++
> >> >  tests/ovn.at                |  17 ++++
> >> >  8 files changed, 524 insertions(+), 24 deletions(-)
> >> >
> >> > diff --git a/ovn/lib/actions.c b/ovn/lib/actions.c
> >> > index 5f0bf19..4a486a0 100644
> >> > --- a/ovn/lib/actions.c
> >> > +++ b/ovn/lib/actions.c
> >> > @@ -442,6 +442,85 @@ emit_ct(struct action_context *ctx, bool recirc_next, bool commit)
> >> >      add_prerequisite(ctx, "ip");
> >> >  }
> >> >
> >> > +static void
> >> > +parse_ct_nat(struct action_context *ctx, bool snat)
> >> > +{
> >> > +    const size_t ct_offset = ctx->ofpacts->size;
> >> > +    ofpbuf_pull(ctx->ofpacts, ct_offset);
> >> > +
> >> > +    struct ofpact_conntrack *ct = ofpact_put_CT(ctx->ofpacts);
> >> > +
> >> > +    if (ctx->ap->cur_ltable < ctx->ap->n_tables) {
> >> > +        ct->recirc_table = ctx->ap->first_ptable + ctx->ap->cur_ltable + 1;
> >> > +    } else {
> >> > +        action_error(ctx,
> >> > +                     "\"ct_[sd]nat\" action not allowed in last table.");
> >> > +        return;
> >> > +    }
> >> > +
> >> > +    if (snat) {
> >> > +        ct->zone_src.field = mf_from_id(MFF_LOG_SNAT_ZONE);
> >> > +    } else {
> >> > +        ct->zone_src.field = mf_from_id(MFF_LOG_DNAT_ZONE);
> >> > +    }
> >> > +    ct->zone_src.ofs = 0;
> >> > +    ct->zone_src.n_bits = 16;
> >> > +    ct->flags = 0;
> >> > +    ct->alg = 0;
> >> > +
> >> > +    add_prerequisite(ctx, "ip");
> >> > +
> >> > +    struct ofpact_nat *nat;
> >> > +    size_t nat_offset;
> >> > +    nat_offset = ctx->ofpacts->size;
> >> > +    ofpbuf_pull(ctx->ofpacts, nat_offset);
> >> > +
> >> > +    nat = ofpact_put_NAT(ctx->ofpacts);
> >> > +    nat->flags = 0;
> >> > +    nat->range_af = AF_UNSPEC;
> >> > +
> >> > +    int commit = 0;
> >> > +    if (lexer_match(ctx->lexer, LEX_T_LPAREN)) {
> >> > +        ovs_be32 ip;
> >> > +        if (ctx->lexer->token.type == LEX_T_INTEGER
> >> > +            && ctx->lexer->token.format == LEX_F_IPV4) {
> >> > +            ip = ctx->lexer->token.value.ipv4;
> >> > +        } else {
> >> > +            action_syntax_error(ctx, "invalid ip");
> >> > +            return;
> >> > +        }
> >> > +
> >> > +        nat->range_af = AF_INET;
> >> > +        nat->range.addr.ipv4.min = ip;
> >> > +        if (snat) {
> >> > +            nat->flags |= NX_NAT_F_SRC;
> >> > +        } else {
> >> > +            nat->flags |= NX_NAT_F_DST;
> >> > +        }
> >> > +        commit = NX_CT_F_COMMIT;
> >> > +        lexer_get(ctx->lexer);
> >> > +        if (!lexer_match(ctx->lexer, LEX_T_RPAREN)) {
> >> > +            action_syntax_error(ctx, "expecting `)'");
> >> > +            return;
> >> > +        }
> >> > +    }
> >> > +
> >> > +    ctx->ofpacts->header = ofpbuf_push_uninit(ctx->ofpacts, nat_offset);
> >> > +    ct = ctx->ofpacts->header;
> >> > +    ct->flags |= commit;
> >> > +
> >> > +    /* XXX: For performance reasons, we try to prevent additional
> >> > +     * recirculations.  So far, ct_snat which is used in a gateway router
> >> > +     * does not need a recirculation. ct_snat(IP) does need a recirculation.
> >> > +     * Should we consider a method to let the actions specify whether a action
> >> > +     * needs recirculation if there more use cases?. */
> >> > +    if (!commit && snat) {
> >> > +        ct->recirc_table = NX_CT_RECIRC_NONE;
> >> > +    }
> >> > +    ofpact_finish(ctx->ofpacts, &ct->ofpact);
> >> > +    ofpbuf_push_uninit(ctx->ofpacts, ct_offset);
> >> > +}
> >> > +
> >> >  static bool
> >> >  parse_action(struct action_context *ctx)
> >> >  {
> >> > @@ -469,6 +548,10 @@ parse_action(struct action_context *ctx)
> >> >          emit_ct(ctx, true, false);
> >> >      } else if (lexer_match_id(ctx->lexer, "ct_commit")) {
> >> >          emit_ct(ctx, false, true);
> >> > +    } else if (lexer_match_id(ctx->lexer, "ct_dnat")) {
> >> > +        parse_ct_nat(ctx, false);
> >> > +    } else if (lexer_match_id(ctx->lexer, "ct_snat")) {
> >> > +        parse_ct_nat(ctx, true);
> >> >      } else if (lexer_match_id(ctx->lexer, "arp")) {
> >> >          parse_arp_action(ctx);
> >> >      } else if (lexer_match_id(ctx->lexer, "get_arp")) {
> >> > diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
> >> > index 1983812..c237604 100644
> >> > --- a/ovn/northd/ovn-northd.8.xml
> >> > +++ b/ovn/northd/ovn-northd.8.xml
> >> > @@ -517,11 +517,40 @@ next;
> >> >
> >> >        <li>
> >> >          <p>
> >> > -          Reply to ARP requests.  These flows reply to ARP requests
> for
> >> > the
> >> > -          router's own IP address.  For each router port <var>P</var>
> >> > that owns
> >> > -          IP address <var>A</var> and Ethernet address <var>E</var>,
> a
> >> > -          priority-90 flow matches <code>inport == <var>P</var>
> >> &amp;&amp;
> >> > -          arp.op == 1 &amp;&amp; arp.tpa == <var>A</var></code> (ARP
> >> > request)
> >> > +          Reply to ARP requests.
> >> > +        </p>
> >> > +
> >> > +        <p>
> >> > +          These flows reply to ARP requests for the router's own IP
> >> > address.
> >> > +          For each router port <var>P</var> that owns IP address
> >> > <var>A</var>
> >> > +          and Ethernet address <var>E</var>, a priority-90 flow
> matches
> >> > +          <code>inport == <var>P</var> &amp;&amp; arp.op == 1
> >> &amp;&amp;
> >> > +          arp.tpa == <var>A</var></code> (ARP request) with the
> >> following
> >> > +          actions:
> >> > +        </p>
> >> > +
> >> > +        <pre>
> >> > +eth.dst = eth.src;
> >> > +eth.src = <var>E</var>;
> >> > +arp.op = 2; /* ARP reply. */
> >> > +arp.tha = arp.sha;
> >> > +arp.sha = <var>E</var>;
> >> > +arp.tpa = arp.spa;
> >> > +arp.spa = <var>A</var>;
> >> > +outport = <var>P</var>;
> >> > +inport = ""; /* Allow sending out inport. */
> >> > +output;
> >> > +        </pre>
> >> > +      </li>
> >> > +
> >> > +      <li>
> >> > +        <p>
> >> > +          These flows reply to ARP requests for the virtual IP
> >> addresses
> >> > +          configured in the router for DNAT. For a configured DNAT IP
> >> > address
> >> > +          <var>A</var>, for each router port <var>P</var> with
> Ethernet
> >> > +          address <var>E</var>, a priority-90 flow matches
> >> > +          <code>inport == <var>P</var> &amp;&amp; arp.op == 1
> >> &amp;&amp;
> >> > +          arp.tpa == <var>A</var></code> (ARP request)
> >> >            with the following actions:
> >> >          </p>
> >> >
> >> > @@ -663,7 +692,62 @@ icmp4 {
> >> >        </li>
> >> >      </ul>
> >> >
> >> > -    <h3>Ingress Table 2: IP Routing</h3>
> >> > +    <h3>Ingress Table 2: UNSNAT</h3>
> >> > +
> >> > +    <p>
> >> > +      This is for already established connections' reverse traffic.
> >> > +      i.e., SNAT has already been done in egress pipeline and now the
> >> > +      packet has entered the ingress pipeline as part of a reply.  It
> >> is
> >> > +      unSNATted here.
> >> > +    </p>
> >> > +
> >> > +    <ul>
> >> > +      <li>
> >> > +        <p>
> >> > +          For each configuration in the OVN Northbound database, that
> >> asks
> >> > +          to change the source IP address of a packet from
> >> <var>A</var> to
> >> > +          <var>B</var>, a priority-100 flow matches <code>ip
> &amp;&amp;
> >> > +          ip4.dst == <var>B</var></code> with an action
> >> > +          <code>ct_snat; next;</code>.
> >> > +        </p>
> >> > +
> >> > +        <p>
> >> > +          A priority-0 logical flow with match <code>1</code> has
> >> actions
> >> > +          <code>next;</code>.
> >> > +        </p>
> >> > +      </li>
> >> > +    </ul>
> >> > +
> >> > +    <h3>Ingress Table 3: DNAT</h3>
> >> > +
> >> > +    <p>
> >> > +      Packets enter the pipeline with destination IP address that
> >> needs to
> >> > +      be DNATted from a virtual IP address to a real IP address.
> >> Packets
> >> > +      in the reverse direction needs to be unDNATed.
> >> > +    </p>
> >> > +    <ul>
> >> > +      <li>
> >> > +        <p>
> >> > +          For each configuration in the OVN Northbound database, that
> >> asks
> >> > +          to change the destination IP address of a packet from
> >> > <var>A</var> to
> >> > +          <var>B</var>, a priority-100 flow matches <code>ip
> &amp;&amp;
> >> > +          ip4.dst == <var>A</var></code> with an action <code>inport
> =
> >> "";
> >> > +          ct_dnat(<var>B</var>);</code>.
> >> > +        </p>
> >> > +
> >> > +        <p>
> >> > +          For all IP packets of a Gateway router, a priority-50 flow
> >> with
> >> > an
> >> > +          action <code>inport = ""; ct_dnat;</code>.
> >> > +        </p>
> >> > +
> >> > +        <p>
> >> > +          A priority-0 logical flow with match <code>1</code> has
> >> actions
> >> > +          <code>next;</code>.
> >> > +        </p>
> >> > +      </li>
> >> > +    </ul>
> >> > +
> >> > +    <h3>Ingress Table 4: IP Routing</h3>
> >> >
> >> >      <p>
> >> >        A packet that arrives at this table is an IP packet that should
> >> be
> >> > routed
> >> > @@ -672,7 +756,7 @@ icmp4 {
> >> >        <code>ip4.dst</code>, the packet's final destination,
> unchanged)
> >> and
> >> >        advances to the next table for ARP resolution.  It also sets
> >> >        <code>reg1</code> to the IP address owned by the selected
> router
> >> > port
> >> > -      (which is used later in table 4 as the IP source address for an
> >> ARP
> >> > +      (which is used later in table 6 as the IP source address for an
> >> ARP
> >> >        request, if needed).
> >> >      </p>
> >> >
> >> > @@ -743,7 +827,7 @@ icmp4 {
> >> >        </li>
> >> >      </ul>
> >> >
> >> > -    <h3>Ingress Table 3: ARP Resolution</h3>
> >> > +    <h3>Ingress Table 5: ARP Resolution</h3>
> >> >
> >> >      <p>
> >> >        Any packet that reaches this table is an IP packet whose
> >> next-hop IP
> >> > @@ -798,7 +882,7 @@ icmp4 {
> >> >        </li>
> >> >      </ul>
> >> >
> >> > -    <h3>Ingress Table 4: ARP Request</h3>
> >> > +    <h3>Ingress Table 6: ARP Request</h3>
> >> >
> >> >      <p>
> >> >        In the common case where the Ethernet destination has been
> >> > resolved, this
> >> > @@ -823,7 +907,7 @@ arp {
> >> >          </pre>
> >> >
> >> >          <p>
> >> > -          (Ingress table 2 initialized <code>reg1</code> with the IP
> >> > address
> >> > +          (Ingress table 4 initialized <code>reg1</code> with the IP
> >> > address
> >> >            owned by <code>outport</code>.)
> >> >          </p>
> >> >
> >> > @@ -838,7 +922,32 @@ arp {
> >> >        </li>
> >> >      </ul>
> >> >
> >> > -    <h3>Egress Table 0: Delivery</h3>
> >> > +    <h3>Egress Table 0: SNAT</h3>
> >> > +
> >> > +    <p>
> >> > +      Packets that are configured to be SNATed get their source IP
> >> address
> >> > +      changed based on the configuration in the OVN Northbound
> >> database.
> >> > +    </p>
> >> > +    <ul>
> >> > +      <li>
> >> > +        <p>
> >> > +          For each configuration in the OVN Northbound database, that
> >> asks
> >> > +          to change the source IP address of a packet from an IP
> >> address
> >> > of
> >> > +          <var>A</var> or to change the source IP address of a packet
> >> that
> >> > +          belongs to network <var>A</var> to <var>B</var>, a flow
> >> matches
> >> > +          <code>ip &amp;&amp; ip4.src == <var>A</var></code> with an
> >> > action
> >> > +          <code>ct_snat(<var>B</var>);</code>.  The priority of the
> >> flow
> >> > +          is calculated based on the mask of <var>A</var>, with
> matches
> >> > +          having larger masks getting higher priorities.
> >> > +        </p>
> >> > +        <p>
> >> > +          A priority-0 logical flow with match <code>1</code> has
> >> actions
> >> > +          <code>next;</code>.
> >> > +        </p>
> >> > +      </li>
> >> > +    </ul>
> >> > +
> >> > +    <h3>Egress Table 1: Delivery</h3>
> >> >
> >> >      <p>
> >> >        Packets that reach this table are ready for delivery.  It
> >> contains
> >> > diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> >> > index cac0148..4683780 100644
> >> > --- a/ovn/northd/ovn-northd.c
> >> > +++ b/ovn/northd/ovn-northd.c
> >> > @@ -105,12 +105,15 @@ enum ovn_stage {
> >> >      /* Logical router ingress stages. */                              \
> >> >      PIPELINE_STAGE(ROUTER, IN,  ADMISSION,   0, "lr_in_admission")    \
> >> >      PIPELINE_STAGE(ROUTER, IN,  IP_INPUT,    1, "lr_in_ip_input")     \
> >> > -    PIPELINE_STAGE(ROUTER, IN,  IP_ROUTING,  2, "lr_in_ip_routing")   \
> >> > -    PIPELINE_STAGE(ROUTER, IN,  ARP_RESOLVE, 3, "lr_in_arp_resolve")  \
> >> > -    PIPELINE_STAGE(ROUTER, IN,  ARP_REQUEST, 4, "lr_in_arp_request")  \
> >> > +    PIPELINE_STAGE(ROUTER, IN,  UNSNAT,      2, "lr_in_unsnat")       \
> >> > +    PIPELINE_STAGE(ROUTER, IN,  DNAT,        3, "lr_in_dnat")         \
> >> > +    PIPELINE_STAGE(ROUTER, IN,  IP_ROUTING,  4, "lr_in_ip_routing")   \
> >> > +    PIPELINE_STAGE(ROUTER, IN,  ARP_RESOLVE, 5, "lr_in_arp_resolve")  \
> >> > +    PIPELINE_STAGE(ROUTER, IN,  ARP_REQUEST, 6, "lr_in_arp_request")  \
> >> >                                                                        \
> >> >      /* Logical router egress stages. */                               \
> >> > -    PIPELINE_STAGE(ROUTER, OUT, DELIVERY,    0, "lr_out_delivery")
> >> > +    PIPELINE_STAGE(ROUTER, OUT, SNAT,      0, "lr_out_snat")          \
> >> > +    PIPELINE_STAGE(ROUTER, OUT, DELIVERY,  1, "lr_out_delivery")
> >> >
> >> >  #define PIPELINE_STAGE(DP_TYPE, PIPELINE, STAGE, TABLE, NAME)   \
> >> >      S_##DP_TYPE##_##PIPELINE##_##STAGE                          \
> >> > @@ -1998,6 +2001,51 @@ build_lrouter_flows(struct hmap *datapaths,
> >> struct
> >> > hmap *ports,
> >> >          free(match);
> >> >          free(actions);
> >> >
> >> > +        /* ARP handling for external IP addresses.
> >> > +         *
> >> > +         * DNAT IP addresses are external IP addresses that need ARP
> >> > +         * handling. */
> >> > +        for (int i = 0; i < op->od->nbr->n_nat; i++) {
> >> > +            const struct nbrec_nat *nat;
> >> > +
> >> > +            nat = op->od->nbr->nat[i];
> >> > +
> >> > +            if(!strcmp(nat->type, "snat")) {
> >> > +                continue;
> >> > +            }
> >> > +
> >> > +            ovs_be32 ip;
> >> > +            if (!ip_parse(nat->external_ip, &ip) || !ip) {
> >> > +                static struct vlog_rate_limit rl =
> >> > VLOG_RATE_LIMIT_INIT(5, 1);
> >> > +                VLOG_WARN_RL(&rl, "bad ip address %s in dnat
> >> > configuration "
> >> > +                             "for router %s", nat->external_ip,
> >> op->key);
> >> > +                continue;
> >> > +            }
> >> > +
> >> > +            match = xasprintf(
> >> > +                "inport == %s && arp.tpa == "IP_FMT" && arp.op == 1",
> >> > +                op->json_key, IP_ARGS(ip));
> >> > +            actions = xasprintf(
> >> > +                "eth.dst = eth.src; "
> >> > +                "eth.src = "ETH_ADDR_FMT"; "
> >> > +                "arp.op = 2; /* ARP reply */ "
> >> > +                "arp.tha = arp.sha; "
> >> > +                "arp.sha = "ETH_ADDR_FMT"; "
> >> > +                "arp.tpa = arp.spa; "
> >> > +                "arp.spa = "IP_FMT"; "
> >> > +                "outport = %s; "
> >> > +                "inport = \"\"; /* Allow sending out inport. */ "
> >> > +                "output;",
> >> > +                ETH_ADDR_ARGS(op->mac),
> >> > +                ETH_ADDR_ARGS(op->mac),
> >> > +                IP_ARGS(ip),
> >> > +                op->json_key);
> >> > +            ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_INPUT, 90,
> >> > +                          match, actions);
> >> > +            free(match);
> >> > +            free(actions);
> >> > +        }
> >> > +
> >> >          /* Drop IP traffic to this router. */
> >> >          match = xasprintf("ip4.dst == "IP_FMT, IP_ARGS(op->ip));
> >> >          ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_INPUT, 60,
> >> > @@ -2005,6 +2053,135 @@ build_lrouter_flows(struct hmap *datapaths,
> >> struct
> >> > hmap *ports,
> >> >          free(match);
> >> >      }
> >> >
> >> > +    /* NAT in Gateway routers. */
> >> > +    HMAP_FOR_EACH (od, key_node, datapaths) {
> >> > +        if (!od->nbr) {
> >> > +            continue;
> >> > +        }
> >> > +
> >> > +        /* Packets are allowed by default. */
> >> > +        ovn_lflow_add(lflows, od, S_ROUTER_IN_UNSNAT, 0, "1",
> "next;");
> >> > +        ovn_lflow_add(lflows, od, S_ROUTER_OUT_SNAT, 0, "1",
> "next;");
> >> > +        ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 0, "1", "next;");
> >> > +
> >> > +        /* NAT rules are only valid on Gateway routers. */
> >> > +        if (!smap_get(&od->nbr->options, "chassis")) {
> >> > +            continue;
> >> > +        }
> >> > +
> >> > +        for (int i = 0; i < od->nbr->n_nat; i++) {
> >> > +            const struct nbrec_nat *nat;
> >> > +
> >> > +            nat = od->nbr->nat[i];
> >> > +
> >> > +            ovs_be32 ip, mask;
> >> > +
> >> > +            char *error = ip_parse_masked(nat->external_ip, &ip,
> >> &mask);
> >> > +            if (error || mask != OVS_BE32_MAX) {
> >> > +                static struct vlog_rate_limit rl =
> >> > VLOG_RATE_LIMIT_INIT(5, 1);
> >> > +                VLOG_WARN_RL(&rl, "bad external ip %s for nat",
> >> > +                             nat->external_ip);
> >> > +                free(error);
> >> > +                continue;
> >> > +            }
> >> > +
> >> > +            /* Check the validity of nat->logical_ip. 'logical_ip'
> can
> >> > +             * be a subnet when the type is "snat". */
> >> > +            error = ip_parse_masked(nat->logical_ip, &ip, &mask);
> >> > +            if (!strcmp(nat->type, "snat")) {
> >> > +                if (error) {
> >> > +                    static struct vlog_rate_limit rl =
> >> > +                        VLOG_RATE_LIMIT_INIT(5, 1);
> >> > +                    VLOG_WARN_RL(&rl, "bad ip network or ip %s for
> >> snat "
> >> > +                                 "in router "UUID_FMT"",
> >> > +                                 nat->logical_ip,
> UUID_ARGS(&od->key));
> >> > +                    free(error);
> >> > +                    continue;
> >> > +                }
> >> > +            } else {
> >> > +                if (error || mask != OVS_BE32_MAX) {
> >> > +                    static struct vlog_rate_limit rl =
> >> > +                        VLOG_RATE_LIMIT_INIT(5, 1);
> >> > +                    VLOG_WARN_RL(&rl, "bad ip %s for dnat in router "
> >> > +                        ""UUID_FMT"", nat->logical_ip,
> >> > UUID_ARGS(&od->key));
> >> > +                    free(error);
> >> > +                    continue;
> >> > +                }
> >> > +            }
> >> > +
> >> > +
> >> > +            char *match, *actions;
> >> > +
> >> > +            /* Ingress UNSNAT table: It is for already established
> >> > connections'
> >> > +             * reverse traffic. i.e., SNAT has already been done in
> >> egress
> >> > +             * pipeline and now the packet has entered the ingress
> >> > pipeline as
> >> > +             * part of a reply. We undo the SNAT here.
> >> > +             *
> >> > +             * Undoing SNAT has to happen before DNAT processing.
> >> This is
> >> > +             * because when the packet was DNATed in ingress
> pipeline,
> >> it
> >> > did
> >> > +             * not know about the possibility of eventual additional
> >> SNAT
> >> > in
> >> > +             * egress pipeline. */
> >> > +            if (!strcmp(nat->type, "snat")
> >> > +                || !strcmp(nat->type, "dnat_and_snat")) {
> >> > +                match = xasprintf("ip && ip4.dst == %s",
> >> > nat->external_ip);
> >> > +                ovn_lflow_add(lflows, od, S_ROUTER_IN_UNSNAT, 100,
> >> > +                              match, "ct_snat; next;");
> >> > +                free(match);
> >> > +            }
> >> > +
> >> > +            /* Ingress DNAT table: Packets enter the pipeline with
> >> > destination
> >> > +             * IP address that needs to be DNATted from a external IP
> >> > address
> >> > +             * to a logical IP address. */
> >> > +            if (!strcmp(nat->type, "dnat")
> >> > +                || !strcmp(nat->type, "dnat_and_snat")) {
> >> > +                /* Packet when it goes from the initiator to
> >> destination.
> >> > +                 * We need to zero the inport because the router can
> >> > +                 * send the packet back through the same interface.
> */
> >> > +                match = xasprintf("ip && ip4.dst == %s",
> >> > nat->external_ip);
> >> > +                actions = xasprintf("inport = \"\"; ct_dnat(%s);",
> >> > +                                    nat->logical_ip);
> >> > +                ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 100,
> >> > +                           match, actions);
> >> > +                free(match);
> >> > +                free(actions);
> >> > +            }
> >> > +
> >> > +            /* Egress SNAT table: Packets enter the egress pipeline
> >> with
> >> > +             * source ip address that needs to be SNATted to a
> >> external ip
> >> > +             * address. */
> >> > +            if (!strcmp(nat->type, "snat")
> >> > +                || !strcmp(nat->type, "dnat_and_snat")) {
> >> > +                match = xasprintf("ip && ip4.src == %s",
> >> nat->logical_ip);
> >> > +                actions = xasprintf("ct_snat(%s);",
> nat->external_ip);
> >> > +
> >> > +                /* The priority here is calculated such that the
> >> > +                 * nat->logical_ip with the longest mask gets a
> higher
> >> > +                 * priority. */
> >> > +                ovn_lflow_add(lflows, od, S_ROUTER_OUT_SNAT,
> >> > +                              count_1bits(ntohl(mask)) + 1, match,
> >> > actions);
> >> > +                free(match);
> >> > +                free(actions);
> >> > +            }
> >> > +        }
> >> > +
> >> > +        /* Re-circulate every packet through the DNAT zone.
> >> > +        * This helps with two things.
> >> > +        *
> >> > +        * 1. Any packet that needs to be unDNATed in the reverse
> >> > +        * direction gets unDNATed. Ideally this could be done in
> >> > +        * the egress pipeline. But since the gateway router
> >> > +        * does not have any feature that depends on the source
> >> > +        * ip address being external IP address for IP routing,
> >> > +        * we can do it here, saving a future re-circulation.
> >> > +        *
> >> > +        * 2. Any packet that was sent through SNAT zone in the
> >> > +        * previous table automatically gets re-circulated to get
> >> > +        * back the new destination IP address that is needed for
> >> > +        * routing in the openflow pipeline. */
> >> > +        ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 50,
> >> > +                      "ip", "inport = \"\"; ct_dnat;");
> >> > +    }
> >> > +
> >> >      /* Logical router ingress table 2: IP Routing.
> >> >       *
> >> >       * A packet that arrives at this table is an IP packet that
> should
> >> be
> >> > @@ -2205,7 +2382,7 @@ build_lrouter_flows(struct hmap *datapaths,
> struct
> >> > hmap *ports,
> >> >          ovn_lflow_add(lflows, od, S_ROUTER_IN_ARP_REQUEST, 0, "1",
> >> > "output;");
> >> >      }
> >> >
> >> > -    /* Logical router egress table 0: Delivery (priority 100).
> >> > +    /* Logical router egress table 1: Delivery (priority 100).
> >> >       *
> >> >       * Priority 100 rules deliver packets to enabled logical ports.
> */
> >> >      HMAP_FOR_EACH (op, key_node, ports) {
> >> > diff --git a/ovn/ovn-nb.ovsschema b/ovn/ovn-nb.ovsschema
> >> > index fa21b30..ac6ca14 100644
> >> > --- a/ovn/ovn-nb.ovsschema
> >> > +++ b/ovn/ovn-nb.ovsschema
> >> > @@ -1,7 +1,7 @@
> >> >  {
> >> >      "name": "OVN_Northbound",
> >> > -    "version": "2.1.2",
> >> > -    "cksum": "429668869 5325",
> >> > +    "version": "2.1.3",
> >> > +    "cksum": "3631923697 6121",
> >> >      "tables": {
> >> >          "Logical_Switch": {
> >> >              "columns": {
> >> > @@ -78,6 +78,11 @@
> >> >                                     "max": "unlimited"}},
> >> >                  "default_gw": {"type": {"key": "string", "min": 0, "max": 1}},
> >> >                  "enabled": {"type": {"key": "boolean", "min": 0, "max": 1}},
> >> > +                "nat": {"type": {"key": {"type": "uuid",
> >> > +                                         "refTable": "NAT",
> >> > +                                         "refType": "strong"},
> >> > +                                 "min": 0,
> >> > +                                 "max": "unlimited"}},
> >> >                  "options": {
> >> >                       "type": {"key": "string",
> >> >                                "value": "string",
> >> > @@ -104,6 +109,16 @@
> >> >                  "ip_prefix": {"type": "string"},
> >> >                  "nexthop": {"type": "string"},
> >> >                  "output_port": {"type": {"key": "string", "min": 0, "max": 1}}},
> >> > +            "isRoot": false},
> >> > +        "NAT": {
> >> > +            "columns": {
> >> > +                "external_ip": {"type": "string"},
> >> > +                "logical_ip": {"type": "string"},
> >> > +                "type": {"type": {"key": {"type": "string",
> >> > +                                           "enum": ["set", ["dnat",
> >> > +                                                             "snat",
> >> > +                                                             "dnat_and_snat"]]}}}},
> >> >              "isRoot": false}
> >> >      }
> >> >  }
> >> > diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml
> >> > index 130b63b..36d1158 100644
> >> > --- a/ovn/ovn-nb.xml
> >> > +++ b/ovn/ovn-nb.xml
> >> > @@ -631,18 +631,31 @@
> >> >        router has all ingress and egress traffic dropped.
> >> >      </column>
> >> >
> >> > +    <column name="nat">
> >> > +      One or more NAT rules for the router. NAT rules only work on
> >> > +      Gateway routers.
> >> > +    </column>
> >> > +
> >> >      <group title="Options">
> >> >        <p>
> >> >          Additional options for the logical router.
> >> >        </p>
> >> >
> >> >        <column name="options" key="chassis">
> >> > -        If set, indicates that the logical router in question is
> >> > -        a Gateway router (which is centralized) and resides in the set
> >> > -        chassis.  The same value is also used by <code>ovn-controller</code>
> >> > -        to uniquely identify the chassis in the OVN deployment and
> >> > -        comes from <code>external_ids:system-id</code> in the
> >> > -        <code>Open_vSwitch</code> table of Open_vSwitch database.
> >> > +        <p>
> >> > +          If set, indicates that the logical router in question is a Gateway
> >> > +          router (which is centralized) and resides in the set chassis.  The
> >> > +          same value is also used by <code>ovn-controller</code> to
> >> > +          uniquely identify the chassis in the OVN deployment and
> >> > +          comes from <code>external_ids:system-id</code> in the
> >> > +          <code>Open_vSwitch</code> table of Open_vSwitch database.
> >> > +        </p>
> >> > +
> >> > +        <p>
> >> > +          The Gateway router can only be connected to a distributed router
> >> > +          via a switch if SNAT and DNAT are to be configured in the Gateway
> >> > +          router.
> >> > +        </p>
> >> >        </column>
> >> >      </group>
> >> >
> >> > @@ -765,4 +778,44 @@
> >> >      </column>
> >> >    </table>
> >> >
> >> > +  <table name="NAT" title="NAT rules for a Gateway router.">
> >> > +    <p>
> >> > +      Each record represents a NAT rule in a Gateway router.
> >> > +    </p>
> >> > +
> >> > +    <column name="type">
> >> > +      <p>Type of the NAT rule.</p>
> >> > +      <ul>
> >> > +        <li>
> >> > +          When <ref column="type"/> is <code>dnat</code>, the externally
> >> > +          visible IP address <ref column="external_ip"/> is DNATted to the IP
> >> > +          address <ref column="logical_ip"/> in the logical space.
> >> > +        </li>
> >> > +        <li>
> >> > +          When <ref column="type"/> is <code>snat</code>, IP packets
> >> > +          whose source IP address either matches the IP address
> >> > +          in <ref column="logical_ip"/> or is in the network provided by
> >> > +          <ref column="logical_ip"/> are SNATed into the IP address in
> >> > +          <ref column="external_ip"/>.
> >> > +        </li>
> >> > +        <li>
> >> > +          When <ref column="type"/> is <code>dnat_and_snat</code>, the
> >> > +          externally visible IP address <ref column="external_ip"/> is
> >> > +          DNATted to the IP address <ref column="logical_ip"/> in the
> >> > +          logical space. In addition, IP packets whose source IP
> >> > +          address matches <ref column="logical_ip"/> are SNATed into
> >> > +          the IP address in <ref column="external_ip"/>.
> >> > +        </li>
> >> > +      </ul>
> >> > +    </column>
> >> > +
> >> > +    <column name="external_ip">
> >> > +      An IPv4 address.
> >> > +    </column>
> >> > +
> >> > +    <column name="logical_ip">
> >> > +      An IPv4 network (e.g. 192.168.1.0/24) or an IPv4 address.
> >> > +    </column>
> >> > +  </table>
> >> > +
> >> >  </database>
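
To make the three rule types concrete, here is a rough sketch of which
logical action each type implies, going only by the column descriptions
above. struct nat_rule and the two helpers are made up for illustration
(they are not the generated nbrec types or anything in ovn-northd), and
the match side of the flows is left out entirely.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory form of one NAT row; not the generated IDL type. */
struct nat_rule {
    const char *type;         /* "dnat", "snat" or "dnat_and_snat". */
    const char *external_ip;  /* Externally visible IPv4 address. */
    const char *logical_ip;   /* Address (or network) in logical space. */
};

/* Writes the DNAT action implied by 'r' into 'buf'.
 * Returns 1 if the rule type calls for DNAT, 0 otherwise. */
static int
nat_dnat_action(const struct nat_rule *r, char *buf, size_t len)
{
    if (!strcmp(r->type, "dnat") || !strcmp(r->type, "dnat_and_snat")) {
        /* Destination is rewritten to the logical address. */
        snprintf(buf, len, "ct_dnat(%s);", r->logical_ip);
        return 1;
    }
    if (len) {
        buf[0] = '\0';
    }
    return 0;
}

/* Writes the SNAT action implied by 'r' into 'buf'.
 * Returns 1 if the rule type calls for SNAT, 0 otherwise. */
static int
nat_snat_action(const struct nat_rule *r, char *buf, size_t len)
{
    if (!strcmp(r->type, "snat") || !strcmp(r->type, "dnat_and_snat")) {
        /* Source is rewritten to the external address. */
        snprintf(buf, len, "ct_snat(%s);", r->external_ip);
        return 1;
    }
    if (len) {
        buf[0] = '\0';
    }
    return 0;
}
```

With a placeholder dnat_and_snat rule (external 30.0.0.2, logical
192.168.1.2), both helpers return 1, giving "ct_dnat(192.168.1.2);" and
"ct_snat(30.0.0.2);". A plain dnat rule yields only the first.
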
> >> > diff --git a/ovn/ovn-sb.xml b/ovn/ovn-sb.xml
> >> > index 1231b4e..5665871 100644
> >> > --- a/ovn/ovn-sb.xml
> >> > +++ b/ovn/ovn-sb.xml
> >> > @@ -951,6 +951,47 @@
> >> >            </p>
> >> >          </dd>
> >> >
> >> > +        <dt><code>ct_dnat;</code></dt>
> >> > +        <dt><code>ct_dnat(<var>IP</var>);</code></dt>
> >> > +        <dd>
> >> > +          <p>
> >> > +            <code>ct_dnat</code> sends the packet through the DNAT zone in
> >> > +            the connection tracking table to unDNAT any packet that was
> >> > +            DNATed in the opposite direction.  The packet is then
> >> > +            automatically sent to the next tables as if followed by a
> >> > +            <code>next;</code> action.  The next tables will see the
> >> > +            changes in the packet caused by the connection tracker.
> >> > +          </p>
> >> > +          <p>
> >> > +            <code>ct_dnat(<var>IP</var>)</code> sends the packet through the
> >> > +            DNAT zone to change the destination IP address of the packet to
> >> > +            the one provided inside the parentheses and commits the
> >> > +            connection.  The packet is then automatically sent to the next
> >> > +            tables as if followed by a <code>next;</code> action.  The next
> >> > +            tables will see the changes in the packet caused by the
> >> > +            connection tracker.
> >> > +          </p>
> >> > +        </dd>
> >> > +
> >> > +        <dt><code>ct_snat;</code></dt>
> >> > +        <dt><code>ct_snat(<var>IP</var>);</code></dt>
> >> > +        <dd>
> >> > +          <p>
> >> > +            <code>ct_snat</code> sends the packet through the SNAT zone to
> >> > +            unSNAT any packet that was SNATed in the opposite direction.  If
> >> > +            the packet needs to be sent to the next tables, then it should be
> >> > +            followed by a <code>next;</code> action.  The next tables will not
> >> > +            see the changes in the packet caused by the connection tracker.
> >> > +          </p>
> >> > +          <p>
> >> > +            <code>ct_snat(<var>IP</var>)</code> sends the packet through the
> >> > +            SNAT zone to change the source IP address of the packet to
> >> > +            the one provided inside the parentheses and commits the
> >> > +            connection.  The packet is then automatically sent to the next
> >> > +            tables as if followed by a <code>next;</code> action.  The next
> >> > +            tables will see the changes in the packet caused by the
> >> > +            connection tracker.
> >> > +          </p>
> >> > +        </dd>
> >> > +
> >> >          <dt><code>arp { <var>action</var>; </code>...<code> };</code></dt>
> >> >          <dd>
> >> >            <p>
> >> > diff --git a/ovn/utilities/ovn-nbctl.c b/ovn/utilities/ovn-nbctl.c
> >> > index 321040e..b821307 100644
> >> > --- a/ovn/utilities/ovn-nbctl.c
> >> > +++ b/ovn/utilities/ovn-nbctl.c
> >> > @@ -1449,6 +1449,11 @@ static const struct ctl_table_class tables[] = {
> >> >         NULL},
> >> >        {NULL, NULL, NULL}}},
> >> >
> >> > +    {&nbrec_table_nat,
> >> > +     {{&nbrec_table_nat, NULL,
> >> > +       NULL},
> >> > +      {NULL, NULL, NULL}}},
> >> > +
> >> >      {NULL, {{NULL, NULL, NULL}, {NULL, NULL, NULL}}}
> >> >  };
> >> >
> >> > diff --git a/tests/ovn.at b/tests/ovn.at
> >> > index 633cf35..19d5c73 100644
> >> > --- a/tests/ovn.at
> >> > +++ b/tests/ovn.at
> >> > @@ -507,6 +507,23 @@ ip.ttl => Syntax error at end of input expecting `--'.
> >> >  ct_next; => actions=ct(table=27,zone=NXM_NX_REG5[0..15]), prereqs=ip
> >> >  ct_commit; => actions=ct(commit,zone=NXM_NX_REG5[0..15]), prereqs=ip
> >> >
> >> > +# dnat
> >> > +ct_dnat; => actions=ct(table=27,zone=NXM_NX_REG3[0..15],nat), prereqs=ip
> >> > +ct_dnat(192.168.1.2); => actions=ct(commit,table=27,zone=NXM_NX_REG3[0..15],nat(dst=192.168.1.2)), prereqs=ip
> >> > +ct_dnat(192.168.1.2, 192.168.1.3); => Syntax error at `,' expecting `)'.
> >> > +ct_dnat(foo); => Syntax error at `foo' invalid ip.
> >> > +ct_dnat(foo, bar); => Syntax error at `foo' invalid ip.
> >> > +ct_dnat(); => Syntax error at `)' invalid ip.
> >> > +
> >> > +# snat
> >> > +ct_snat; => actions=ct(zone=NXM_NX_REG4[0..15],nat), prereqs=ip
> >> > +ct_snat(192.168.1.2); => actions=ct(commit,table=27,zone=NXM_NX_REG4[0..15],nat(src=192.168.1.2)), prereqs=ip
> >> > +ct_snat(192.168.1.2, 192.168.1.3); => Syntax error at `,' expecting `)'.
> >> > +ct_snat(foo); => Syntax error at `foo' invalid ip.
> >> > +ct_snat(foo, bar); => Syntax error at `foo' invalid ip.
> >> > +ct_snat(); => Syntax error at `)' invalid ip.
> >> > +
> >> > +
> >> >  # arp
> >> >  arp { eth.dst = ff:ff:ff:ff:ff:ff; output; }; => actions=controller(userdata=00.00.00.00.00.00.00.00.00.19.00.10.80.00.06.06.ff.ff.ff.ff.ff.ff.00.00.ff.ff.00.10.00.00.23.20.00.0e.ff.f8.40.00.00.00), prereqs=ip4
> >> >
> >> > --
> >> > 1.9.1
> >> >
> >> >
> >> _______________________________________________
> >> dev mailing list
> >> dev@openvswitch.org
> >> http://openvswitch.org/mailman/listinfo/dev
> >>
> >
> >
>
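
On the SNAT flow priority in the quoted patch
(count_1bits(ntohl(mask)) + 1): the effect is that a rule with a longer
logical_ip prefix gets a higher priority, so the most specific network
wins. A small standalone sketch of that arithmetic follows. popcount32()
is only a stand-in for OVS's count_1bits(), and mask_from_plen() and
snat_priority() are names invented here, not functions from the tree.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Counts the 1-bits in 'x'; stand-in for OVS's count_1bits(). */
static unsigned int
popcount32(uint32_t x)
{
    unsigned int n = 0;

    for (; x; x &= x - 1) {
        n++;
    }
    return n;
}

/* Returns the network-byte-order IPv4 mask for prefix length 'plen'
 * (0 to 32).  Hypothetical helper for this example only. */
static uint32_t
mask_from_plen(unsigned int plen)
{
    return plen ? htonl(~(uint32_t) 0 << (32 - plen)) : 0;
}

/* Mirrors the priority expression in the patch: prefix length plus one,
 * so that even a /0 rule ends up above priority 0. */
static unsigned int
snat_priority(uint32_t mask_be)
{
    return popcount32(ntohl(mask_be)) + 1;
}
```

So a /24 logical_ip lands at priority 25 and a single address (/32) at
33, which means the host rule is matched ahead of the network rule that
contains it.
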