[ovs-dev] [PATCH v2] ovn: Update TODO, ovn-northd flow table design, ovn-architecture for L3.

Ben Pfaff Tue, 06 Oct 2015 14:46:33 -0700

This is a proposed plan for logical L3 in OVN.  It is not entirely
complete but it includes many important details and I believe that it moves
planning forward.


Signed-off-by: Ben Pfaff <[email protected]>
---
v1->v2: Resolve comments from Justin and Russell posted to ovs-dev.

 ovn/TODO                    | 281 +++++++++++++++++++++++++++++++++
 ovn/northd/ovn-northd.8.xml | 369 +++++++++++++++++++++++++++++++++++++++++++-
 ovn/ovn-architecture.7.xml  |   2 +-
 ovn/ovn-sb.xml              | 111 +++++++++++--
 4 files changed, 744 insertions(+), 19 deletions(-)

diff --git a/ovn/TODO b/ovn/TODO
index a48251f..51c3043 100644
--- a/ovn/TODO
+++ b/ovn/TODO
@@ -1,3 +1,284 @@
+-*- outline -*-
+
+* L3 support
+
+** OVN_Northbound schema
+
+*** Needs to support interconnected routers
+
+It should be possible to connect one router to another, e.g. to
+represent a provider/tenant router relationship.  This requires
+an OVN_Northbound schema change.
+
+*** Needs to support extra routes
+
+Currently a router port has a single route associated with it, but
+presumably we should support multiple routes.  For connections from
+one router to another, this doesn't seem to matter (just put more than
+one connection between them), but for connections between a router and
+a switch it might matter because a switch has only one router port.
+
+** OVN_SB schema
+
+*** Logical datapath interconnection
+
+There needs to be a way in the OVN_Southbound database to express
+connections between logical datapaths, so that packets can pass from a
+logical switch to its logical router (and vice versa) and from one
+logical router to another.
+
+One way to do this would be to introduce logical patch ports, closely
+modeled on the "physical" patch ports that OVS has had for ages.  Each
+logical patch port would consist of two rows in the Port_Binding table
+(one in each logical datapath), with type "patch" and an option "peer"
+that names the other logical port in the pair.
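As an illustration only, the pairing can be modeled in Python by treating each Port_Binding row as a dictionary. The keys loosely follow the Port_Binding column names. `make_patch_pair()` and `check_peers()` are hypothetical helpers, not ovn-northd code.

```python
def make_patch_pair(dp_a, name_a, dp_b, name_b):
    """Return the two Port_Binding-like rows that form one logical patch port."""
    row_a = {"datapath": dp_a, "logical_port": name_a,
             "type": "patch", "options": {"peer": name_b}}
    row_b = {"datapath": dp_b, "logical_port": name_b,
             "type": "patch", "options": {"peer": name_a}}
    return row_a, row_b


def check_peers(rows):
    """Check that port names are unique and that every patch port's peer
    exists, names it back, and lives in a different datapath."""
    by_name = {}
    for row in rows:
        name = row["logical_port"]
        if name in by_name:
            raise ValueError("duplicate logical_port: " + name)
        by_name[name] = row
    for row in rows:
        if row["type"] != "patch":
            continue
        peer = by_name.get(row.get("options", {}).get("peer"))
        if peer is None or peer.get("options", {}).get("peer") != row["logical_port"]:
            raise ValueError("unmatched patch port: " + row["logical_port"])
        if peer["datapath"] == row["datapath"]:
            raise ValueError("patch pair within one datapath: " + row["logical_port"])
    return True
```

Suppose two switches each name their router-facing port "ROUTER". `check_peers()` then rejects the rows on the uniqueness check. This is the naming conflict described below.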
+
+If we do it this way then we'll need to figure out one odd special
+case.  Currently the ACL table documents that the logical router port
+is always named "ROUTER".  This can't be done directly with this patch
+port technique, because every row in the Logical_Port table must have
+a unique name.  This probably means that we should change the
+convention for the ACL table so that the logical router port name is
+unique; for example, we could change the Logical_Router_Port table to
+require the 'name' column to be unique, and then use that name in the
+ACL table.
+
+*** Allow output to ingress port
+
+Sometimes when a packet ingresses into a router, it has to egress the
+same port.  One example is a "one-armed" router that has multiple
+routes on a single port (or in which a host is (mis)configured to send
+every IP packet to the router, e.g. due to a bad netmask).  Another is
+when a router needs to send an ICMP reply to an ingressing packet.
+
+To some degree this problem is layered, because there are two
+different notions of "ingress port".  The first is the OpenFlow
+ingress port, essentially a physical port identifier.  This is
+implemented as part of ovs-vswitchd's OpenFlow implementation.  It
+prevents a reply from being sent across the tunnel on which it
+arrived.  It is questionable whether this OpenFlow feature is useful
+to OVN.  (OVN already has to override it to allow a packet from one
+nested container to be forwarded to a different nested container.)
+OVS makes it possible to disable this feature of OpenFlow by setting
+the OpenFlow input port field to 0.  (If one does this too early, of
+course, it means that there's no way to actually match on the input
+port in the OpenFlow flow tables, but one can work around that by
+instead setting the input port just before the output action, possibly
+wrapping these actions in push/pop pairs to preserve the input port
+for later.)
+
+The second is the OVN logical ingress port, which is implemented in
+ovn-controller as part of the logical abstraction, using an OVS
+register.  Dropping packets directed to the logical ingress port is
+implemented through an OpenFlow table not directly visible to the
+logical flow table.  Currently this behavior can't be disabled, but
+there are various ways it could be made optional, e.g. the same way as
+for OpenFlow, by allowing the logical inport to be zeroed, or by
+introducing a new action that ignores the inport.
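The following minimal sketch shows why zeroing the logical inport would be enough. It rests on two hypothetical assumptions: 0 is never a valid logical port number, and the hidden check is a plain equality test between inport and outport. The function names are illustrative and do not correspond to ovn-controller code.

```python
NO_PORT = 0  # hypothetical value meaning "no logical ingress port"


def output(pkt):
    """Model 'output;' with the hidden ingress-port check.

    Returns True if the packet is transmitted on pkt["outport"], False if the
    check discards it because it would leave through its own ingress port.
    """
    return pkt["inport"] != pkt["outport"]


def send_back_out_ingress(pkt):
    """Model a reply flow: direct the packet out the port it arrived on,
    then zero inport so the equality check can no longer match."""
    reply = dict(pkt)
    reply["outport"] = reply["inport"]
    reply["inport"] = NO_PORT
    return reply
```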
+
+** ovn-northd
+
+*** What flows should it generate?
+
+See description in ovn-northd(8).
+
+** New OVN logical actions
+
+*** enhanced "next" action.
+
+OVN logical router flows need to be able to revisit a single logical
+flow table, so that ICMP "destination unreachable" errors generated by
+a logical router can themselves be routed.  One way to do this is to
+enhance the "next" action to take an optional flow table index.
+
+*** arp
+
+Generates an ARP packet based on the current IPv4 packet and allows it
+to be processed as part of the current pipeline (and then pops back to
+processing the original IPv4 packet).
+
+TCP/IP stacks typically limit the rate at which ARPs are sent, e.g. to
+one per second for a given target.  We might need to do this too.
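If OVN does limit ARP generation, one simple policy is the one described above: at most one request per target per second. The class below is a hypothetical sketch of that policy. It takes the clock as a parameter so that it can be tested, and it does not address where the state would live.

```python
import time


class ArpRateLimiter:
    """Allow at most one ARP request per target IP within `interval` seconds."""

    def __init__(self, interval=1.0, clock=time.monotonic):
        self.interval = interval
        self.clock = clock
        self.last_sent = {}  # target IP -> time the last ARP for it was allowed

    def allow(self, target_ip):
        now = self.clock()
        last = self.last_sent.get(target_ip)
        if last is not None and now - last < self.interval:
            return False  # suppress: an ARP for this target went out recently
        self.last_sent[target_ip] = now
        return True
```

In this sketch the table of last-sent times grows without bound. A real implementation would also need to age entries out.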
+
+We probably need to buffer the packet that generated the ARP.  I don't
+know where to do that.
+
+*** icmp4 { action... }
+
+Generates an ICMPv4 packet based on the current IPv4 packet and
+processes it according to each nested action (and then pops back to
+processing the original IPv4 packet).  The intended use case is for
+generating "time exceeded" and "destination unreachable" errors.
+
+ovn-sb.xml includes a tentative specification for this action.
+
+Tentatively, the icmp4 action sets a default icmp_type and icmp_code
+and lets the nested actions override it.  This means that we'd have to
+make icmp_type and icmp_code writable.  Because changing icmp_type and
+icmp_code can change the interpretation of the rest of the data in the
+ICMP packet, we would want to think this through carefully.  If it
+seems like a bad idea then we could instead make the type and code a
+parameter to the action: icmp4(type, code) { action... }
+
+It is worth considering what the ingress port of the ICMPv4 packet
+should be.  It's quite likely that the ICMPv4 packet is going
+to go back out the ingress port.  Maybe the icmp4 action, therefore,
+should clear the inport, so that output to the original inport won't
+be discarded.
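The tentative semantics have four parts:

* build a new packet;
* apply defaults;
* let nested actions override those defaults;
* leave the original packet alone.

The sketch below models packets as dictionaries of field values. The default type and code are the ones proposed in ovn-sb.xml. Clearing `inport` is the open question raised just above, so it is a parameter here. The sketch does not build the ICMP payload (the quoted original IP header and data). `icmp4()` is a hypothetical Python function, not the action's implementation.

```python
import copy


def icmp4(packet, nested_actions, clear_inport=False):
    """Sketch of the tentative 'icmp4 { action; ... };' semantics.

    Returns the ICMPv4 packet produced by the nested actions.  The caller
    continues processing `packet`, which this function never modifies.
    """
    icmp = copy.deepcopy(packet)
    icmp["ip.proto"] = 1       # ICMPv4
    icmp["ip.frag"] = 0        # not a fragment
    icmp["icmp4.type"] = 3     # default: destination unreachable
    icmp["icmp4.code"] = 1     # default: host unreachable
    if clear_inport:
        icmp["inport"] = 0     # open question: allow output to the original inport
    for action in nested_actions:
        action(icmp)           # nested actions may override the defaults
    return icmp
```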
+
+*** tcp_reset
+
+Transforms the current TCP packet into a RST reply.
+
+ovn-sb.xml includes a tentative specification for this action.
+
+*** Other actions for IPv6.
+
+IPv6 will probably need one or more actions for ND, similar to
+the "arp" action, and an action for generating
+
+*** Other actions.
+
+Possibly we'll need to implement "field1 = field2;" for copying
+between fields and "field1 <-> field2;" for swapping fields.
+
+*** ovn-controller translation to OpenFlow
+
+The following two translation strategies come to mind.  Some of the
+new actions we might want to implement one way, some of them the
+other, depending on the details.
+
+*** Implementation strategies
+
+One way to do this is to define new actions as Open vSwitch extensions
+to OpenFlow, emit those actions in ovn-controller, and implement them
+in ovs-vswitchd (possibly pushing the implementations into the Linux
+and DPDK datapaths as well).  This is the only acceptable way for
+actions that need high performance.  None of these actions obviously
+need high performance, but it might be necessary to have fairness in
+handling e.g. a flood of incoming packets that require these actions.
+The main disadvantage of this approach is that it ties ovs-vswitchd
+(and the Linux kernel module) to supporting these actions essentially
+forever, which means that we'd want to make sure that they are
+general-purpose, well designed, maintainable, and supportable.
+
+The other way to do this is to send the packets across an OpenFlow
+channel to ovn-controller and have ovn-controller process them.  This
+is acceptable for actions that don't need high performance, and it
+means that we don't add anything permanently to ovs-vswitchd or the
+kernel (so we can be more casual about the design).  The big
+disadvantage is that it becomes necessary to add a way to resume the
+OpenFlow pipeline when it is interrupted in the middle by sending a
+packet to the controller.  This is not as simple as doing a new flow
+table lookup and resuming from that point.  Instead, it is equivalent
+to the (very complicated) recirculation logic in ofproto-dpif-xlate.c.
+Much of this logic can be translated into OpenFlow actions (e.g. the
+call stack and data stack), but some of it is entirely outside
+OpenFlow (e.g. the state of mirrors).  To implement it properly, it
+seems that we'll have to introduce a new Open vSwitch extension to
+OpenFlow, a "send-to-controller" action that causes extra data to be
+sent to the controller, where the extra data packages up the state
+necessary to resume the pipeline.  Maybe the bits of the state that
+can be represented in OpenFlow can be embedded in this extra data in a
+controller-readable form, but other bits we might want to be opaque.
+It's also likely that we'll want to change and extend the form of this
+opaque data over time, so this should be allowed for, e.g. by
+including a nonce in the extra data that is newly generated every time
+ovs-vswitchd starts.
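To make the nonce idea concrete, here is a hypothetical Python sketch of packing resume state into the extra data. JSON is used purely for readability. The real encoding, the split between controller-readable and opaque state, and the class name are all assumptions.

```python
import json
import os


class ContinuationCodec:
    """Hypothetical packing of pipeline-resume state sent to the controller.

    A nonce generated when the process starts lets a restarted switch reject
    continuations minted by an earlier instance, whose format may differ.
    """

    def __init__(self):
        self.nonce = os.urandom(16).hex()  # fresh on every start

    def pack(self, readable, opaque):
        # 'readable' is state the controller may inspect; 'opaque' is not.
        return json.dumps({"nonce": self.nonce,
                           "readable": readable,
                           "opaque": opaque}).encode()

    def unpack(self, blob):
        data = json.loads(blob.decode())
        if data.get("nonce") != self.nonce:
            raise ValueError("continuation from a previous ovs-vswitchd instance")
        return data["readable"], data["opaque"]
```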
+
+*** OpenFlow action definitions
+
+Define OpenFlow wire structures for each new OpenFlow action and
+implement them in lib/ofp-actions.[ch].
+
+*** OVS implementation
+
+Add code for action translation.  Possibly add datapath code for
+action implementation.  However, none of these new actions should
+require high-bandwidth processing so we could at least start with them
+implemented in userspace only.  (ARP field modification is already
+userspace-only and no one has complained yet.)
+
+** IPv6
+
+*** ND versus ARP
+
+*** IPv6 routing
+
+*** ICMPv6
+
+** IP to MAC binding
+
+Somehow it has to be possible for an L3 logical router to map from an
+IP address to an Ethernet address.  This can happen statically or
+dynamically.  Probably both cases need to be supported eventually.
+
+*** Static IP to MAC binding
+
+Commonly, for a VM, the binding of an IP address to a MAC is known
+statically.  The Logical_Port table in the OVN_Northbound schema can
+be revised to make these bindings known.  Then ovn-northd can
+integrate the bindings into the logical router flow table.
+(ovn-northd can also integrate them into the logical switch flow table
+to terminate ARP requests from VIFs.)
+
+*** Dynamic IP to MAC bindings
+
+Some bindings from IP address to MAC will undoubtedly need to be
+discovered dynamically through ARP requests.  It's straightforward
+enough for a logical L3 router to generate ARP requests and forward
+them to the appropriate switch.
+
+It's more difficult to figure out where the reply should be processed
+and stored.  It might seem at first that a first-cut implementation
+could just keep track of the binding on the hypervisor that needs to
+know, but that can't happen easily because the VM that sends the reply
+might not be on the same HV as the VM that needs the answer (that is,
+the VM that sent the packet that needs the binding to be resolved) and
+there isn't an easy way for the HV that processes the reply to know
+which HV needs the answer.
+
+Thus, the HV that processes the ARP reply (which is unknown when the
+ARP is sent) has to tell all the HVs the binding.  The most obvious
+place for this is the OVN_Southbound database.
+
+Details need to be worked out, including:
+
+**** OVN_Southbound schema changes.
+
+Possibly bindings could be added to the Port_Binding table by adding
+or modifying columns.  Another possibility is that another table
+should be added.
+
+**** Logical_Flow representation
+
+It would be really nice to maintain the general-purpose nature of
+logical flows, but these bindings might have to include some
+hard-coded special cases, especially when it comes to the relationship
+with populating the bindings into the OVN_Southbound table.
+
+**** Tracking queries
+
+It's probably best to only record in the database responses to queries
+actually issued by an L3 logical router, so somehow they have to be
+tracked, probably by putting a tentative binding without a MAC address
+into the database.
+
+**** Renewal and expiration.
+
+Something needs to make sure that bindings remain valid and expire
+those that become stale.
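The subsections above describe a three-step life cycle: a tentative row on query, the MAC filled in on reply, and expiry when stale. That life cycle might look roughly like the sketch below. An in-memory dictionary stands in for whatever OVN_Southbound table is eventually chosen. The class, the method names, and the lifetime value are hypothetical.

```python
class MacBindingTable:
    """Sketch of tracked IP-to-MAC bindings with tentative entries and expiry."""

    def __init__(self, lifetime):
        self.lifetime = lifetime
        self.rows = {}  # (router, ip) -> {"mac": str or None, "updated": float}

    def query(self, router, ip, now):
        """Record that `router` issued an ARP for `ip` (MAC not yet known)."""
        self.rows.setdefault((router, ip), {"mac": None, "updated": now})

    def reply(self, router, ip, mac, now):
        """Store an ARP reply, but only if a matching query was recorded."""
        row = self.rows.get((router, ip))
        if row is None:
            return False  # unsolicited reply: not recorded
        row["mac"] = mac
        row["updated"] = now
        return True

    def lookup(self, router, ip):
        row = self.rows.get((router, ip))
        return row["mac"] if row else None

    def expire(self, now):
        """Drop bindings, resolved or tentative, not refreshed within `lifetime`."""
        stale = [k for k, r in self.rows.items() if now - r["updated"] > self.lifetime]
        for k in stale:
            del self.rows[k]
        return len(stale)
```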
+
+*** MTU handling (fragmentation on output)
+
+** Ratelimiting.
+
+*** ARP.
+
+*** ICMP error generation, TCP reset, UDP unreachable, protocol unreachable, ...
+
+As a point of comparison, Linux doesn't ratelimit TCP resets but I
+think it does everything else.
+
 * ovn-controller
 
 ** ovn-controller parameters and configuration.
diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
index 1655958..a5abc3b 100644
--- a/ovn/northd/ovn-northd.8.xml
+++ b/ovn/northd/ovn-northd.8.xml
@@ -106,10 +106,12 @@
       One of the main purposes of <code>ovn-northd</code> is to populate the
       <code>Logical_Flow</code> table in the <code>OVN_Southbound</code>
       database.  This section describes how <code>ovn-northd</code> does this
-      for logical datapaths.
+      for switch and router logical datapaths.
     </p>
 
-    <h2>Ingress Table 0: Admission Control and Ingress Port Security</h2>
+    <h2>Logical Switch Datapaths</h2>
+
+    <h3>Ingress Table 0: Admission Control and Ingress Port Security</h3>
 
     <p>
       Ingress table 0 contains these logical flows:
@@ -137,7 +139,7 @@
       be dropped.
     </p>
 
-    <h2>Ingress table 1: <code>from-lport</code> ACLs</h2>
+    <h3>Ingress table 1: <code>from-lport</code> ACLs</h3>
 
     <p>
       Logical flows in this table closely reproduce those in the
@@ -154,7 +156,7 @@
       <code>next;</code>, so that ACLs allow packets by default.
     </p>
 
-    <h2>Ingress Table 2: Destination Lookup</h2>
+    <h3>Ingress Table 2: Destination Lookup</h3>
 
     <p>
       This table implements switching behavior.  It contains these logical
@@ -185,13 +187,13 @@
       </li>
     </ul>
 
-    <h2>Egress Table 0: <code>to-lport</code> ACLs</h2>
+    <h3>Egress Table 0: <code>to-lport</code> ACLs</h3>
 
     <p>
       This is similar to ingress table 1 except for <code>to-lport</code> ACLs.
     </p>
 
-    <h2>Egress Table 1: Egress Port Security</h2>
+    <h3>Egress Table 1: Egress Port Security</h3>
 
     <p>
       This is similar to the ingress port security logic in ingress table 0,
@@ -206,4 +208,359 @@
       disabled logical <code>outport</code> overrides the priority-100 flow
       with a <code>drop;</code> action.
     </p>
+
+    <h2>Logical Router Datapaths</h2>
+
+    <h3>Ingress Table 0: L2 Admission Control</h3>
+
+    <p>
+      This table drops packets that the router shouldn't see at all based on
+      their Ethernet headers.  It contains the following flows, all with
+      priority 100:
+    </p>
+
+    <ul>
+      <li>
+        One flow that matches on <code>eth.dst[40] == 1</code> (Ethernet
+        multicast or broadcast destination) with action <code>next;</code>.
+      </li>
+
+      <li>
+        For each router port <var>P</var> with Ethernet address <var>E</var>, a
+        flow that matches <code>inport == <var>P</var> &amp;&amp; eth.dst ==
+        <var>E</var></code>, with action <code>next;</code>.
+      </li>
+    </ul>
+
+    <p>
+      Other packets are implicitly dropped.
+    </p>
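The two admission checks can be sketched in Python as follows. `eth.dst[40]` is the Ethernet group bit. When the 48-bit address is numbered from bit 0 at the least significant end, it is the least significant bit of the first octet. It is therefore set for both multicast and broadcast destinations. The function and argument names are hypothetical.

```python
def mac_to_int(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' to a 48-bit integer."""
    return int(mac.replace(":", ""), 16)


def l2_admit(inport, eth_dst, port_macs):
    """Return True if router ingress table 0 would pass the packet ('next;').

    port_macs maps each router port name to its Ethernet address.
    """
    # Bit 40 is the low-order bit of the first octet: the group bit.
    if (mac_to_int(eth_dst) >> 40) & 1:
        return True
    # Otherwise the destination must be the MAC of the ingress router port.
    mac = port_macs.get(inport)
    return mac is not None and mac_to_int(mac) == mac_to_int(eth_dst)
```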
+
+    <h3>Ingress Table 1: IP Input</h3>
+
+    <p>
+      This table is the core of the logical router datapath functionality.  It
+      contains the following flows to implement very basic IP host
+      functionality.
+    </p>
+
+    <ul>
+      <li>
+        <p>
+          L3 admission control: A priority-220 flow drops packets that match
+          any of the following:
+        </p>
+
+        <ul>
+          <li>
+            <code>ip4.src[28..31] == 0xe</code> (multicast source)
+          </li>
+          <li>
+            <code>ip4.src == 255.255.255.255</code> (broadcast source)
+          </li>
+          <li>
+            <code>ip4.src == 127.0.0.0/8 || ip4.dst == 127.0.0.0/8</code>
+            (localhost source or destination)
+          </li>
+          <li>
+            <code>ip4.src == 0.0.0.0/8 || ip4.dst == 0.0.0.0/8</code> (zero
+            network source or destination)
+          </li>
+          <li>
+            <code>ip4.src</code> is any IP address owned by the router.
+          </li>
+          <li>
+            <code>ip4.src</code> is the broadcast address of any IP network
+            known to the router.
+          </li>
+        </ul>
+      </li>
+
+      <li>
+        <p>
+          ICMP echo reply.  These flows reply to ICMP echo requests received
+          for the router's IP address.  Let <var>A</var> be an IP address owned
+          by the router or the broadcast address for one of those addresses'
+          networks.  Then, for each <var>A</var>, a priority-210 flow matches
+          on <code>ip4.dst == <var>A</var></code> and <code>icmp4.type == 8
+          &amp;&amp; icmp4.code == 0</code> (ICMP echo request).  These flows
+          use the following actions where, if <var>A</var> is unicast, then
+          <var>S</var> is <var>A</var>, and if <var>A</var> is broadcast,
+          <var>S</var> is the router's IP address in <var>A</var>'s network:
+        </p>
+
+        <pre>
+ip4.dst = ip4.src;
+ip4.src = <var>S</var>;
+ip4.ttl = 255;
+icmp4.type = 0;
+next;
+        </pre>
+
+        <p>
+          Similar flows match on <code>ip4.dst == 255.255.255.255</code> and
+          each individual <code>inport</code>, and use the same actions in
+          which <var>S</var> is a function of <code>inport</code>.
+        </p>
+      </li>
+
+      <li>
+        <p>
+          ARP reply.  These flows reply to ARP requests for the router's own IP
+          address.  For each router port <var>P</var> that owns IP address
+          <var>A</var> and Ethernet address <var>E</var>, a priority-210 flow
+          matches <code>inport == <var>P</var> &amp;&amp; arp.tpa ==
+          <var>A</var> &amp;&amp; arp.op == 1</code> (ARP request) with the
+          following actions:
+        </p>
+
+        <pre>
+eth.dst = eth.src;
+eth.src = <var>E</var>;
+arp.op = 2; // ARP reply
+arp.tha = arp.sha;
+arp.sha = <var>E</var>;
+arp.tpa = arp.spa;
+arp.spa = <var>A</var>;
+outport = <var>P</var>;
+inport = 0; // allow sending out inport
+output;
+        </pre>
+      </li>
+
+      <li>
+        <p>
+          UDP port unreachable.  These flows generate ICMP port unreachable
+          messages in reply to UDP datagrams directed to the router's IP
+          address.  The logical router doesn't accept any UDP traffic so it
+          always generates such a reply.
+        </p>
+
+        <p>
+          These flows should not match IP fragments with nonzero offset.
+        </p>
+
+        <p>
+          Details TBD.
+        </p>
+      </li>
+
+      <li>
+        <p>
+          TCP reset.  These flows generate TCP reset messages in reply to TCP
+          segments directed to the router's IP address.  The logical router
+          doesn't accept any TCP traffic so it always generates such a reply.
+        </p>
+
+        <p>
+          These flows should not match IP fragments with nonzero offset.
+        </p>
+
+        <p>
+          Details TBD.
+        </p>
+      </li>
+
+      <li>
+        <p>
+          Protocol unreachable.  These flows generate ICMP protocol unreachable
+          messages in reply to packets directed to the router's IP address on
+          IP protocols other than UDP, TCP, and ICMP.
+        </p>
+
+        <p>
+          These flows should not match IP fragments with nonzero offset.
+        </p>
+
+        <p>
+          Details TBD.
+        </p>
+      </li>
+
+      <li>
+        Drop other IP traffic to this router.  These flows drop any other
+        traffic destined to an IP address of this router that is not already
+        handled by one of the flows above.  For each IP address <var>A</var>
+        owned by the router, a priority-200 flow matches <code>ip4.dst ==
+        <var>A</var></code> and drops the traffic.
+      </li>
+    </ul>
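The priority-220 L3 admission control checks amount to a small predicate on the source and destination addresses. The sketch below uses Python's standard `ipaddress` module. It assumes the router's addresses are given as CIDR strings, a hypothetical input format, so that owned and broadcast addresses can be derived from them.

```python
import ipaddress


def l3_admission_drop(src, dst, router_ips):
    """Return True if the priority-220 L3 admission control flow drops the packet.

    router_ips: the router's interface addresses in CIDR form,
    e.g. ["192.0.2.1/24"].
    """
    src = ipaddress.IPv4Address(src)
    dst = ipaddress.IPv4Address(dst)
    loopback = ipaddress.IPv4Network("127.0.0.0/8")
    zero_net = ipaddress.IPv4Network("0.0.0.0/8")
    if src.is_multicast:                       # ip4.src[28..31] == 0xe
        return True
    if src == ipaddress.IPv4Address("255.255.255.255"):
        return True
    if src in loopback or dst in loopback:
        return True
    if src in zero_net or dst in zero_net:
        return True
    for cidr in router_ips:
        iface = ipaddress.IPv4Interface(cidr)
        # Owned address, or broadcast address of a network known to the router.
        if src == iface.ip or src == iface.network.broadcast_address:
            return True
    return False
```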
+
+    <p>
+      The flows above handle all of the traffic that might be directed to the
+      router itself.  The following flows (with lower priorities) handle the
+      remaining traffic, potentially for forwarding:
+    </p>
+
+    <ul>
+      <li>
+        Ethernet local broadcast.  A priority-190 flow with match <code>eth.dst
+        == ff:ff:ff:ff:ff:ff</code> drops traffic destined to the local
+        Ethernet broadcast address.  By definition this traffic should not be
+        forwarded.
+      </li>
+
+      <li>
+        Drop IP multicast.  A priority-190 flow with match
+        <code>ip4.dst[28..31] == 0xe</code> drops IP multicast traffic.
+      </li>
+
+      <li>
+        <p>
+          TTL check.  For each router port <var>P</var>, whose IP address is
+          <var>A</var>, a priority-180 flow with match <code>inport ==
+          <var>P</var> &amp;&amp; ip4.ttl &lt; 2 &amp;&amp;
+          !ip.later_frag</code> matches packets whose TTL has expired, with the
+          following actions to send an ICMP time exceeded reply:
+        </p>
+
+        <pre>
+icmp4 {
+    icmp4.type = 11; // Time exceeded
+    icmp4.code = 0;  // TTL exceeded in transit
+    ip4.dst = ip4.src;
+    ip4.src = <var>A</var>;
+    ip4.ttl = 255;
+    next;
+};
+        </pre>
+      </li>
+    </ul>
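Logical flows are tried from highest to lowest priority, so these pre-checks behave like an ordered chain of tests. In the hypothetical sketch below, the verdict comes back together with the address A that the ICMP time exceeded reply would use as its source. As the match is written, a later fragment with an expired TTL does not match the priority-180 flow and is left to lower-priority flows.

```python
import ipaddress


def forwarding_precheck(inport, eth_dst, ip_dst, ttl, later_frag, port_addrs):
    """Apply the priority-190 and priority-180 flows of router ingress table 1.

    port_addrs maps each router port to its IP address A.  Returns
    ("drop", None), ("time-exceeded", A), or ("other", None) when none of
    these flows matches.
    """
    # Priority 190: Ethernet local broadcast is never forwarded.
    if eth_dst.lower() == "ff:ff:ff:ff:ff:ff":
        return ("drop", None)
    # Priority 190: IP multicast, i.e. ip4.dst[28..31] == 0xe (224.0.0.0/4).
    if ipaddress.IPv4Address(ip_dst).is_multicast:
        return ("drop", None)
    # Priority 180: expired TTL on a packet that is not a later fragment.
    if ttl < 2 and not later_frag:
        return ("time-exceeded", port_addrs[inport])
    # Not handled by these flows; left to lower-priority flows in the table.
    return ("other", None)
```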
+
+    <h3>Ingress Table 2: IP Routing</h3>
+
+    <p>
+      A packet that arrives at this table is an IP packet that should be routed
+      to the address in <code>ip4.dst</code>.  This table implements IP
+      routing: it sets <code>reg0</code> to the next-hop IP address (leaving
+      <code>ip4.dst</code>, the packet's final destination, unchanged) and
+      advances to the next table for ARP resolution.
+    </p>
+
+    <p>
+      This table contains the following logical flows:
+    </p>
+
+    <ul>
+      <li>
+        <p>
+          Routing table.  For each route to IPv4 network <var>N</var> with
+          netmask <var>M</var>, a logical flow with match <code>ip4.dst ==
+          <var>N</var>/<var>M</var></code>, whose priority is the number of
+          1-bits in <var>M</var>, has the following actions:
+        </p>
+
+        <pre>
+ip.ttl--;
+reg0 = <var>G</var>;
+next;
+        </pre>
+
+        <p>
+          (Ingress table 1 already verified that <code>ip.ttl--;</code> will
+          not yield a TTL exceeded error.)
+        </p>
+
+        <p>
+          If the route has a gateway, <var>G</var> is the gateway IP address;
+          otherwise, it is <code>ip4.dst</code>.
+        </p>
+      </li>
+
+      <li>
+        <p>
+          Destination unreachable.  For each router port <var>P</var>, which
+          owns IP address <var>A</var>, a priority-0 logical flow with match
+          <code>inport == <var>P</var> &amp;&amp; !ip.later_frag &amp;&amp;
+          !icmp</code> has the following actions:
+        </p>
+
+        <pre>
+icmp4 {
+    icmp4.type = 3; // Destination unreachable
+    icmp4.code = 0; // Network unreachable
+    ip4.dst = ip4.src;
+    ip4.src = <var>A</var>;
+    ip4.ttl = 255;
+    next(2);
+};
+        </pre>
+
+        <p>
+          (The <code>!icmp</code> check prevents recursion if the destination
+          unreachable message itself cannot be routed.)
+        </p>
+
+        <p>
+          These flows are omitted if the logical router has a default route,
+          that is, a route with netmask 0.0.0.0.
+        </p>
+      </li>
+    </ul>
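The rule that priority equals the number of 1-bits in the netmask is what turns the flow table into a longest-prefix-match router. The sketch below uses hypothetical function names and the standard `ipaddress` module. It computes priorities that way and evaluates the flows in priority order.

```python
import ipaddress


def build_routing_flows(routes):
    """routes: list of (network, gateway) pairs; gateway is None for a direct route.

    Returns (priority, network, gateway) tuples in flow-table scan order,
    where priority is the number of 1-bits in the netmask.
    """
    flows = []
    for net, gw in routes:
        network = ipaddress.IPv4Network(net)
        priority = bin(int(network.netmask)).count("1")
        flows.append((priority, network, gw))
    # Highest priority first; sort() is stable, so ties keep input order.
    flows.sort(key=lambda flow: flow[0], reverse=True)
    return flows


def route(flows, ip_dst, ttl):
    """Return (reg0, decremented TTL) for the first matching flow,
    or None if no route matches (the destination unreachable case)."""
    dst = ipaddress.IPv4Address(ip_dst)
    for _priority, network, gw in flows:
        if dst in network:
            reg0 = gw if gw is not None else ip_dst  # next hop; ip4.dst unchanged
            return reg0, ttl - 1
    return None
```

Two overlapping routes with the same prefix length would produce overlapping flows of equal priority, whose relative order is undefined. For such ties, this sketch simply keeps the input order.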
+
+    <h3>Ingress Table 3: ARP Resolution</h3>
+
+    <p>
+      Any packet that reaches this table is an IP packet whose next-hop IP
+      address is in <code>reg0</code>.  (<code>ip4.dst</code> is the final
+      destination.)  This table resolves the IP address in <code>reg0</code>
+      into an Ethernet address in <code>eth.dst</code>, using the following
+      flows:
+    </p>
+
+    <ul>
+      <li>
+        <p>
+          Known MAC bindings.  For each IP address <var>A</var> whose host is
+          known to have Ethernet address <var>E</var> and reside on router port
+          <var>P</var>, a priority-200 flow with match <code>reg0 ==
+          <var>A</var></code> has the following actions:
+        </p>
+
+        <pre>
+eth.dst = <var>E</var>;
+outport = <var>P</var>;
+output;
+        </pre>
+      </li>
+
+      <li>
+        <p>
+          Unknown MAC bindings.  For each non-gateway route to IPv4 network
+          <var>N</var> with netmask <var>M</var> on router port <var>P</var>
+          that owns IP address <var>A</var> and Ethernet address <var>E</var>,
+          a logical flow with match <code>reg0 ==
+          <var>N</var>/<var>M</var></code>, whose priority is the number of
+          1-bits in <var>M</var>, has the following actions:
+        </p>
+
+        <pre>
+arp {
+    eth.dst = ff:ff:ff:ff:ff:ff;
+    eth.src = <var>E</var>;
+    arp.sha = <var>E</var>;
+    arp.tha = 00:00:00:00:00:00;
+    arp.spa = <var>A</var>;
+    arp.tpa = reg0;
+    arp.op = 1;  // ARP request
+    outport = <var>P</var>;
+    output;
+};
+        </pre>
+
+        <p>
+          TBD: How to install MAC bindings when an ARP response comes back.
+          (Implement a "learn" action?)
+        </p>
+      </li>
+    </ul>
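The following is a rough Python model of this table. It assumes that the connected-network check is made against the next hop in reg0, the address this table resolves, and that static bindings are supplied as a dictionary. When nothing matches, the packet is dropped, which is the logical flow table's default. All names here are hypothetical.

```python
import ipaddress


def resolve(reg0, known, connected):
    """Sketch of router ingress table 3 (ARP resolution).

    known:     {ip: (mac, port)} for known MAC bindings (priority 200)
    connected: [(cidr, port, port_ip, port_mac)] for non-gateway routes
    Returns ("output", fields), ("arp", fields), or ("drop", None).
    """
    if reg0 in known:
        mac, port = known[reg0]
        return "output", {"eth.dst": mac, "outport": port}
    addr = ipaddress.IPv4Address(reg0)
    best = None
    for cidr, port, port_ip, port_mac in connected:
        net = ipaddress.IPv4Network(cidr)
        # Longest prefix wins, as with priority = number of 1-bits in M.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, port, port_ip, port_mac)
    if best is None:
        return "drop", None
    _net, port, port_ip, port_mac = best
    return "arp", {
        "eth.dst": "ff:ff:ff:ff:ff:ff",
        "eth.src": port_mac,
        "arp.op": 1,                       # ARP request
        "arp.sha": port_mac,
        "arp.spa": port_ip,
        "arp.tha": "00:00:00:00:00:00",
        "arp.tpa": reg0,
        "outport": port,
    }
```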
+
+    <h3>Egress Table 0: ARP Details</h3>
+
+    <p>
+      Packets that reach this table are ready for delivery.  It contains a
+      single priority-0 logical flow that matches all packets and has action
+      <code>output;</code>.
+    </p>
+
 </manpage>
diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml
index 47dfc2a..a7ff674 100644
--- a/ovn/ovn-architecture.7.xml
+++ b/ovn/ovn-architecture.7.xml
@@ -596,7 +596,7 @@
     </li>
   </ol>
 
-  <h2>Life Cycle of a Packet</h2>
+  <h2>Architectural Life Cycle of a Packet</h2>
 
   <p>
     This section describes how a packet travels from one virtual machine or
diff --git a/ovn/ovn-sb.xml b/ovn/ovn-sb.xml
index d1b37f0..7b6e4ef 100644
--- a/ovn/ovn-sb.xml
+++ b/ovn/ovn-sb.xml
@@ -240,12 +240,12 @@
       The default action when no flow matches is to drop packets.
     </p>
 
-    <p><em>Logical Life Cycle of a Packet</em></p>
+    <p><em>Architectural Logical Life Cycle of a Packet</em></p>
 
     <p>
       This following description focuses on the life cycle of a packet through
       a logical datapath, ignoring physical details of the implementation.
-      Please refer to <em>Life Cycle of a Packet</em> in
+      Please refer to <em>Architectural Life Cycle of a Packet</em> in
       <code>ovn-architecture</code>(7) for the physical information.
     </p>
 
@@ -810,24 +810,111 @@
       <dl>
         <dt><code><var>field1</var> = <var>field2</var>;</code></dt>
         <dd>
-          Extends the assignment action to allow copying between fields.
+          <p>
+            Extends the assignment action to allow copying between fields.
+          </p>
+
+          <p>
+            An assignment adds prerequisites from the source and the
+            destination fields.
+          </p>
+        </dd>
+
+        <dt><code>ip.ttl--;</code></dt>
+        <dd>
+          <p>
+            Decrements the IPv4 or IPv6 TTL.  If this would make the TTL zero
+            or negative, then processing of the packet halts; no further
+            actions are processed.  (To properly handle such cases, a
+            higher-priority flow should match on <code>ip.ttl &lt; 2</code>.)
+          </p>
+
+          <p><b>Prerequisite:</b> <code>ip</code></p>
         </dd>
 
-        <dt><code>learn</code></dt>
+        <dt><code>arp { <var>action</var>; </code>...<code> };</code></dt>
+        <dd>
+          <p>
+            Temporarily replaces the IPv4 packet being processed with an ARP
+            packet and executes each nested <var>action</var> on the ARP
+            packet.  Actions following the <code>arp</code> action, if any,
+            apply to the original, unmodified packet.
+          </p>
 
-        <dt><code>conntrack</code></dt>
+          <p>
+            The ARP packet that this action operates on is initialized based on
+            the IPv4 packet being processed, as follows.  These are default
+            values that the nested actions will probably want to change:
+          </p>
+
+          <ul>
+            <li><code>eth.src</code> unchanged</li>
+            <li><code>eth.dst</code> unchanged</li>
+            <li><code>eth.type = 0x0806</code></li>
+            <li><code>arp.op = 1</code> (ARP request)</li>
+            <li><code>arp.sha</code> copied from <code>eth.src</code></li>
+            <li><code>arp.spa</code> copied from <code>ip4.src</code></li>
+            <li><code>arp.tha = 00:00:00:00:00:00</code></li>
+            <li><code>arp.tpa</code> copied from <code>ip4.dst</code></li>
+          </ul>
+
+          <p><b>Prerequisite:</b> <code>ip4</code></p>
+        </dd>
 
-        <dt><code>dec_ttl { <var>action</var>, </code>...<code> } { <var>action</var>; </code>...<code>};</code></dt>
+        <dt><code>icmp4 { <var>action</var>; </code>...<code> };</code></dt>
         <dd>
-          decrement TTL; execute first set of actions if
-          successful, second set if TTL decrement fails
+          <p>
+            Temporarily replaces the IPv4 packet being processed with an
+            ICMPv4 packet and executes each nested <var>action</var> on the
+            ICMPv4 packet.  Actions following the <code>icmp4</code> action, if
+            any, apply to the original, unmodified packet.
+          </p>
+
+          <p>
+            The ICMPv4 packet that this action operates on is initialized based
+            on the IPv4 packet being processed, as follows.  These are default
+            values that the nested actions will probably want to change.
+            Ethernet and IPv4 fields not listed here are not changed:
+          </p>
+
+          <ul>
+            <li><code>ip.proto = 1</code> (ICMPv4)</li>
+            <li><code>ip.frag = 0</code> (not a fragment)</li>
+            <li><code>icmp4.type = 3</code> (destination unreachable)</li>
+            <li><code>icmp4.code = 1</code> (host unreachable)</li>
+          </ul>
+
+          <p>
+            XXX need to explain exactly how the ICMP packet is constructed
+          </p>
+
+          <p><b>Prerequisite:</b> <code>ip4</code></p>
         </dd>
 
-        <dt><code>icmp_reply { <var>action</var>, </code>...<code> };</code></dt>
-        <dd>generate ICMP reply from packet, execute <var>action</var>s</dd>
+        <dt><code>tcp_reset;</code></dt>
+        <dd>
+          <p>
+            This action transforms the current TCP packet according to the
+            following pseudocode:
+          </p>
+
+          <pre>
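+if (tcp.flags &amp; ACK) {
+        tcp.seq = tcp.ack;
+        tcp.flags = RST;
+} else {
+        // SYN and FIN each occupy one unit of sequence space.
+        tcp.ack = tcp.seq + length(tcp.payload) + (SYN ? 1 : 0) + (FIN ? 1 : 0);
+        tcp.seq = 0;
+        tcp.flags = RST | ACK;
+}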
+</pre>
 
-        <dt><code>arp { <var>action</var>, </code>...<code> }</code></dt>
-        <dd>generate ARP from packet, execute <var>action</var>s</dd>
+          <p>
+            Then, the action drops all TCP options and payload data, and
+            updates the TCP checksum.
+          </p>
+
+          <p><b>Prerequisite:</b> <code>tcp</code></p>
+        </dd>
       </dl>
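For reference, the reset-generation rule from RFC 793 can be sketched in Python as below. The sketch computes only the sequence number, acknowledgment number, and flags of the reset. Address and port handling, option removal, and checksums are outside its scope, and the function name is hypothetical.

```python
FIN, SYN, RST, ACK = 0x01, 0x02, 0x04, 0x10  # TCP flag bits


def tcp_reset(seq, ack, flags, payload_len):
    """Return (seq, ack, flags) of the RST that answers the given segment."""
    if flags & ACK:
        # Take the sequence number from the incoming ACK field.  The reset
        # carries no ACK flag, so its acknowledgment field is left as 0.
        return ack, 0, RST
    # Otherwise acknowledge everything the segment occupied: its payload
    # plus one each for SYN and FIN, modulo 2**32.
    seg_len = payload_len + (1 if flags & SYN else 0) + (1 if flags & FIN else 0)
    return 0, (seq + seg_len) % 2**32, RST | ACK
```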
     </column>
 
-- 
2.1.3

_______________________________________________
dev mailing list
[email protected]
http://openvswitch.org/mailman/listinfo/dev
