This is a proposed plan for logical L3 in OVN. It is not entirely complete but it includes many important details and I believe that it moves planning forward.
Signed-off-by: Ben Pfaff <b...@nicira.com> --- v1->v2: Resolve comments from Justin and Russell posted to ovs-dev. ovn/TODO | 281 +++++++++++++++++++++++++++++++++ ovn/northd/ovn-northd.8.xml | 369 +++++++++++++++++++++++++++++++++++++++++++- ovn/ovn-architecture.7.xml | 2 +- ovn/ovn-sb.xml | 111 +++++++++++-- 4 files changed, 744 insertions(+), 19 deletions(-) diff --git a/ovn/TODO b/ovn/TODO index a48251f..51c3043 100644 --- a/ovn/TODO +++ b/ovn/TODO @@ -1,3 +1,284 @@ +-*- outline -*- + +* L3 support + +** OVN_Northbound schema + +*** Needs to support interconnected routers + +It should be possible to connect one router to another, e.g. to +represent a provider/tenant router relationship. This requires +an OVN_Northbound schema change. + +*** Needs to support extra routes + +Currently a router port has a single route associated with it, but +presumably we should support multiple routes. For connections from +one router to another, this doesn't seem to matter (just put more than +one connection between them), but for connections between a router and +a switch it might matter because a switch has only one router port. + +** OVN_SB schema + +*** Logical datapath interconnection + +There needs to be a way in the OVN_Southbound database to express +connections between logical datapaths, so that packets can pass from a +logical switch to its logical router (and vice versa) and from one +logical router to another. + +One way to do this would be to introduce logical patch ports, closely +modeled on the "physical" patch ports that OVS has had for ages. Each +logical patch port would consist of two rows in the Port_Binding table +(one in each logical datapath), with type "patch" and an option "peer" +that names the other logical port in the pair. + +If we do it this way then we'll need to figure out one odd special +case. Currently the ACL table documents that the logical router port +is always named "ROUTER". 
This can't be done directly with this patch +port technique, because every row in the Logical_Port table must have +a unique name. This probably means that we should change the +convention for the ACL table so that the logical router port name is +unique; for example, we could change the Logical_Router_Port table to +require the 'name' column to be unique, and then use that name in the +ACL table. + +*** Allow output to ingress port + +Sometimes when a packet ingresses into a router, it has to egress the +same port. One example is a "one-armed" router that has multiple +routes on a single port (or in which a host is (mis)configured to send +every IP packet to the router, e.g. due to a bad netmask). Another is +when a router needs to send an ICMP reply to an ingressing packet. + +To some degree this problem is layered, because there are two +different notions of "ingress port". The first is the OpenFlow +ingress port, essentially a physical port identifier. This is +implemented as part of ovs-vswitchd's OpenFlow implementation. It +prevents a reply from being sent across the tunnel on which it +arrived. It is questionable whether this OpenFlow feature is useful +to OVN. (OVN already has to override it to allow a packet from one +nested container to be forwarded to a different nested container.) +OVS makes it possible to disable this feature of OpenFlow by setting +the OpenFlow input port field to 0. (If one does this too early, of +course, it means that there's no way to actually match on the input +port in the OpenFlow flow tables, but one can work around that by +instead setting the input port just before the output action, possibly +wrapping these actions in push/pop pairs to preserve the input port +for later.) + +The second is the OVN logical ingress port, which is implemented in +ovn-controller as part of the logical abstraction, using an OVS +register. 
Dropping packets directed to the logical ingress port is +implemented through an OpenFlow table not directly visible to the +logical flow table. Currently this behavior can't be disabled, but +various ways to ensure it could be implemented, e.g. the same as for +OpenFlow by allowing the logical inport to be zeroed, or by +introducing a new action that ignores the inport. + +** ovn-northd + +*** What flows should it generate? + +See description in ovn-northd(8). + +** New OVN logical actions + +*** enhanced "next" action. + +OVN logical router flows need to be able to revisit a single logical +flow table, so that ICMP "destination unreachable" errors generated by +a logical router can themselves be routed. One way to do this is to +enhance the "next" action to take an optional flow table index. + +*** arp + +Generates an ARP packet based on the current IPv4 packet and allows it +to be processed as part of the current pipeline (and then pop back to +processing the original IPv4 packet). + +TCP/IP stacks typically limit the rate at which ARPs are sent, e.g. to +one per second for a given target. We might need to do this too. + +We probably need to buffer the packet that generated the ARP. I don't +know where to do that. + +*** icmp4 { action... } + +Generates an ICMPv4 packet based on the current IPv4 packet and +processes it according to each nested action (and then pops back to +processing the original IPv4 packet). The intended use case is for +generating "time exceeded" and "destination unreachable" errors. + +ovn-sb.xml includes a tentative specification for this action. + +Tentatively, the icmp4 action sets a default icmp_type and icmp_code +and lets the nested actions override it. This means that we'd have to +make icmp_type and icmp_code writable. Because changing icmp_type and +icmp_code can change the interpretation of the rest of the data in the +ICMP packet, we would want to think this through carefully. 
If it +seems like a bad idea then we could instead make the type and code a +parameter to the action: icmp4(type, code) { action... } + +It is worth considering what should be considered the ingress port for +the ICMPv4 packet. It's quite likely that the ICMPv4 packet is going +to go back out the ingress port. Maybe the icmp4 action, therefore, +should clear the inport, so that output to the original inport won't +be discarded. + +*** tcp_reset + +Transforms the current TCP packet into a RST reply. + +ovn-sb.xml includes a tentative specification for this action. + +*** Other actions for IPv6. + +IPv6 will probably need an action or actions for ND that is similar to +the "arp" action, and an action for generating + +*** Other actions. + +Possibly we'll need to implement "field1 = field2;" for copying +between fields and "field1 <-> field2;" for swapping fields. + +*** ovn-controller translation to OpenFlow + +The following two translation strategies come to mind. Some of the +new actions we might want to implement one way, some of them the +other, depending on the details. + +*** Implementation strategies + +One way to do this is to define new actions as Open vSwitch extensions +to OpenFlow, emit those actions in ovn-controller, and implement them +in ovs-vswitchd (possibly pushing the implementations into the Linux +and DPDK datapaths as well). This is the only acceptable way for +actions that need high performance. None of these actions obviously +need high performance, but it might be necessary to have fairness in +handling e.g. a flood of incoming packets that require these actions. +The main disadvantage of this approach is that it ties ovs-vswitchd +(and the Linux kernel module) to supporting these actions essentially +forever, which means that we'd want to make sure that they are +general-purpose, well designed, maintainable, and supportable. 
+ +The other way to do this is to send the packets across an OpenFlow +channel to ovn-controller and have ovn-controller process them. This +is acceptable for actions that don't need high performance, and it +means that we don't add anything permanently to ovs-vswitchd or the +kernel (so we can be more casual about the design). The big +disadvantage is that it becomes necessary to add a way to resume the +OpenFlow pipeline when it is interrupted in the middle by sending a +packet to the controller. This is not as simple as doing a new flow +table lookup and resuming from that point. Instead, it is equivalent +to the (very complicated) recirculation logic in ofproto-dpif-xlate.c. +Much of this logic can be translated into OpenFlow actions (e.g. the +call stack and data stack), but some of it is entirely outside +OpenFlow (e.g. the state of mirrors). To implement it properly, it +seems that we'll have to introduce a new Open vSwitch extension to +OpenFlow, a "send-to-controller" action that causes extra data to be +sent to the controller, where the extra data packages up the state +necessary to resume the pipeline. Maybe the bits of the state that +can be represented in OpenFlow can be embedded in this extra data in a +controller-readable form, but other bits we might want to be opaque. +It's also likely that we'll want to change and extend the form of this +opaque data over time, so this should be allowed for, e.g. by +including a nonce in the extra data that is newly generated every time +ovs-vswitchd starts. + +*** OpenFlow action definitions + +Define OpenFlow wire structures for each new OpenFlow action and +implement them in lib/ofp-actions.[ch]. + +*** OVS implementation + +Add code for action translation. Possibly add datapath code for +action implementation. However, none of these new actions should +require high-bandwidth processing so we could at least start with them +implemented in userspace only. 
(ARP field modification is already +userspace-only and no one has complained yet.) + +** IPv6 + +*** ND versus ARP + +*** IPv6 routing + +*** ICMPv6 + +** IP to MAC binding + +Somehow it has to be possible for an L3 logical router to map from an +IP address to an Ethernet address. This can happen statically or +dynamically. Probably both cases need to be supported eventually. + +*** Static IP to MAC binding + +Commonly, for a VM, the binding of an IP address to a MAC is known +statically. The Logical_Port table in the OVN_Northbound schema can +be revised to make these bindings known. Then ovn-northd can +integrate the bindings into the logical router flow table. +(ovn-northd can also integrate them into the logical switch flow table +to terminate ARP requests from VIFs.) + +*** Dynamic IP to MAC bindings + +Some bindings from IP address to MAC will undoubtedly need to be +discovered dynamically through ARP requests. It's straightforward +enough for a logical L3 router to generate ARP requests and forward +them to the appropriate switch. + +It's more difficult to figure out where the reply should be processed +and stored. It might seem at first that a first-cut implementation +could just keep track of the binding on the hypervisor that needs to +know, but that can't happen easily because the VM that sends the reply +might not be on the same HV as the VM that needs the answer (that is, +the VM that sent the packet that needs the binding to be resolved) and +there isn't an easy way for it to know which HV needs the answer. + +Thus, the HV that processes the ARP reply (which is unknown when the +ARP is sent) has to tell all the HVs the binding. The most obvious +place for this is in the OVN_Southbound database. + +Details need to be worked out, including: + +**** OVN_Southbound schema changes. + +Possibly bindings could be added to the Port_Binding table by adding +or modifying columns. Another possibility is that another table +should be added. 
+ +**** Logical_Flow representation + +It would be really nice to maintain the general-purpose nature of +logical flows, but these bindings might have to include some +hard-coded special cases, especially when it comes to the relationship +with populating the bindings into the OVN_Southbound table. + +**** Tracking queries + +It's probably best to only record in the database responses to queries +actually issued by an L3 logical router, so somehow they have to be +tracked, probably by putting a tentative binding without a MAC address +into the database. + +**** Renewal and expiration. + +Something needs to make sure that bindings remain valid and expire +those that become stale. + +*** MTU handling (fragmentation on output) + +** Ratelimiting. + +*** ARP. + +*** ICMP error generation, TCP reset, UDP unreachable, protocol unreachable, ... + +As a point of comparison, Linux doesn't ratelimit TCP resets but I +think it does everything else. + * ovn-controller ** ovn-controller parameters and configuration. diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml index 1655958..a5abc3b 100644 --- a/ovn/northd/ovn-northd.8.xml +++ b/ovn/northd/ovn-northd.8.xml @@ -106,10 +106,12 @@ One of the main purposes of <code>ovn-northd</code> is to populate the <code>Logical_Flow</code> table in the <code>OVN_Southbound</code> database. This section describes how <code>ovn-northd</code> does this - for logical datapaths. + for switch and router logical datapaths. </p> - <h2>Ingress Table 0: Admission Control and Ingress Port Security</h2> + <h2>Logical Switch Datapaths</h2> + + <h3>Ingress Table 0: Admission Control and Ingress Port Security</h3> <p> Ingress table 0 contains these logical flows: @@ -137,7 +139,7 @@ be dropped. 
</p> - <h2>Ingress table 1: <code>from-lport</code> ACLs</h2> + <h3>Ingress table 1: <code>from-lport</code> ACLs</h3> <p> Logical flows in this table closely reproduce those in the @@ -154,7 +156,7 @@ <code>next;</code>, so that ACLs allow packets by default. </p> - <h2>Ingress Table 2: Destination Lookup</h2> + <h3>Ingress Table 2: Destination Lookup</h3> <p> This table implements switching behavior. It contains these logical @@ -185,13 +187,13 @@ </li> </ul> - <h2>Egress Table 0: <code>to-lport</code> ACLs</h2> + <h3>Egress Table 0: <code>to-lport</code> ACLs</h3> <p> This is similar to ingress table 1 except for <code>to-lport</code> ACLs. </p> - <h2>Egress Table 1: Egress Port Security</h2> + <h3>Egress Table 1: Egress Port Security</h3> <p> This is similar to the ingress port security logic in ingress table 0, @@ -206,4 +208,359 @@ disabled logical <code>outport</code> overrides the priority-100 flow with a <code>drop;</code> action. </p> + + <h2>Logical Router Datapaths</h2> + + <h3>Ingress Table 0: L2 Admission Control</h3> + + <p> + This table drops packets that the router shouldn't see at all based on + their Ethernet headers. It contains the following flows, all with + priority 100: + </p> + + <ul> + <li> + One flow that matches on <code>eth.dst[40] == 1</code> with action + <code>next;</code>. + </li> + + <li> + For each router port <var>P</var> with Ethernet address <var>E</var>, a + flow that matches <code>inport == <var>P</var> && eth.dst == + <var>E</var></code>, with action <code>next;</code>. + </li> + </ul> + + <p> + Other packets are implicitly dropped. + </p> + + <h3>Ingress Table 1: IP Input</h3> + + <p> + This table is the core of the logical router datapath functionality. It + contains the following flows to implement very basic IP host + functionality. 
+ </p> + + <ul> + <li> + <p> + L3 admission control: A priority-220 flow drops packets that match + any of the following: + </p> + + <ul> + <li> + <code>ip4.src[28..31] == 0xe</code> (multicast source) + </li> + <li> + <code>ip4.src == 255.255.255.255</code> (broadcast source) + </li> + <li> + <code>ip4.src == 127.0.0.0/8 || ip4.dst == 127.0.0.0/8</code> + (localhost source or destination) + </li> + <li> + <code>ip4.src == 0.0.0.0/8 || ip4.dst == 0.0.0.0/8</code> (zero + network source or destination) + </li> + <li> + <code>ip4.src</code> is any IP address owned by the router. + </li> + <li> + <code>ip4.src</code> is the broadcast address of any IP network + known to the router. + </li> + </ul> + </li> + + <li> + <p> + ICMP echo reply. These flows reply to ICMP echo requests received + for the router's IP address. Let <var>A</var> be an IP address owned + by the router or the broadcast address for one of these IP addresses' + networks. Then, for each <var>A</var>, a priority-210 flow matches + on <code>ip4.dst == <var>A</var></code> and <code>icmp4.type == 8 + && icmp4.code == 0</code> (ICMP echo request). These flows + use the following actions where, if <var>A</var> is unicast, then + <var>S</var> is <var>A</var>, and if <var>A</var> is broadcast, + <var>S</var> is the router's IP address in <var>A</var>'s network: + </p> + + <pre> +ip4.dst = ip4.src; +ip4.src = <var>S</var>; +ip4.ttl = 255; +icmp4.type = 0; +next; + </pre> + + <p> + Similar flows match on <code>ip4.dst == 255.255.255.255</code> and + each individual <code>inport</code>, and use the same actions in + which <var>S</var> is a function of <code>inport</code>. + </p> + </li> + + <li> + <p> + ARP reply. These flows reply to ARP requests for the router's own IP + address. 
For each router port <var>P</var> that owns IP address + <var>A</var> and Ethernet address <var>E</var>, a priority-210 flow + matches <code>inport == <var>P</var> && arp.tpa == + <var>A</var> && arp.op == 1</code> (ARP request) with the + following actions: + </p> + + <pre> +eth.dst = eth.src; +eth.src = <var>E</var>; +arp.op = 2; // ARP reply +arp.tha = arp.sha; +arp.sha = <var>E</var>; +arp.tpa = arp.spa; +arp.spa = <var>A</var>; +outport = <var>P</var>; +inport = 0; // allow sending out inport +output; + </pre> + </li> + + <li> + <p> + UDP port unreachable. These flows generate ICMP port unreachable + messages in reply to UDP datagrams directed to the router's IP + address. The logical router doesn't accept any UDP traffic so it + always generates such a reply. + </p> + + <p> + These flows should not match IP fragments with nonzero offset. + </p> + + <p> + Details TBD. + </p> + </li> + + <li> + <p> + TCP reset. These flows generate TCP reset messages in reply to TCP + datagrams directed to the router's IP address. The logical router + doesn't accept any TCP traffic so it always generates such a reply. + </p> + + <p> + These flows should not match IP fragments with nonzero offset. + </p> + + <p> + Details TBD. + </p> + </li> + + <li> + <p> + Protocol unreachable. These flows generate ICMP protocol unreachable + messages in reply to packets directed to the router's IP address on + IP protocols other than UDP, TCP, and ICMP. + </p> + + <p> + These flows should not match IP fragments with nonzero offset. + </p> + + <p> + Details TBD. + </p> + </li> + + <li> + Drop other IP traffic to this router. These flows drop any other + traffic destined to an IP address of this router that is not already + handled by one of the flows above. For each IP address <var>A</var> + owned by the router, a priority-200 flow matches <code>ip4.dst == + <var>A</var></code> and drops the traffic. 
+ </li> + </ul> + + <p> + The flows above handle all of the traffic that might be directed to the + router itself. The following flows (with lower priorities) handle the + remaining traffic, potentially for forwarding: + </p> + + <ul> + <li> + Ethernet local broadcast. A priority-190 flow with match <code>eth.dst + == ff:ff:ff:ff:ff:ff</code> drops traffic destined to the local + Ethernet broadcast address. By definition this traffic should not be + forwarded. + </li> + + <li> + Drop IP multicast. A priority-190 flow with match + <code>ip4.dst[28..31] == 0xe</code> drops IP multicast traffic. + </li> + + <li> + <p> + TTL check. For each router port <var>P</var>, whose IP address is + <var>A</var>, a priority-180 flow with match <code>inport == + <var>P</var> && ip4.ttl < 2 && + !ip.later_frag</code> matches packets whose TTL has expired, with the + following actions to send an ICMP time exceeded reply: + </p> + + <pre> +icmp4 { + icmp4.type = 11; // Time exceeded + icmp4.code = 0; // TTL exceeded in transit + ip4.dst = ip4.src; + ip4.src = <var>A</var>; + ip4.ttl = 255; + next; +}; + </pre> + </li> + </ul> + + <h3>Ingress Table 2: IP Routing</h3> + + <p> + A packet that arrives at this table is an IP packet that should be routed + to the address in <code>ip4.dst</code>. This table implements IP + routing, setting <code>reg0</code> to the next-hop IP address (leaving + <code>ip4.dst</code>, the packet's final destination, unchanged) and + advances to the next table for ARP resolution. + </p> + + <p> + This table contains the following logical flows: + </p> + + <ul> + <li> + <p> + Routing table. 
For each route to IPv4 network <var>N</var> with + netmask <var>M</var>, a logical flow with match <code>ip4.dst == + <var>N</var>/<var>M</var></code>, whose priority is the number of + 1-bits in <var>M</var>, has the following actions: + </p> + + <pre> +ip4.ttl--; +reg0 = <var>G</var>; +next; + </pre> + + <p> + (Ingress table 1 already verified that <code>ip4.ttl--;</code> will + not yield a TTL exceeded error.) + </p> + + <p> + If the route has a gateway, <var>G</var> is the gateway IP address, + otherwise it is <code>ip4.dst</code>. + </p> + </li> + + <li> + <p> + Destination unreachable. For each router port <var>P</var>, which + owns IP address <var>A</var>, a priority-0 logical flow with match + <code>inport == <var>P</var> && !ip.later_frag && + !icmp</code> has the following actions: + </p> + + <pre> +icmp4 { + icmp4.type = 3; // Destination unreachable + icmp4.code = 0; // Network unreachable + ip4.dst = ip4.src; + ip4.src = <var>A</var>; + ip4.ttl = 255; + next(2); +}; + </pre> + + <p> + (The <code>!icmp</code> check prevents recursion if the destination + unreachable message itself cannot be routed.) + </p> + + <p> + These flows are omitted if the logical router has a default route, + that is, a route with netmask 0.0.0.0. + </p> + </li> + </ul> + + <h3>Ingress Table 3: ARP Resolution</h3> + + <p> + Any packet that reaches this table is an IP packet whose next-hop IP + address is in <code>reg0</code>. (<code>ip4.dst</code> is the final + destination.) This table resolves the IP address in <code>reg0</code> + into an Ethernet address in <code>eth.dst</code>, using the following + flows: + </p> + + <ul> + <li> + <p> + Known MAC bindings. 
For each IP address <var>A</var> whose host is + known to have Ethernet address <var>E</var> and reside on router port + <var>P</var>, a priority-200 flow with match <code>reg0 == + <var>A</var></code> has the following actions: + </p> + + <pre> +eth.dst = <var>E</var>; +outport = <var>P</var>; +output; + </pre> + </li> + + <li> + <p> + Unknown MAC bindings. For each non-gateway route to IPv4 network + <var>N</var> with netmask <var>M</var> on router port <var>P</var> + that owns IP address <var>A</var> and Ethernet address <var>E</var>, + a logical flow with match <code>ip4.dst == + <var>N</var>/<var>M</var></code>, whose priority is the number of + 1-bits in <var>M</var>, has the following actions: + </p> + + <pre> +arp { + eth.dst = ff:ff:ff:ff:ff:ff; + eth.src = <var>E</var>; + arp.sha = <var>E</var>; + arp.tha = 00:00:00:00:00:00; + arp.spa = <var>A</var>; + arp.tpa = ip4.dst; + arp.op = 1; // ARP request + outport = <var>P</var>; + output; +}; + </pre> + + <p> + TBD: How to install MAC bindings when an ARP response comes back. + (Implement a "learn" action?) + </p> + </li> + </ul> + + <h3>Egress Table 0: ARP Details</h3> + + <p> + Packets that reach this table are ready for delivery. It contains a + single priority-0 logical flow that matches all packets, with action + <code>output;</code>. + </p> + </manpage> diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml index 47dfc2a..a7ff674 100644 --- a/ovn/ovn-architecture.7.xml +++ b/ovn/ovn-architecture.7.xml @@ -596,7 +596,7 @@ </li> </ol> - <h2>Life Cycle of a Packet</h2> + <h2>Architectural Life Cycle of a Packet</h2> <p> This section describes how a packet travels from one virtual machine or diff --git a/ovn/ovn-sb.xml b/ovn/ovn-sb.xml index d1b37f0..7b6e4ef 100644 --- a/ovn/ovn-sb.xml +++ b/ovn/ovn-sb.xml @@ -240,12 +240,12 @@ The default action when no flow matches is to drop packets. 
</p> - <p><em>Logical Life Cycle of a Packet</em></p> + <p><em>Architectural Logical Life Cycle of a Packet</em></p> <p> This following description focuses on the life cycle of a packet through a logical datapath, ignoring physical details of the implementation. - Please refer to <em>Life Cycle of a Packet</em> in + Please refer to <em>Architectural Life Cycle of a Packet</em> in <code>ovn-architecture</code>(7) for the physical information. </p> @@ -810,24 +810,111 @@ <dl> <dt><code><var>field1</var> = <var>field2</var>;</code></dt> <dd> - Extends the assignment action to allow copying between fields. + <p> + Extends the assignment action to allow copying between fields. + </p> + + <p> + An assignment adds prerequisites from the source and the + destination fields. + </p> + </dd> + + <dt><code>ip.ttl--;</code></dt> + <dd> + <p> + Decrements the IPv4 or IPv6 TTL. If this would make the TTL zero + or negative, then processing of the packet halts; no further + actions are processed. (To properly handle such cases, a + higher-priority flow should match on <code>ip.ttl < 2</code>.) + </p> + + <p><b>Prerequisite:</b> <code>ip</code></p> </dd> - <dt><code>learn</code></dt> + <dt><code>arp { <var>action</var>; </code>...<code> };</code></dt> + <dd> + <p> + Temporarily replaces the IPv4 packet being processed by an ARP + packet and executes each nested <var>action</var> on the ARP + packet. Actions following the <var>arp</var> action, if any, apply + to the original, unmodified packet. + </p> - <dt><code>conntrack</code></dt> + <p> + The ARP packet that this action operates on is initialized based on + the IPv4 packet being processed, as follows. 
These are default + values that the nested actions will probably want to change: + </p> + + <ul> + <li><code>eth.src</code> unchanged</li> + <li><code>eth.dst</code> unchanged</li> + <li><code>eth.type = 0x0806</code></li> + <li><code>arp.op = 1</code> (ARP request)</li> + <li><code>arp.sha</code> copied from <code>eth.src</code></li> + <li><code>arp.spa</code> copied from <code>ip4.src</code></li> + <li><code>arp.tha = 00:00:00:00:00:00</code></li> + <li><code>arp.tpa</code> copied from <code>ip4.dst</code></li> + </ul> + + <p><b>Prerequisite:</b> <code>ip4</code></p> + </dd> - <dt><code>dec_ttl { <var>action</var>, </code>...<code> } { <var>action</var>; </code>...<code>};</code></dt> + <dt><code>icmp4 { <var>action</var>; </code>...<code> };</code></dt> <dd> - decrement TTL; execute first set of actions if - successful, second set if TTL decrement fails + <p> + Temporarily replaces the IPv4 packet being processed by an ICMPv4 + packet and executes each nested <var>action</var> on the ICMPv4 + packet. Actions following the <var>icmp4</var> action, if any, + apply to the original, unmodified packet. + </p> + + <p> + The ICMPv4 packet that this action operates on is initialized based + on the IPv4 packet being processed, as follows. These are default + values that the nested actions will probably want to change. 
+ Ethernet and IPv4 fields not listed here are not changed: + </p> + + <ul> + <li><code>ip.proto = 1</code> (ICMPv4)</li> + <li><code>ip.frag = 0</code> (not a fragment)</li> + <li><code>icmp4.type = 3</code> (destination unreachable)</li> + <li><code>icmp4.code = 1</code> (host unreachable)</li> + </ul> + + <p> + XXX need to explain exactly how the ICMP packet is constructed + </p> + + <p><b>Prerequisite:</b> <code>ip4</code></p> </dd> - <dt><code>icmp_reply { <var>action</var>, </code>...<code> };</code></dt> - <dd>generate ICMP reply from packet, execute <var>action</var>s</dd> + <dt><code>tcp_reset;</code></dt> + <dd> + <p> + This action transforms the current TCP packet according to the + following pseudocode: + </p> + + <pre> +if (tcp.ack) { + tcp.seq = tcp.ack; +} else { + tcp.ack = tcp.seq + length(tcp.payload); + tcp.seq = 0; +} +tcp.flags = RST; +</pre> - <dt><code>arp { <var>action</var>, </code>...<code> }</code></dt> - <dd>generate ARP from packet, execute <var>action</var>s</dd> + <p> + Then, the action drops all TCP options and payload data, and + updates the TCP checksum. + </p> + + <p><b>Prerequisite:</b> <code>tcp</code></p> + </dd> </dl> </column> -- 2.1.3
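As an illustration only (not part of the patch), the tentative tcp_reset pseudocode in the ovn-sb.xml hunk above can be exercised with a small Python sketch. The TcpSegment class and the tcp_reset() function are hypothetical names introduced here. The sketch assumes that "if (tcp.ack)" means "the acknowledgment number is nonzero", that the addition wraps modulo 2^32, and it leaves out the checksum update that the text mentions.

```python
from dataclasses import dataclass, replace

RST = 0x04  # RST bit in the TCP flags field


@dataclass(frozen=True)
class TcpSegment:
    """Hypothetical minimal model of the TCP fields the pseudocode touches."""
    seq: int
    ack: int
    flags: int
    payload: bytes = b""
    options: bytes = b""


def tcp_reset(seg: TcpSegment) -> TcpSegment:
    """Return the RST reply described by the tentative tcp_reset pseudocode."""
    if seg.ack:
        # Assumption: a nonzero acknowledgment number selects this branch.
        # Only tcp.seq changes; tcp.ack keeps its value.
        seq, ack = seg.ack, seg.ack
    else:
        # Assumption: 32-bit sequence arithmetic, so the sum wraps.
        seq, ack = 0, (seg.seq + len(seg.payload)) % 2**32
    # The action then drops all TCP options and payload data.
    # (Updating the checksum is omitted from this sketch.)
    return replace(seg, seq=seq, ack=ack, flags=RST, payload=b"", options=b"")


if __name__ == "__main__":
    syn = TcpSegment(seq=100, ack=0, flags=0x02, payload=b"abcd")
    print(tcp_reset(syn))
```

Modeling the segment as an immutable value keeps the "transform the current packet" step easy to compare against the pseudocode field by field.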