[nvo3] Review ptA: Technical draft-ietf-nvo3-gue-04

Bob Briscoe Sat, 13 Aug 2016 06:26:49 -0700

Tom, Lucy, Osama,

This draft looks like it could become important, so I wanted to review it comprehensively. Particularly given my experience contributing to the design of Generic UDP Tunneling (GUT; draft-manner-tsvwg-gut-02 (expired Jan 2011)), which is very similar - as the GUE draft acknowledges.


In preparation for this review:
* I re-read all the (very useful) tsvwg mailing list comments about GUT;
* I had to read a couple of dozen references (to catch up on the last few years in this area);
* it's required some pretty deep thought, which has led me to have to rewrite parts of it multiple times.

I'm afraid my review is about as long as the GUE draft itself. I should probably turn all this into an Internet draft, but it's email for now. So I've split it into 3 parts in separate emails:

A) Technical Review of GUE 'As-is'        <--- This email
B) Editorial Review
C) Redesign of parts of GUE

* Pls read the review of "GUE as-is" first (ptA), it hopefully gives solid arguments for why some parts of GUE's design are problematic.
* I'm afraid that is rather an understatement - I think I have undermined nearly every part of the wire protocol: the version field, the C flag, the Hlen field, and the flag-based options. And I believe the semantics of the one remaining part (the proto/ctype field) miss an opportunity to be a lot more powerful.
* Nonetheless, these are only my opinions at this stage. Therefore I have disciplined myself to refer out to PtC for all redesign ideas, so PtA remains solely about GUE "as-is".
* I hope you will accept the review in the spirit intended - constructive criticism to improve the final result, altho I appreciate you were probably hoping GUE was nearly done.
* I'm not proprietorial about any of the ideas I give in the redesign - they are offered for the WG to use as it chooses. I don't really want to be working on encapsulation stuff myself, I just end up having to because a) encap is fundamental to real networking and it's always not quite done right which makes everything else hard; and b) encap is often the best way to get ahead of middlebox evolution.


I should add that:

* I don't generally follow nvo3 (or intarea) lists. So apologies if some of my points are duplicates.
* Nonetheless, this means my review is a good test of whether the draft is comprehensible to an outsider.
* After I wrote this, I read Adrian Farrel's RTG Dir QA review. I don't think I have directly duplicated any of his comments. We were uneasy about some of the same things, but I have tried to complement criticism with alternative design proposals (ptC).
* I noticed Adrian encouraged you to get review from the transport area. I'm on the transport area review team, but I haven't been asked to do an "official" transport area review of GUE. Whatever, the problems I have uncovered are wider - best categorised as transport, protocol design (encapsulation and extensibility), ops and security.


*EXEC SUMMARY (of ptA Technical)*

I've split the tech review up into the following parts, and I've highlighted here where there are particularly serious problems:


1/ Addressing Architecture

For IETF standardization, connection semantics will need to be the rule, not the exception. I know the exception applies where GUE came from - private DCs. However nvo3 and the IETF more generally have to cope with multi-tenant, multi-admin, and therefore firewalls (and other middlebox crud).

I also identify some cases where GUE cannot work that will need to be documented (not show-stoppers).


2/ Wire Protocol

I'm afraid I have unearthed a number of apparently nitty, but actually serious show-stoppers (IMO). E.g. GUEv1 precludes future versions of IP and GUE extensibility only works while there are no extensions (!).

Also, the semantics of the ctype/proto field preclude some ideas we had in GUT, but without really giving a reason. Perhaps you just hadn't realised some potential uses of GUE that we had in mind.

This could be stated as: "Please don't unnecessarily constrain your protocol design solely to the use-case(s) you have in mind." This is as much a problem with the IETF process, which by default tries to constrain a new protocol to the scope of one WG, even when it could be more powerful. I've heard suggestions that GUE ought to move from nvo3 to intarea or tsvwg, which may help, but I don't know which would be better. We should also bear in mind that a more powerful protocol can become a more powerful attack weapon in the wrong hands, so strong security review is also important.


3/ State
  Important, but absent from the draft.

4/ Operation
  Numerous, but mostly minor problems. The more serious ones are:

* There is no way for tunnels in tunnels to know which options to copy to the outer, and which not.
* The claim that "GUE permits encap of arbitrary IP protocols" is only true until it encounters a protocol it doesn't know (!).

An improved checksum solution is also presented (in PtC), which can ensure checksum coverage of all non-mutable parts of a GUE packet and traverses middleboxes even if they do not support zero checksums, while at the same time minimising extra processing by generally avoiding duplicate coverage.


5/ Security

I am worried about the new security options in GUE. Because they are introduced within a completely new extension framework they will introduce a whole set of new security vulnerabilities, flaws and bugs. The security community is stretched enough as it is having to cover what we already have. So it is important to justify why existing security building blocks are insufficient for GUE (IMO, the relevant motivation sections in the GUE extensions draft are insufficient).


  I also highlight some new points about firewall interactions.

6/ Implementation
  Just my little rant about LRO

Finally there's one endemic editorial problem that has led to a large number of technical flaws and oversights. Over and over, the differences between the main two modes of usage go unstated and unresolved. There are only two short sections that discuss the two modes separately:
* Section 5.1 Network tunnel encap (adds GUE+UDP+IP outside an existing IP header)
* Section 5.2 Transport layer encap (adds GUE+UDP between an existing transport and an existing IP header).

The majority of the draft is written in the mindset of network tunnel encap, but without saying so. If the reader is keeping both modes in mind, this makes the draft very hard to understand. But also, some fundamental problems (with one mode in some cases and the other mode in other cases) have been overlooked by not considering each mode separately at each stage of the discussion.


*TABLE OF CONTENTS*

Yes, a ToC for an email!

A/ TECHNICAL PROBLEMS/COMMENTS

1/ Addressing Architecture
1.1/ Inferring Connection Semantics: the rule not the exception
1.2/ A Firewall or NAT in front of both ends

1.3/ Multiple GUE servers (transport encap) not possible behind a NAT-PT with one external IP
1.4/ Network decap and transport decap problematic on the same (IP) interface


2/ Wire Protocol
2.1/ HLEN too small
2.2/ GUE versions
2.3/ No need to interpret the protocol field relative to IPv4
2.4/ No need to restrict interpretation of the protocol field
2.5/ Missed opportunity to liberalise interpretation of the protocol field
2.6/ Positioning GUE with respect to existing IPv6 extension headers
2.7/ Reliable delivery of control messages
2.8/ Extensibility of the flags and optional fields scheme: doesn't work
2.9/ Hard-coded option lengths do not scale
2.10/ Random access to options needs motivating

3/ State
3.1/ Per-connection state vs. stateless connections but per-tunnel state
3.2/ Transport encap with Connection Semantics: Flow state management
3.3/ Keepalives for middlebox flow state

4/ Operation
4.1/ Transport encap: to GUE or not to GUE?
4.2/ Hop limit / TTL processing
4.3/ Error messages
4.4/ Tunnels in tunnels
4.5/ SHOULD adjust MTU?
4.6/ Is orig-proto field necessary in the fragmentation option?
4.7/ Congestion Control: reductio ad absurdum
4.8/ Multicast outer -> Implosion on inner destination

4.9/ Deriving flow entropy from the inner is contrary to "GUE permits encap of arbitrary IP protocols" claim

4.10/ Flow entropy from encrypted data could weaken the crypto?
4.11/ No need to constrain flow entropy distribution
4.12/ No need to constrain flow entropy interpretation

5/ Security

5.1/ Addresses that are both visible and hidden? Have your GUE and eat it too?
5.2/ How can the Security option protect a UDP/GUE header from being moved or removed?

5.3/ What happens when a port scan sends a datagram to port 6080?
5.4/ Firewalls will still block new/atypical protocols
5.5/ Transport Encap: Two Passes through a Local Firewall?

6/ Implementation
6.1/ Practical Large Receive Offload Requirements


*A/ TECHNICAL PROBLEMS/COMMENTS*

_*1/ ADDRESSING ARCHITECTURE*_

*1.1/ Inferring Connection Semantics: the rule not the exception*

The draft assumes that, as a general rule, the UDP dst. port of a GUE packet will be fixed (6080) and that flow entropy will come from the source port (see the two quoted sections below).


S. 5.11.1. Flow classification

   " ... When a packet is encapsulated with
    GUE, the source port in the outer UDP packet is set to a flow
    entropy value ...

S.5.11.2 Flow entropy properties

        The flow entropy is the value set in the UDP source port of a
        GUE packet. Flow entropy in the UDP source port should adhere to
        the following properties:

Nonetheless, the draft recognises there will be cases where "connection semantics" have to be applied in order to traverse middleboxes such as firewalls and NATs (but this is only mentioned in the relevant parts of 5.6.1 & 5.6.2 quoted below).

Such middleboxes generally only allow "ingress" UDP datagrams if they look like responses to recent "egress" datagram(s). So there has to be a concept of an "initiator" end of the GUE tunnel. Only once the initiator end has sent an "egress" datagram with src:dst ports e:G (from ephemeral port e to the GUE port G) would the GUE encap at the remote "responder" end be able to traverse the middlebox, using "ingress" datagrams with src:dst ports reversed (G:e).
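To make the e:G / G:e exchange concrete, here is a minimal sketch of the pin-hole behaviour described above. It is only a toy model under my own assumptions: the class name, the 15 s timeout and the port-only matching (real middleboxes also match IP addresses) are illustrative, not taken from the GUE draft.

```python
GUE_PORT = 6080  # well-known GUE UDP port (G in the text)


class PinholeFirewall:
    """Toy stateful middlebox: ingress is allowed only as a reply to recent egress."""

    def __init__(self, timeout_s=15.0):
        self.timeout_s = timeout_s  # hypothetical pin-hole lifetime
        self.pinholes = {}          # (inside_port, outside_port) -> time of last egress

    def egress(self, src_port, dst_port, now):
        # An outgoing datagram opens (or refreshes) a pin-hole for the reverse direction.
        self.pinholes[(src_port, dst_port)] = now
        return True

    def ingress(self, src_port, dst_port, now):
        # An incoming datagram is admitted only if its ports mirror a recent egress datagram.
        opened = self.pinholes.get((dst_port, src_port))
        return opened is not None and (now - opened) <= self.timeout_s


fw = PinholeFirewall()
e = 40001                                   # ephemeral port chosen by the initiator
print(fw.ingress(GUE_PORT, e, now=0.0))     # responder cannot go first
fw.egress(e, GUE_PORT, now=0.0)             # initiator sends e:G
print(fw.ingress(GUE_PORT, e, now=1.0))     # reply G:e now traverses
print(fw.ingress(51515, e, now=1.0))        # reply with an entropy source port does not
```

The last call is the point of the sketch: once a pin-hole is keyed on e:G, a responder that puts flow entropy in its source port no longer matches it.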


S.5.6.1. Inferring connection semantics:

   A middlebox may infer bidirectional connection semantics
   [...] To operate in
   this environment, a GUE tunnel must assume connected semantics  [...]
   The source port set in the UDP
   header must be the destination port the peer would set for replies.
   In this case the UDP source port for a tunnel would be a fixed value
   and not set to be flow entropy as described in section 5.11
   <https://tools.ietf.org/html/draft-ietf-nvo3-gue-04#section-5.11>.


   The selection of whether to make the UDP source port fixed or set to
   a flow entropy value for each packet sent should be configurable for
   a tunnel.

S. 5.6.2. NAT

   In
   the case of stateful NAT, connection semantics must be applied to a
   GUE tunnel as described in section 5.6.1
   <https://tools.ietf.org/html/draft-ietf-nvo3-gue-04#section-5.6.1>.

[BTW, I suggest changing the final sentence of the first para in S.5.6.1 (quoted above) to:

   Therefore, in the ingress direction, the destination UDP port would
   provide flow entropy, while the source port would take the fixed
   value of 6080 (the converse of the case in section 5.11
   <https://tools.ietf.org/html/draft-ietf-nvo3-gue-04#section-5.11>).
]

The text quoted from both sections 5.6.1 & 5.6.2 above implies:

a) that the operator of tunnel endpoint(s) can somehow know whether there are any middleboxes within the tunnel;

b) that applying connection semantics is feasible.

Connection semantics feasibility:

* transport encap: relatively easy - it was simple to implement connection semantics in GUT (see code <http://www.netlab.tkk.fi/%7Ejmanner/gut.html> or example in Figure 4 in draft-manner-tsvwg-gut-02 <https://tools.ietf.org/html/draft-manner-tsvwg-gut-02#section-3.4>, or see description later under A3.2/ "Transport encap with Connection Semantics: Flow state management"). Nonetheless, without connection semantics, GUE/GUT is even simpler, because it can be stateless.
* network encap: harder (see separate email for my proposed design: C1/ "Stateless Connection Semantics", but until there's a working implementation we have to allow for the possibility that it's not feasible).

Regarding the first question - whether middleboxes (such as firewalls) exist on a path:
* most operators of tunnel endpoints don't know for sure, but they do know that firewalls, etc. are very likely, so they would have to turn on the "middleboxes exist" parameter.
* in one or two important (but private) data centres, the admin might know that there are no firewalls (and certainly no NATs), so she can turn off the "middleboxes exist" parameter. However, that is the exception not the rule.

In summary, connection semantics are essential wherever there might be middleboxes. This implies:

* transport encap: connection semantics are relatively simple, so why not solely standardize this case? The few cases where the operator knows for certain that there are no middleboxes don't need to use connection semantics, but they are in private networks, so they shouldn't be the primary use-case for standardization.

* network encap: Will connection semantics work? Two possibilities:

  a) if no, the GUE network encap will be pretty useless, given nearly all real networks contain firewalls, etc. There will be no point standardizing the network encap just for a few special private networks that have no middleboxes.
  b) if yes, they will be needed in most real networks, so it should be the default case that is standardized. Then the IETF has to ask, is there any point standardizing a GUE network encap without connection semantics, just for a few controlled environments where the operator knows for sure that there are no middleboxes?

Corollary of all this: A packet is a "GUE packet" if either src or dst port = 6080.


*1.2/ A Firewall or NAT in front of both ends*

Most firewalls / NATs only allow an incoming UDP datagram in response to a recent outgoing datagram. If there are two such middleboxes each "protecting" a different endpoint of a GUE tunnel (network or transport encap), then neither end can send an initial GUE datagram.

To operate in such an environment, GUE endpoints will need to support STUN [RFC5389].

*1.3/ Multiple GUE servers (transport encap) not possible behind a NAT-PT with one external IP*

Two cases:

* For transport encap: every GUE server has to have its own public IP address.
  Reason: if a NAT-PT with one external IP address (A) sits in front of multiple GUE servers, only one can be reached on the well-known GUE port (6080), because there will be only one address:port combination to address packets to (A:6080). (Dan Wing pointed out this same problem with GUT on the tsvwg ML <https://www.ietf.org/mail-archive/web/tsvwg/current/msg09851.html>). It's not a killer, but it is a limitation to applicability that has to be understood and documented.


* With network encap: Non-issue.

*1.4/ Network decap and transport decap problematic on the same (IP) interface*

A consequence of using the same well-known port for GUE transport and network encap is that both decaps cannot be deployed at the same IP address.

Thought experiment: This might work by implementing a combined transport/network decap that checked whether there was another IP header in the header chain and:

* if there was, removed the outer IP and the outer UDP+GUE+option headers

* if not, removed solely the outer UDP+GUE+option headers, but not the outer IP.

However, there is nothing to say that a GUE transport encap should not encapsulate a packet that has already been tunnelled in an IP outer (e.g. IPsec AH or ESP). That is, the transport encap would insert a UDP and GUE header between the outer IP and the inner IP, without adding another IP outer.

It would be safer to use two different well-known ports for transport and network encap. However, I think deploying transport and network encap on the same IP is a corner case we just need to rule as inadmissible. Nonetheless, a sys-admin would get weird behaviour if this did happen, with lots of head-scratching before she realised what had happened. I'm not sure how to mitigate this.


_*2/ WIRE PROTOCOL*_

*2.1/ HLEN too small*
S3.1

The 5-bit Hlen field (multiplied in 4B units making max header length 128B) worries me a lot. Let's not make a similar mistake to when we limited TCP option space to 40B, which has caused enormous grief.
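For anyone wanting to check the arithmetic, here is a small worked calculation. It assumes my reading of the draft, namely that Hlen counts 32-bit words beyond the first 4 bytes of the GUE header; the TCP figures are the standard ones (4-bit data offset in 32-bit words, 20-byte fixed header).

```python
# GUE: 5-bit Hlen, in 4-byte units, not counting the 4-byte base header (assumed reading)
HLEN_BITS = 5
GUE_BASE_HEADER_BYTES = 4
max_hlen = (1 << HLEN_BITS) - 1                    # largest value a 5-bit field can hold
max_optional_bytes = max_hlen * 4                  # fields + private data
max_header_bytes = GUE_BASE_HEADER_BYTES + max_optional_bytes

# TCP, for comparison: 4-bit data offset in 4-byte units, 20-byte fixed header
tcp_max_header_bytes = ((1 << 4) - 1) * 4
tcp_max_option_bytes = tcp_max_header_bytes - 20

print(max_hlen, max_optional_bytes, max_header_bytes)
print(tcp_max_header_bytes, tcp_max_option_bytes)
```

So all flagged fields and private data together have to fit in 124B, which is only about three times the TCP option space that has already proved too small.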


*2.2/ GUE versions*
S3.1

The hack in GUE v1 to compress out the GUE header for direct encapsulation of IP (v4 or v6) seems neat, but it is also /extremely dangerous/. If GUE becomes successful, it would prevent incremental deployment of any new version of IP starting 0b10, 0b11 or 0b00, because:

* S.5.4 says drop an unknown version field, so IP cannot be upgraded independently from GUE code.

* A version of IP starting 0b00 would be mistaken for GUE.

The latter might sound unlikely, but bear in mind that:

* you don't know what ideas might come up in future for using multiple versions of IP - the IP version field could become important.
* a future version of IP might wrap the version field, because 0x0-0x3 are no longer used (a version only has to be a unique tag, it doesn't have to increase).
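The clash can be seen by sketching how a decap would classify the first byte after the UDP header. This is only my illustration of the behaviour described above; the function name and return strings are made up, and what a real implementation does with 0b01xx values other than 4 and 6 is an assumption.

```python
def classify_first_byte(first_byte):
    """Classify a GUE payload by the top two bits of its first byte (illustrative)."""
    top2 = first_byte >> 6
    if top2 == 0b00:
        return "GUE v0: GUE header follows"
    if top2 == 0b01:
        ip_version = first_byte >> 4      # GUE v1: the nibble is really the IP version
        if ip_version == 4:
            return "GUE v1: IPv4"
        if ip_version == 6:
            return "GUE v1: IPv6"
        return "GUE v1: unknown IP version"   # assumed handling
    return "drop: unknown GUE version"        # per S.5.4


for name, byte in [("IPv4", 0x45), ("IPv6", 0x60),
                   ("hypothetical IP v2", 0x20), ("hypothetical IP v8", 0x80)]:
    print(name, "->", classify_first_byte(byte))
```

A hypothetical IP version 2 packet (first nibble 0b0010) is parsed as if it started with a GUEv0 header, and a hypothetical version 8 packet is dropped, which are the two failure cases listed above.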

[Aside: If you prefer an equally dangerous hack (perhaps because you don't believe there will ever be a version of IP beyond v6), you could have reduced the Ver field to the first single bit by making GUEv0 the one without a GUE header, and GUEv1 the one with. This would have given more space for the Hlen field (see my concern in A2.1/ "HLEN too small" above and my idea in a separate email to remove the C flag).]

In the separate email about redesign, I'll describe an alternative approach that always fits the base GUE protocol into 4B, or even within the 8B UDP header (see C6/ Wire Protocol; it comes from an idea to develop GUT into what I called Gutless <https://www.ietf.org/mail-archive/web/tsvwg/current/msg09854.html>, back in Feb 2010).


*2.3/ No need to interpret the protocol field relative to IPv4*
S3.2.1:

   The protocol number in interpreted relative
   to the IP protocol that encapsulates the UDP packet (i.e. protocol of
   the outer IP header).

IPv6 [RFC2460] defines the Next Header field to use the same protocol identifier space as IPv4. There are no IPv4 protocol numbers that are inappropriate for IPv6 (see the IANA protocol number registry <http://www.iana.org/assignments/protocol-numbers/protocol-numbers.xhtml>). Therefore, this should simply say that the protocol number is interpreted as an IPv6 protocol number (and therefore the field would be more appropriately called "Next Header").


*2.4/ No need to restrict interpretation of the protocol field*
S3.2.1:

This draft should not state any restrictions (e.g. those in the second and third paragraphs quoted below) that preclude certain protocol numbers in combination with either an IPv4 or IPv6 outer.


   For an IPv4 header the protocol may be set to any number except for
   those that refer to IPv6 extension headers or ICMPv6 options (number
   58). [...]

   For an IPv6 header the protocol may be set to any defined protocol
   number except Hop-by-hop options (number 0). [...]

Various implementations are capable of understanding an IPv6 extension or v6-ICMP within an IPv4 header (e.g. [RFC6145 <https://tools.ietf.org/html/rfc6145#section-5.2>]). And any list of restricted header combinations can never deal with newly defined headers. So the only test needed is "Does your code for this combination and order of headers have the logic for the next header?" GUE then only needs to refer to the appropriate action already specified in RFC2460 (quoted below) rather than making up its own rules:


   The Option Type identifiers are internally encoded such that their
   highest-order two bits specify the action that must be taken if the
   processing IPv6 node does not recognize the Option Type:
   [...]

   If, as a result of processing a header, a node is required to proceed
   to the next header but the Next Header value in the current header is
   unrecognized by the node, it should discard the packet and send an
   ICMP Parameter Problem message to the source of the packet, with an
   ICMP Code value of 1 ("unrecognized Next Header type encountered")
   and the ICMP Pointer field containing the offset of the unrecognized
   value within the original packet.  The same action should be taken if
   a node encounters a Next Header value of zero in any header other
   than an IPv6 header.

There is a sentence at the end of S.3.6 (quoted below) that repeats these unnecessary restrictions. If you agree with me, please also remove it.


   [...] In this case next
   header must refer to a valid IP protocol for IPv4. No other extension
   headers or destination options are permitted with IPv4.

*2.5/ Missed opportunity to liberalise interpretation of the protocol field*

I believe that GUE offers the opportunity to liberalise, rather than restrict, protocol field interpretation. In particular, GUE could allow encapsulation of hop-by-hop options (next header number 0). You might wonder what a HbH option could possibly mean within a GUE header - see C2.4/ "GUE: a potential solution to the IPv6 extension header discard problem" in my separate email about how to use GUE to solve the problem where IPv6 packets with header extensions are highly prone to discard [RFC7872 <https://tools.ietf.org/html/rfc7872>].


*2.6/ Positioning GUE with respect to existing IPv6 extension headers*

The draft needs to state rules for where GUE encapsulation fits in the order of a chain of any IPv6 extension headers already present in an arriving IPv6 packet. Below, this question is considered for both types of encapsulation, and in both cases it can be seen that the UDP/GUE header would not necessarily be the first header after an IPv6 outer.


* Network encap:

According to my reading of RFC2473, certain IPv6 extension headers in an arriving IPv6 packet should (theoretically) be copied as extension headers for the outer:

  a) a Hop-by-Hop Options header (depending on the encap configuration, but a jumbogram option would have to be copied)

  b) a Routing header (depending on the encap configuration)

  c) The Tunnel Encapsulation Limit Option (within a Destination Options Extension Header)

- HbH options are pretty academic these days, given they cause about 39-54% discard [RFC7872 <https://tools.ietf.org/html/rfc7872>]. However, if there is one on the inner, I guess we should still say that a GUE network encap should copy it to the outer before UDP/GUE is added.
- I believe RFC2473 was wrong to say a routing header could be copied to the outer. Imagine a packet gets tunnelled that has a routing header listing addresses D2, D1 & D0 still left to visit. Although it is unclear what it means to copy a routing header to the outer, it must mean that these addresses would be visited by the tunnelled packet, then visited again after decapsulation.
- I believe the Tunnel Encapsulation Limit Option is also pretty academic these days, but again, if one arrived, a GUE network encap ought to check the value, decrement it, and copy the header to the outer.


* Transport encap:

In this case, I have suggested where the UDP/GUE header should fit in the following order of extension headers (copied from RFC2460):

           IPv6 header
           Hop-by-Hop Options header
         +UDP
         +GUE
           Destination Options header (note 1)
           Routing header
           Fragment header
           Authentication header (note 2)
           Encapsulating Security Payload header (note 2)
           Destination Options header (note 3)
           upper-layer header

The draft ought to mention that if AH has been applied to a packet which is then encapsulated by GUE in transport mode, the AH header is not recalculated, so it does not cover the UDP/GUE headers. Decapsulation works because the UDP/GUE headers are inserted before the authentication header, so they will be removed (by a GUE decapsulator in transport mode) before AH is verified.

Personally I don't know enough about routing headers to make the decision on whether they should be above or below the GUE header in the transport encap. I believe they are only processed when a packet reaches the destination address in the main header, but I am not familiar with all the different routing types (I know some are deprecated, and frankly I couldn't be bothered to read the others).


*2.7/ Reliable delivery of control messages*

The examples of potential control messages (those with the 'C' flag) given in S.3.5.1. (echo request/reply for testing) aim to mimic the data channel, so unreliable delivery as a GUE datagram is appropriate.

The draft doesn't define any other tunnel control messages. However, if it did, many/most would need to be delivered reliably and in order (e.g. key agreement, any necessary configuration agreement, consistent application of connection semantics, etc).

Therefore, reliable ordered delivery for control messages will need to be defined (see C3.2/ "Reliable delivery of control messages" in separate email for a suggested design).


*2.8/ Extensibility of the flags and optional fields scheme: doesn't work*
S3.3:

This is meant to be "the primary mechanism of extensibility in GUE". However, for extensibility to work, GUE needs to distinguish between:

* options: the base set of flags+options defined from the start and required in all GUE code

* "extensions" (my term): future extensions to the flags and options.

The current GUE flags scheme only works for options, but it inherently puts extensions into a chicken-and-egg stand-off, because:

a) S5.4 says an implementation MUST drop a packet with an unknown flag. So, if the IETF later defines bit 7, until a very large proportion of GUE decap implementations have been upgraded with logic that understands bit 7, the packet is going to be dropped with high probability. So no encap is going to want to set bit 7 on a packet, so there is no motivation for a decap to implement the code for bit 7.

b) For such unknown flags, we cannot change "MUST drop" to "MUST ignore", because the lengths of the fields are not self-describing - they have to be hard-coded into an implementation. So if one GUE implementation only has logic about the flags up to bit 6, but a packet arrives with bit 8 set, the implementation doesn't know how large the "Fields" field is, so it doesn't know where the private data starts.
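Point b) can be illustrated with a minimal sketch of a decap that has field lengths hard-coded per flag. The flag numbering and lengths in the table are hypothetical, chosen only to show the effect; they are not the draft's assignments.

```python
# Hypothetical table compiled into one implementation: flag bit -> field length in bytes
KNOWN_FIELD_LEN = {0: 4, 1: 8, 2: 4}


def fields_length(flags, known=KNOWN_FIELD_LEN, nbits=16):
    """Return the total length of the optional fields, or None if it cannot be computed."""
    total = 0
    for bit in range(nbits):
        if flags & (1 << bit):
            if bit not in known:
                # Length of this field is unknown, so the boundary between
                # "Fields" and private data cannot be located: drop is the only option.
                return None
            total += known[bit]
    return total


print(fields_length(0b011))        # both flags known: offset of private data is computable
print(fields_length(0b10000001))   # one later-defined flag set: not computable
```

Changing the `return None` branch to "ignore" is not possible here, because there is nothing in the packet that says how many bytes to skip.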

For proper extensibility, each new GUE flagged option needs to be self-describing, i.e. with additional fields to say:

a) Whether nodes that do not have the logic to understand the option should drop or ignore the packet, separately for:

  - nodes on the path
  - nodes at the dest. (decap) of the GUE datagram.

b) Whether the option is intended to change on path (in which case it should not be covered by integrity or authentication codes).

c) Whether the option should be copied or not by a GUE-in-GUE tunnel encap (see A4.4/ "Tunnels in Tunnels" later).

d) The length of the option.

e) Additionally you might want to borrow the IPv6 idea of controlling whether there needs to be an error message or not, but personally I believe that is overkill (the intention was for silent failure to be impossible for critical features, but it is very hard to deliver error messages reliably anyway).
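As a rough illustration of items a) and d) only, here is a sketch of a parser for a purely hypothetical self-describing encoding: one type byte whose top bit means "drop if not understood", one length byte, then the value. This is not a proposal for the GUE wire format, just a demonstration that an explicit length and action bit let old code step over new options.

```python
def parse_options(buf, known_types):
    """Walk hypothetical type/length/value options; return None if the packet must be dropped."""
    understood, skipped = [], []
    i = 0
    while i < len(buf):
        if i + 2 > len(buf):
            raise ValueError("truncated option header")
        opt_type, length = buf[i], buf[i + 1]
        end = i + 2 + length
        if end > len(buf):
            raise ValueError("truncated option value")
        if opt_type in known_types:
            understood.append((opt_type, bytes(buf[i + 2:end])))
        elif opt_type & 0x80:
            return None                 # unknown and marked "drop if not understood"
        else:
            skipped.append(opt_type)    # unknown but ignorable: length says how far to skip
        i = end
    return understood, skipped


old_decap_knows = {0x01}
pkt = bytes([0x01, 2, 0xAA, 0xBB,   # option the decap understands
             0x02, 1, 0xCC])        # newer, ignorable option
print(parse_options(pkt, old_decap_knows))
```

A fuller encoding would also need the on-path vs. destination distinction, the mutability bit and the copy-to-outer bit from items a) to c).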

The above shows that attempting to invent a new extensibility scheme usually ends in tears. The IETF and others have developed tried-and-tested extensibility approaches like TLV, CBOR. Even then, they still have problems. The above points draw lessons from all this, particularly:
* action codes and change codes in the initial bits of IPv6 HbH & DO options [RFC2460]
* TRILL extension word flags: critical and non-critical separately for hop-by-hop and ingress-to-egress (see [RFC7179] updated by [RFC7780]).
* 'Self-describing objects', including type and size, is listed as 'Architectural Principle of the Internet' number 3.12 in [RFC1958]


*2.9/ Hard-coded option lengths do not scale*

By hard-coding the length of each option in an RFC and in the GUE code (rather than self-describing in the packet), you are stuck with a certain size option for ever. Experience has proven that fields such as message authentication codes (MACs), fragment IDs, etc. have to scale. Admittedly, we could define flags for larger fields later, but I have shown above that new flags would be undeployable.


*2.10/ Random access to options needs motivating*
Quoting S3.3:

   Flags allow random access, for instance [...]

There might be a case for GUE to use a protocol heap rather than a stack [Braden03]. If so, please motivate it.

[Braden03] Braden, R., Faber, T. & Handley, M., "From Protocol Stack to Protocol Heap: Role-Based Architecture <http://doi.acm.org/10.1145/774763.774765>," ACM SIGCOMM Computer Communication Review 33:17--22 ACM (January 2003)



_*3/ STATE*_

*3.1/ Per-connection state vs. stateless connections but per-tunnel state*

The GUE draft does not suggest a mechanism for GUE endpoints to apply connection semantics.

* For transport encap the GUT draft suggests an approach that uses per-flow state (see the example given in Figure 4 in draft-manner-tsvwg-gut-02 <https://tools.ietf.org/html/draft-manner-tsvwg-gut-02#section-3.4>).
* For network encap a stateless approach is proposed in my separate email (see C1/ "Stateless Connection Semantics"). Statelessness is important to simplify migration during load-balancing, failures etc.

The 'shared fate' resilience principle [Clark88] maintains that a system should avoid reliance on flow-state held on the path, preferring to hold state solely at the endpoints. One could argue that, in transport encap mode, the GUE endpoints are on the end hosts, and therefore, the communication path is resilient because if GUE flow state is lost because an end host fails, the communication will have failed anyway. However, strictly, a GUE endpoint process is likely to be separate (perhaps even in NIC hardware) so it could fail independently of the true endpoint process of the connection.

So it would be ideal to use a stateless approach for both network and transport encap. However, the best stateless approach I could come up with (if it works at all) requires some coordination and hence one-off set-up latency between the GUE endpoints. Therefore, stateless connections will be:

* more appropriate for network encap (usually long-lived tunnels); and
* less useful for transport encap (opportunistic per connection).

To summarize, it is likely that the stateful approach will be used, at least for some GUE encapsulators in transport mode. Therefore, for the transport encap mode at least, the draft needs to consider per-flow state and its management (see following section).

[Clark88] Clark, D.D., "The design philosophy of the DARPA internet protocols," Proc. ACM SIGCOMM'88, Computer Communication Review 18(4):106--114 (August 1988)


*3.2/ Transport encap with Connection Semantics: Flow state management*

Hosts already maintain flow-state for each connection in progress. To support GUE in transport encap mode, it is trivial for the hosts at each end to associate a little extra state with the existing state of each inner flow:
* At the initiator end, it needs no flow-state to receive GUE packets, but in order to send GUE packets, it associates the original (inner) flow's ID with the source port it will use in the UDP outer to send every GUE packet.
* At the responder end, it has to associate the inner flow ID with the source port in arriving GUE UDP outer headers. It needs this so that, when the inner flow sends out packets, the GUE encapsulator can intercept them and encapsulate them with a GUE header, using the stored source port as the destination port.
* Any error messages returned from the responder also need to be encapsulated in the same way.
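The extra state described in these bullets is small enough to sketch in a few lines. The class and method names below are mine, and a flow ID is just any hashable key (e.g. the inner 5-tuple); time-outs and DDoS protection are deliberately left out.

```python
GUE_PORT = 6080


class InitiatorState:
    """Initiator: remembers which outer UDP source port it uses for each inner flow."""

    def __init__(self):
        self.src_port_for_flow = {}

    def outer_ports(self, flow_id, pick_port):
        # pick_port is a hypothetical callable returning a fresh ephemeral port
        if flow_id not in self.src_port_for_flow:
            self.src_port_for_flow[flow_id] = pick_port()
        return (self.src_port_for_flow[flow_id], GUE_PORT)   # (src, dst) = e:G


class ResponderState:
    """Responder: remembers the outer source port seen on arriving GUE packets."""

    def __init__(self):
        self.peer_port_for_flow = {}

    def on_receive(self, flow_id, outer_src_port):
        self.peer_port_for_flow[flow_id] = outer_src_port

    def outer_ports(self, flow_id):
        peer = self.peer_port_for_flow.get(flow_id)
        if peer is None:
            return None                 # no state: behaviour here needs specifying
        return (GUE_PORT, peer)         # (src, dst) = G:e


flow = ("10.0.0.1", 1234, "192.0.2.7", 443, "tcp")
init, resp = InitiatorState(), ResponderState()
src, dst = init.outer_ports(flow, pick_port=lambda: 40001)
resp.on_receive(flow, outer_src_port=src)
print((src, dst), resp.outer_ports(flow))
```

The `return None` branch is exactly the "necessary flow state is not present" case that the draft would need to cover.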


Also, the draft needs to specify:

* that a GUE transport decap ought to protect itself against DDoS by not storing flow state if no associated socket is open;

* how long to time out unused flow state;
* what to do with a packet if the necessary flow state is not present;
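
To make the responder-side state and the three open questions above concrete, here is a minimal sketch in Python. It is not taken from the GUE draft: the class names, the 60s idle timeout and the choice to return None when state is missing are all assumptions for illustration; the draft would have to specify the real behaviour.

```python
import time
from dataclasses import dataclass, field


@dataclass
class FlowEntry:
    peer_port: int    # source port seen on arriving GUE UDP outers
    last_used: float  # time of the last packet in either direction


@dataclass
class ResponderFlowTable:
    """Hypothetical per-flow state held by a GUE transport decap/encap."""
    idle_timeout: float = 60.0  # assumed value, not from the draft
    open_sockets: set = field(default_factory=set)  # inner flow IDs with a socket
    flows: dict = field(default_factory=dict)

    def on_gue_arrival(self, inner_flow_id, outer_src_port, now=None):
        """Record the outer source port, but only if a socket is waiting."""
        now = time.monotonic() if now is None else now
        if inner_flow_id not in self.open_sockets:
            return False  # DDoS protection: store nothing for unknown flows
        self.flows[inner_flow_id] = FlowEntry(outer_src_port, now)
        return True

    def outer_dst_port(self, inner_flow_id, now=None):
        """Port to use as the UDP destination when the inner flow replies."""
        now = time.monotonic() if now is None else now
        entry = self.flows.get(inner_flow_id)
        if entry is None:
            return None  # no state: what happens next is for the draft to say
        if now - entry.last_used > self.idle_timeout:
            del self.flows[inner_flow_id]  # time out unused flow state
            return None
        entry.last_used = now
        return entry.peer_port
```

The initiator side is simpler: it only needs a mapping from the inner flow ID to the source port it chose for the UDP outer.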

*3.3/ Keepalives for middlebox flow state*

Middleboxes, such as firewalls and NATs, time out the pin-hole associated with UDP flow-state fairly rapidly, but rarely in less than 15s [RFC5405]. RFC5405 rightly says that an application that uses UDP should be responsible for recovering a timed out connection, rather than the stack sending keepalives to hold open a connection, when it doesn't actually know whether the application still wants the connection open.

Nonetheless, an inner flow will not be aware that it is being tunnelled using UDP/GUE. Therefore it seems less inappropriate for the GUE encap to keep state alive on behalf of the application, so it ought to send keepalive GUE datagrams to hold any pin-hole open. However, if the application has not sent anything for some time (whatever that means), the GUE encap should time out the connection, rather than holding middlebox flow-state (and its own flow-state) open for ever.

If you agree, it might be necessary to specify a keepalive control message that a GUE encap can send to the remote end of the GUE tunnel (which would also keep any flow-state at the remote end alive). These would only be necessary in one direction, and would not need to be reliably delivered.

See Section 3.1 of draft-manner-tsvwg-gut-02 <https://tools.ietf.org/html/draft-manner-tsvwg-gut-02#section-3.1> for the keepalive control message defined for GUT.
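
The keepalive behaviour argued for in this section could be sketched roughly as below. This is only an illustration: the function name and both timer values are assumptions (the 10s interval is merely chosen to sit under the 15s figure quoted above, and the application idle limit stands in for "whatever that means").

```python
def keepalive_action(now, last_app_packet, last_outer_packet,
                     keepalive_interval=10.0, app_idle_limit=120.0):
    """Decide what a GUE encap should do for one tunnelled flow.

    Returns 'expire'    if the application has been silent too long,
            'keepalive' if the pin-hole is about to time out,
            'wait'      otherwise.
    All times are in seconds; both limits are assumed values.
    """
    if now - last_app_packet >= app_idle_limit:
        return "expire"      # stop holding middlebox and local flow-state
    if now - last_outer_packet >= keepalive_interval:
        return "keepalive"   # send an unreliable, one-way keepalive datagram
    return "wait"
```

Checking the application idle limit first ensures keepalives never hold state open for ever on behalf of an application that has gone away.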



_*4/ OPERATION*_

*4.1/ Transport encap: to GUE or not to GUE?*

For transport encap, the draft needs to say how the host decides when to use GUE and when not. There's text on this in S.4 of the GUT draft <https://tools.ietf.org/html/draft-manner-tsvwg-gut-02#section-4>, if you want to use it.


*4.2/ Hop limit / TTL processing*

I couldn't find any text about this. Perhaps you intended this sentence in S.5.3 to cover it:


   it should follow standard conventions for tunneling of
   one IP protocol over another

I think it would be best to spell out Hop limit processing. There's text on this in S.3.2 of the GUT draft <https://tools.ietf.org/html/draft-manner-tsvwg-gut-02#section-3.2>, if you want to use it.



*4.3/ Error messages*
S5.4

   No error message is returned
   back to the encapsulator.

Please go through every type of error and in each case justify why no error message to the encap is necessary.


*4.4/ Tunnels in tunnels*
S5.5 2nd para

   It
   may encapsulate a GUE packet in another GUE packet, for instance to
   implement a network tunnel (i.e. by encapsulating an IP packet with a
   GUE payload in another IP packet as a GUE payload).

A number of problems here:

1) A "GUE packet" has not been defined. I assume it means any UDP datagram with either src or dst UDP port = 6080 (see A1.1/ "Inferring Connection Semantics: the rule not the exception").

2) There is an incremental deployment problem here. Existing tunnels won't check within the outer IP for whether a UDP port is a GUE port. They will just add a new outer IP header without the UDP or GUE.

3) Whatever, if a tunnel is GUE-aware, this para needs to be clear about exactly which headers it should copy with the outer IP:

* Do you intend this to mean that all the following should be copied to the outer IP header:

  - the outer UDP,
  - any v0 GUE header
  - plus any GUE options or private data.

* Is it appropriate to copy all the options and private data? I think only some (e.g. perhaps the VNID in certain circumstances?). Others would not have the correct semantics if blindly copied (e.g. fragment options, coverage of MACs, etc).

* How does a GUE-in-GUE encapsulator know which to copy?

Also, should any extension headers on an arriving IPv6 outer also be copied to be associated with the new outer? If so, which ones, and how does the encapsulator know? Do the same rules apply whether using transport or network encapsulation?

I have been arguing since about 2009 that, when adding a new IP outer, each IP (at least IPv6) extension header should self-describe which headers should be copied to the outer on encap. At present RFC2473 lists some extension headers that might be copied and says it depends on the configuration of the encapsulator. But a hard-coded list precludes introduction of any new extension that needs to be copied. And certainly it doesn't work for extensions like GUE that don't fit into the original mould of what an IPv6 extension looks like. The behaviour needs to be somehow self-declared in each header, not in a standard.

It is tough to solve this problem in a way that will work with existing tunnels. It needs solving more generally, not just for GUE. However, as long as GUE encapsulators address this problem from day-1, GUE presents an opportunity to solve the general problem in environments where all encapsulations are GUE-based (see my proposed solution in C4.1/ "Ensuring certain GUE headers are copied when a GUE packet is tunnelled" within my separate email on redesign). Then other encapsulation approaches might follow.


*4.5/ SHOULD adjust MTU?*

    An operator may set MTU to account for encapsulation overhead
    and reduce the likelihood of fragmentation.

I would expect "SHOULD" here.

You might want to refer to draft-ietf-tram-stun-pmtud for a way to do PMTUD with UDP (for STUN, but I think it would be similar for GUE).

*4.6/ Is orig-proto field necessary in the fragmentation option?*
S4.3 of draft-herbert-gue-extensions-00

Why does the original protocol of a fragmented packet need to be visible before reassembly by declaring it in the GUE fragmentation option of each fragment? The GUE protocol field will be available once the fragments are reassembled, and I can't see why it would be needed before that.

It is not good security practice to create multiple fields that are all intended to be set to the same value. Even if the implementation uses these orig-proto fields before reassembling the fragments, it will still have to check that they all match the GUE protocol field when the packet has been reassembled. And if any are not the same, it will raise security concerns about any action that had previously been taken based on an inconsistent value.
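
The check that an implementation would be forced to make can be sketched in a couple of lines of Python. The function and argument names are hypothetical; the values are IP protocol numbers (6 is TCP, 17 is UDP).

```python
def orig_proto_consistent(fragment_orig_protos, reassembled_proto):
    """True only if every fragment's orig-proto field agrees with the
    GUE protocol field of the reassembled packet."""
    return all(p == reassembled_proto for p in fragment_orig_protos)


# One fragment claimed UDP, but the reassembled packet says TCP:
# any action already taken on the strength of that fragment is now suspect.
print(orig_proto_consistent([6, 17, 6], 6))
```

The check can only run after reassembly, which is exactly when the orig-proto fields stop being needed.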


*4.7/ Congestion Control: reductio ad absurdum*
S5.9

I suggest you remove the para about DCCP being appropriate for tunnel congestion control. I appreciate you are trying to comply with RFC5405, but it is impossible for tunnel specs to do so without looking absurd. The more you try, the more it will look like you are the ones that are absurd. RFC5405 gives no guidance on how to comply with its requirement about congestion control of non-IP traffic across a tunnel... because there is no running code for tunnel congestion control, or for a network circuit breaker.

It has been suggested in the past that DCCP should be used across tunnels. DCCP is intended for a single flow and all the DCCP profiles defined so far ensure a DCCP "flow" will consume about as much capacity as a TCP flow. If DCCP were to be applied across a GUE tunnel it would reduce the rate of the aggregate of all flows across the tunnel to roughly the same as a /single/ TCP flow (see the intro of RFC7893 "Pseudowire Congestion Considerations").
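
As a back-of-envelope illustration of that effect (the numbers are invented, and an even split between inner flows is assumed purely to keep the arithmetic simple):

```python
def per_flow_share(tcp_fair_rate_mbps, n_inner_flows):
    """Rate each inner flow gets if the whole tunnel aggregate is held
    to the rate of one TCP-equivalent flow and shared evenly."""
    return tcp_fair_rate_mbps / n_inner_flows


# Assumed figures: a single TCP flow would get 10 Mb/s on this path,
# and the tunnel carries 100 inner flows.
print(per_flow_share(10.0, 100))
```

So each tunnelled flow would get roughly a hundredth of what it would have got competing outside the tunnel.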

One might imagine that RFC5405 means that a tunnel protocol designer would have to detect roughly how many flows a tunnel aggregate consisted of at any one time (say N flows) and attempt to design a congestion control (e.g. a DCCP profile) to consume roughly as much capacity as N TCP flows. However, this would probably cause horror for some in the transport area at the thought of the IETF endorsing a congestion control that can be N times as greedy as TCP.

To further reduce the idea of a tunnel encap applying congestion control to absurdity, it would need:

a) a huge buffer to absorb incoming packets whenever they arrived faster than the tunnel rate. All packets (in small and large flows) would back up behind this huge queue, which would be called buffer bloat, which would cause horror for most people in the transport area.
b) ideally, a time machine (a negative buffer) to bring packets forward in time whenever the arrival rate of all the flows was insufficient to satisfy the desired aggregate rate of the tunnel.
c) the addition of feedback channel(s) and a huge amount of extra processing.

[As you can see, I don't support the idea in RFC5405 that a tunnel becomes responsible for congestion control of traffic that it encapsulates. Otherwise, to be consistent, an Ethernet link would become responsible for congestion control of traffic it encapsulates. However, I accept that consistency with RFC5405 is currently a hurdle your draft has to cross before it can be approved. If you feel you have to suggest a mechanism, IMO a policer makes sense - either a rate policer or a congestion-rate policer.]



*4.8/ Multicast outer -> Implosion on inner destination*
S.5.10

Consider an inner flow of unicast packets, src-IP A, dst-IP B. Consider the encap adds an outer addressed to multicast address M, and consider n decapsulators subscribe to group M. This will cause the network to duplicate each packet n times. As each decap forwards the inner, n duplicates of each packet will converge on B.

This might make sense with unicast inner packets for a small number of decaps (e.g. two for redundancy). And a multicast overlay could make sense for multicast inner packets as long as the multicast routing was aware of the P2MP tunnel (with suitable grouping of multicast groups).

I think the text should say that a multicast outer is not precluded, because it is a theoretical possibility, but it should not be attempted without a safety harness and an empty bladder.

*4.9/ Deriving flow entropy from the inner is contrary to "GUE permits encap of arbitrary IP protocols" claim*

S.5.11.1

The general idea for creating flow entropy seems to be for the GUE encap to map inner flows of possibly "atypical IP protocols" to individual UDP outer flows, on the assumption that switches or routers that implement ECMP etc. will understand UDP but not "atypical IP protocols". Let's examine this claim by taking network encap and transport encap separately.


1) Network encap

Imagine that a GUE encap has been implemented that understands TCP, UDP, SCTP, DCCP, ICMP, RSVP, IPsec and ESP. Then researchers implement NewSexyTP, with a new IP protocol number. No GUE encap in the world has any logic to understand or locate the flow ID fields of NewSexyTP. So GUE does not "permit encap of arbitrary IP protocols" as claimed in the motivation section.

Further, why will GUE implementations be updated with logic to understand NewSexyTP any faster than the ECMP code in general-purpose switches and routers? One GUE implementation might be updated, but other developers might not so diligently track the latest transport protocols. One cannot even really argue that the ECMP code in switches and routers is implemented in hardware, so it will be harder to change than GUE code. The forwarding performance of GUE tunnel encap will need to be no different to the performance of forwarding in general switches and routers, so if hardware is necessary for one it will be necessary for the other.


2) Transport encap.

If GUE encap is implemented as a centralized daemon process on a host or centralized in a NIC, it will suffer from the same lack of forward compatibility with new transport protocols as the network encap - particularly if it is implemented in NIC hardware. I.e., if an operator installs NewSexyTP in their OS, they will also have to wait for a GUE update that supports NewSexyTP. This is the case with or without connection semantics.

However, it might be possible to implement GUE transport encap (including with connection semantics) so that each instance of a protocol stack is associated with an instance of GUE (warning: I have no idea yet whether this will be possible). In this case, each GUE instance would consistently add the same outer port number to the inner protocol instance it was associated with, without needing to understand how to identify a flow ID in any particular protocol.

In summary, certainly for net encap, but possibly not for transport encap, GUE only helps "atypical IP protocols" that a particular GUE encap implementation already understands.
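
The limitation can be seen in a minimal sketch of how an encap might derive the outer UDP source port from the inner packet. This is an assumed implementation, not text from the draft: the function name, the hash and the port range are illustrative. The protocol numbers are the IANA ones (TCP 6, UDP 17, DCCP 33, SCTP 132), all of which carry source and destination ports in their first four octets.

```python
import hashlib
import struct

# Protocols this hypothetical encap knows how to parse for port numbers.
KNOWN_PORT_PROTOCOLS = {6, 17, 33, 132}  # TCP, UDP, DCCP, SCTP


def outer_source_port(src_ip, dst_ip, proto, payload):
    """Pick an outer UDP source port from whatever flow ID is recognisable."""
    if proto in KNOWN_PORT_PROTOCOLS and len(payload) >= 4:
        sport, dport = struct.unpack("!HH", payload[:4])
        key = f"{src_ip}|{dst_ip}|{proto}|{sport}|{dport}"
    else:
        # Unknown protocol: the encap cannot locate any flow ID fields,
        # so every flow between these two hosts hashes to the same port.
        key = f"{src_ip}|{dst_ip}|{proto}"
    digest = hashlib.sha256(key.encode()).digest()
    return 49152 + int.from_bytes(digest[:2], "big") % 16384
```

For a protocol number the encap has never heard of, two distinct inner flows between the same pair of hosts land on one outer port, so ECMP gets no help from GUE for exactly the protocols that needed it.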


*4.10/ Flow entropy from encrypted data could weaken the crypto?*
S.5.11.1

     o If a node is encrypting a packet using ESP tunnel mode and GUE
        encapsulation, the flow entropy could be based on the contents
        of clear-text packet. For instance, a canonical five-tuple hash
        for a TCP/IP packet could be used.

I'm not a crypto expert, but it sounds dangerous to take some clear-text from a known position in the data, hash it with a function that is not strongly one-way, then send this hash along with the cipher text.

I think the SPI can be used as a unique consistent per-flow value, can't it? The SPI has been suitably randomised so that it reveals nothing about the flow ID.
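
A sketch of the SPI-based alternative, assuming the encap sees the ESP packet after encryption. The SPI is the first 32 bits of the ESP header; the folding into a port number and the port range are my own illustrative choices, and the result is one value per security association rather than per inner five-tuple.

```python
import struct


def entropy_from_spi(esp_packet: bytes) -> int:
    """Derive an outer UDP source port from the ESP SPI alone.

    No clear-text from the inner packet is touched.
    """
    (spi,) = struct.unpack("!I", esp_packet[:4])
    folded = (spi >> 16) ^ (spi & 0xFFFF)  # fold 32 bits down to 16
    return 49152 + folded % 16384


# Hypothetical ESP packet: SPI 0x12345678, sequence number 1, then ciphertext.
esp = struct.pack("!II", 0x12345678, 1) + b"ciphertext"
print(entropy_from_spi(esp))  # 50252
```

Every packet of the same security association maps to the same port, which is the consistency property ECMP needs.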


*4.11/ No need to constrain flow entropy distribution*
S.5.11.2

      o The flow entropy should have a uniform distribution across
        encapsulated flows.

Equal distribution of flows is not necessarily appropriate for all scenarios. Flows have a distribution of sizes, and altho ECMP is generally done randomly, an operator might want to (somehow) bias the hash algorithm to allow for the flows with the highest rate, which might otherwise unbalance the load. See for instance:

"Engineered Elephant Flows for Boosting Application Performance in Large-Scale CLOS Networks" <https://www.broadcom.com/collateral/wp/OF-DPA-WP102-R.pdf>, Broadcom White Paper (March 2014)


*4.12/ No need to constrain flow entropy interpretation*

        Decapsulators, or any networking devices, should not attempt to
        interpret flow entropy as anything more than an opaque value.

This seems unnecessarily constraining. This might not be a good idea, but if someone finds a use for it, there's no need to stop them - if it's useful they'll ignore you anyway, so why bother saying it? Perhaps you intended to explain why doing this could be problematic, rather than precluding it?


_*5/ SECURITY*_

*5.1/ Addresses that are both visible and hidden? Have your GUE and eat it too?*

S.7.  In the following sentence,

   Existing network security
   mechanisms, such as address spoofing detection, DDOS mitigation, and
   transparent encrypted tunnels can be applied to GUE packets.

This should point out that an existing set of address spoofing detection rules would not work with GUE. I think you meant that existing rules and mechanisms could be modified to check the packets encapsulated by GUE without using radically new techniques.

However, if GUE is in network encap mode and it encrypts the IP headers of the inner packets, address spoofing detection and DDoS mitigation will not be possible over the length of the GUE tunnel. You cannot both claim that GUE can hide information, and that GUE allows existing security techniques to work that rely on access to the hidden information.

*5.2/ How can the Security option protect a UDP/GUE header from being moved or removed?*

The Security option is "used to provide integrity and authentication of the GUE header." I assume you envisage this would be complemented by other authentication techniques such as IPsec AH to provide integrity and authentication of the rest of the packet.

However, it occurs to me that the two together do not protect the integrity of the /structure/ of the packet as a whole (whether network or transport encap). An on-path attacker could still move the UDP/GUE header within the packet (it might be possible to construct a valid packet with altered semantics), or remove the UDP/GUE header completely. I can't immediately think whether any damage could be done with such an attack, or how to prevent it. However, I'm sure there will be a crypto expert for whom this is not a new problem.

Also, the 32B max length of the security option is insufficient. I looked for a MAC protocol where a larger field is needed, and the first one I picked required a larger field: RFC4383 "TESLA in Secure RTP" requires 34B, and that's just for the default sizes, not even the maximum. I picked TESLA because I knew each datagram needs a lot of authentication space. TESLA provides multicast message authentication, so as well as a key index and a MAC, each packet reveals a continually changing key.


*5.3/ What happens when a port scan sends a datagram to port 6080?*

When a port scan (that doesn't necessarily know about GUE) sends a datagram to port 6080, if the datagram has a body, and the body starts with a zero bit, the GUE daemon will start processing it. If the first 4 octets happen (randomly) to be set to values that would be a valid GUE header (see S.5.4), it will be decapsulated and forwarded to a protocol handler.


Not a show-stopper, but worth documenting?

*5.4/ Firewalls will still block new/atypical protocols*

Few firewalls allow incoming UDP. So GUE will not enable deployment of servers using atypical/new protocols, which will still face a deployment problem.

If a firewall opens a pin-hole to allow incoming UDP to access the well-known GUE port it would allow attackers to reach servers of any protocol while bypassing the firewall. E.g. an attacker could access a TCP-server by encapsulating TCP in GUE in order to bypass the firewall. Therefore, a firewall will only open a pin-hole to a GUE server, if it also inspects the packet encapsulated by GUE and applies all its normal rules to that as well.

This is why I have said elsewhere that the draft should state that firewall bypass by new/atypical protocols is a non-goal of GUE.
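
The behaviour I would expect from a GUE-aware firewall can be sketched as follows. The packet representation (a dict with 'proto', 'dport' and an optional parsed 'inner') and the function names are invented for the example; only the GUE port number comes from the draft.

```python
GUE_PORT = 6080


def firewall_allows(packet, rule_allows):
    """Apply the firewall's normal rules to whatever GUE is carrying.

    packet is a dict with 'proto', 'dport' and, for GUE, a parsed 'inner'.
    rule_allows is the firewall's existing policy for a plain packet.
    """
    if packet["proto"] == "udp" and packet["dport"] == GUE_PORT:
        inner = packet.get("inner")
        if inner is None:
            return False  # cannot inspect the payload, so no pin-hole
        return firewall_allows(inner, rule_allows)  # also covers GUE-in-GUE
    return rule_allows(packet)


def https_only(p):
    """Example policy: allow only inbound TCP to port 443."""
    return p["proto"] == "tcp" and p["dport"] == 443


ssh_in_gue = {"proto": "udp", "dport": GUE_PORT,
              "inner": {"proto": "tcp", "dport": 22}}
print(firewall_allows(ssh_in_gue, https_only))  # False
```

Wrapping a blocked protocol in GUE gains the sender nothing, which is the sense in which firewall bypass has to be a non-goal.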


*5.5/ Transport Encap: Two Passes through a Local Firewall?*

GUE in transport mode resubmits the encapsulated packet to the host's IP stack. But it needs to make sure it re-injects the packet at the correct point in relation to any local firewall.

* If the firewall includes rules to inspect the packet encapsulated with GUE (as discussed in the previous point), it would make sense to re-submit the packet above the local firewall.
* If not, GUE should resubmit the packet so that it passes through the local firewall again.

The latter mode would make more sense if GUE was also decrypting the inner packet. So, rather than have two options, a local firewall could work co-operatively with GUE in transport mode, so it doesn't have to inspect the inner in both passes.


*6/ Implementation*

*6.1/ Practical Large Receive Offload Requirements*
Appendix A.4 says:

   The conservative approach to supporting LRO for GUE would be to
   assign packets to the same flow only if they have identical five-
   tuple and were encapsulated the same way. That is the outer IP
   addresses, the outer UDP ports, GUE protocol, GUE flags and fields,
   and inner five tuple are all identical.

Rant: It is sad if such a conservative approach to LRO is still necessary. Any API to LRO hardware needs to be able to be given the locations of certain header fields that are deliberately intended to vary, so it can offer the facility to separately report these for each packet. A MAC of the encapsulating headers is a good case in point. ECN is an even better example of a varying field, because it has been a standard part of the IP header since 2001, long before LRO hardware was designed.
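
To illustrate the kind of interface I mean, here is a toy model in Python. It is not how any real LRO engine works (real hardware only merges consecutive in-sequence segments); the record layout and names are invented. The point is only that a field declared as varying is left out of the flow-matching key and reported per packet instead.

```python
from typing import NamedTuple


class Pkt(NamedTuple):
    outer_src: str
    outer_dst: str
    outer_sport: int
    outer_dport: int
    gue_proto: int
    gue_flags: int
    inner_five_tuple: tuple
    ecn: int  # 2-bit ECN field, declared as allowed to vary


def coalesce(packets):
    """Group packets on every field except ECN, keeping each ECN value."""
    groups = {}
    for p in packets:
        key = p[:-1]  # all fields except the trailing ecn
        groups.setdefault(key, []).append(p.ecn)
    return groups
```

Two packets that differ only in ECN then fall into one group, with both ECN values still available to the stack.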



--
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/
