Tom, Lucy, Osama,
This draft looks like it could become important, so I wanted to review
it comprehensively. Particularly given my experience contributing to the
design of Generic UDP Tunneling (GUT; draft-manner-tsvwg-gut-02 (expired
Jan 2011)), which is very similar - as the GUE draft acknowledges.
In preparation for this review:
* I re-read all the (very useful) tsvwg mailing list comments about GUT
* I had to read a couple of dozen references (to catch up on the last
few years in this area)
* it's required some pretty deep thought, which has led me to have to
rewrite parts of it multiple times;
I'm afraid my review is about as long as the GUE draft itself. I should
probably turn all this into an Internet draft, but it's email for now. So
I've split it into 3 parts in separate emails:
A) Technical Review of GUE 'As-is' <--- This email
B) Editorial Review
C) Redesign of parts of GUE
* Pls read the review of "GUE as-is" first (ptA), it hopefully gives
solid arguments for why some parts of GUE's design are problematic.
* I'm afraid that is rather an understatement - I think I have
undermined nearly every part of the wire protocol: the version field,
the C flag, the Hlen field, and the flag-based options. And I believe
the semantics of the one remaining part (the proto/ctype field) misses
an opportunity to be a lot more powerful.
* Nonetheless, these are only my opinions at this stage. Therefore I
have disciplined myself to refer out to PtC for all redesign ideas, so
PtA remains solely about GUE "as-is".
* I hope you will accept the review in the spirit intended -
constructive criticism to improve the final result, altho I appreciate
you were probably hoping GUE was nearly done.
* I'm not proprietorial about any of the ideas I give in the redesign -
they are offered for the WG to use as it chooses. I don't really want to
be working on encapsulation stuff myself, I just end up having to
because a) encap is fundamental to real networking and it's never quite
done right, which makes everything else hard; and b) encap is often
the best way to get ahead of middlebox evolution.
I should add that:
* I don't generally follow nvo3 (or intarea) lists. So apologies if some
of my points are duplicates.
* Nonetheless, this means my review is a good test of whether the draft
is comprehensible to an outsider.
* After I wrote this, I read Adrian Farrel's RTG Dir QA review. I don't
think I have directly duplicated any of his comments. We were uneasy
about some of the same things, but I have tried to complement criticism
with alternative design proposals (ptC).
* I noticed Adrian encouraged you to get review from the transport area.
I'm on the transport area review team, but I haven't been asked to do an
"official" transport area review of GUE. Whatever, the problems I have
uncovered are wider - best categorised as transport, protocol design
(encapsulation and extensibility), ops and security.
*EXEC SUMMARY (of ptA Technical)*
I've split the tech review up into the following parts, and I've
highlighted here where there are particularly serious problems:
1/ Addressing Architecture
For IETF standardization, connection semantics will need to be the
rule, not the exception. I know the exception applies where GUE came
from - private DCs. However, nvo3 and the IETF more generally have to cope
with multi-tenant, multi-admin, and therefore firewalls (and other
middlebox crud).
I also identify some cases where GUE cannot work that will need to be
documented (not show-stoppers).
2/ Wire Protocol
I'm afraid I have unearthed a number of apparently nitty, but
actually serious show-stoppers (IMO). E.g. GUEv1 precludes future
versions of IP and GUE extensibility only works while there are no
extensions (!).
Also, the semantics of the ctype/proto field precludes some ideas we had
in GUT, but without really giving a reason. Perhaps you just hadn't
realised some potential uses of GUE that we had in mind.
This could be stated as: "Please don't unnecessarily constrain your
protocol design solely to the use-case(s) you have in mind." This is as
much a problem with the IETF process, which by default tries to
constrain a new protocol to the scope of one WG, even when it could be
more powerful. I've heard suggestions that GUE ought to move from nvo3
to intarea?, tsvwg?, which may help, but I don't know which would be
better. We should also bear in mind that a more powerful protocol can
become a more powerful attack weapon in the wrong hands, so strong
security review is also important.
3/ State
Important, but absent from the draft.
4/ Operation
Numerous, but mostly minor problems. The more serious ones are:
* no way for tunnels in tunnels to know which options to copy to the
outer, and which not.
* The claim that "GUE permits encap of arbitrary IP protocols" is
only true until it encounters a protocol it doesn't know (!).
An improved checksum solution is also presented (in PtC), which can
ensure checksum coverage of all non-mutable parts of a GUE packet and
traverses middleboxes even if they do not support zero checksums, while
at the same time minimising extra processing by generally avoiding
duplicate coverage.
5/ Security
I am worried about the new security options in GUE. Because they are
introduced within a completely new extension framework they will
introduce a whole set of new security vulnerabilities, flaws and bugs.
The security community is stretched enough as it is having to cover what
we already have. So it is important to justify why existing security
building blocks are insufficient for GUE (IMO, the relevant motivation
sections in the GUE extensions draft are insufficient).
I also highlight some new points about firewall interactions.
6/ Implementation
Just my little rant about LRO
Finally there's one endemic editorial problem that has led to a large
number of technical flaws and oversights. Over and over, the differences
between the main two modes of usage go unstated and unresolved. There
are only two short sections that discuss the two modes separately:
* Section 5.1 Network tunnel encap (adds GUE+UDP+IP outside an existing
IP header)
* Section 5.2 Transport layer encap (adds GUE+UDP between an existing
transport and an existing IP header).
The majority of the draft is written in the mindset of network tunnel
encap, but without saying so. If the reader is keeping both modes in
mind, this makes the draft very hard to understand. But also, some
fundamental problems (with one mode in some cases and the other mode in
other cases) have been overlooked by not considering each mode
separately at each stage of the discussion.
*TABLE OF CONTENTS*
Yes, a ToC for an email!
A/ TECHNICAL PROBLEMS/COMMENTS
1/ Addressing Architecture
1.1/ Inferring Connection Semantics: the rule not the exception
1.2/ A Firewall or NAT in front of both ends
1.3/ Multiple GUE servers (transport encap) not possible behind a NAT-PT
with one external IP
1.4/ Network decap and transport decap problematic on the same (IP)
interface
2/ Wire Protocol
2.1/ HLEN too small
2.2/ GUE versions
2.3/ No need to interpret the protocol field relative to IPv4
2.4/ No need to restrict interpretation of the protocol field
2.5/ Missed opportunity to liberalise interpretation of the protocol field
2.6/ Positioning GUE with respect to existing IPv6 extension headers
2.7/ Reliable delivery of control messages
2.8/ Extensibility of the flags and optional fields scheme: doesn't work
2.9/ Hard-coded option lengths do not scale
2.10/ Random access to options needs motivating
3/ State
3.1/ Per-connection state vs. stateless connections but per-tunnel state
3.2/ Transport encap with Connection Semantics: Flow state management
3.3/ Keepalives for middlebox flow state
4/ Operation
4.1/ Transport encap: to GUE or not to GUE?
4.2/ Hop limit / TTL processing
4.3/ Error messages
4.4/ Tunnels in tunnels
4.5/ SHOULD adjust MTU?
4.6/ Is orig-proto field necessary in the fragmentation option?
4.7/ Congestion Control: reductio ad absurdum
4.8/ Multicast outer -> Implosion on inner destination
4.9/ Deriving flow entropy from the inner is contrary to "GUE permits
encap of arbitrary IP protocols" claim
4.10/ Flow entropy from encrypted data could weaken the crypto?
4.11/ No need to constrain flow entropy distribution
4.12/ No need to constrain flow entropy interpretation
5/ Security
5.1/ Addresses that are both visible and hidden? Have your GUE and eat
it too?
5.2/ How can the Security option protect a UDP/GUE header from being
moved or removed?
5.3/ What happens when a port scan sends a datagram to port 6080?
5.4/ Firewalls will still block new/atypical protocols
5.5/ Transport Encap: Two Passes through a Local Firewall?
6/ Implementation
6.1/ Practical Large Receive Offload Requirements
*A/ TECHNICAL PROBLEMS/COMMENTS*
_*1/ ADDRESSING ARCHITECTURE*_
*1.1/ Inferring Connection Semantics: the rule not the exception*
The draft assumes that, as a general rule, the UDP dst. port of a GUE
packet will be fixed (6080) and that flow entropy will come from the
source port (see the two quoted sections below).
S. 5.11.1. Flow classification
" ... When a packet is encapsulated with
GUE, the source port in the outer UDP packet is set to a flow
entropy value ...
S.5.11.2 Flow entropy properties
The flow entropy is the value set in the UDP source port of a
GUE packet. Flow entropy in the UDP source port should adhere to
the following properties:
Nonetheless, the draft recognises there will be cases where "connection
semantics" have to be applied in order to traverse middleboxes such as
firewalls and NATs (but only mentioned in the relevant parts of 5.6.1 &
5.6.2 quoted below).
Such middleboxes generally only allow "ingress" UDP datagrams if they
look like responses to recent "egress" datagram(s). So there has to be a
concept of an "initiator" end of the GUE tunnel. Only once the initiator
end has sent an "egress" datagram with src:dst ports e:G (from ephemeral
port e to the GUE port G), then the GUE encap at the remote "responder"
end would be able to traverse the middlebox using "ingress" datagrams
with src:dst ports reversed (G:e).
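The initiator/responder rule above can be sketched as a toy model. This is only an illustration under stated assumptions: the class `StatefulMiddlebox`, its 30 s timeout and the port numbers other than 6080 are hypothetical, and a real middlebox would also match on IP addresses.

```python
GUE_PORT = 6080  # well-known GUE port named in the draft


class StatefulMiddlebox:
    """Toy firewall/NAT: admits ingress UDP only as a reply to recent egress."""

    def __init__(self, timeout_s=30.0):
        self.timeout_s = timeout_s  # illustrative value, not taken from any spec
        self.pinholes = {}          # (inside port, outside port) -> last egress time

    def egress(self, src_port, dst_port, now):
        # An outgoing datagram opens (or refreshes) a pin-hole.
        self.pinholes[(src_port, dst_port)] = now

    def ingress(self, src_port, dst_port, now):
        # Admit only if the ports mirror a recent egress datagram.
        opened = self.pinholes.get((dst_port, src_port))
        return opened is not None and now - opened <= self.timeout_s


fw = StatefulMiddlebox()
e = 49152  # ephemeral port chosen by the initiator (hypothetical)

# Nothing has been sent yet, so an unsolicited ingress datagram is blocked.
blocked_first = fw.ingress(src_port=51000, dst_port=GUE_PORT, now=0.0)

# The initiator sends e:G; the responder's reply G:e is then admitted.
fw.egress(src_port=e, dst_port=GUE_PORT, now=1.0)
reply_ok = fw.ingress(src_port=GUE_PORT, dst_port=e, now=2.0)

# A reply that puts a fresh entropy value in the source port does not match.
entropy_reply_ok = fw.ingress(src_port=51000, dst_port=e, now=2.0)
```

In this model only the G:e reply gets through, which is why the source port cannot carry per-packet flow entropy once connection semantics apply.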
S.5.6.1. Inferring connection semantics:
A middlebox may infer bidirectional connection semantics
[...] To operate in
this environment, a GUE tunnel must assume connected semantics [...]
The source port set in the UDP
header must be the destination port the peer would set for replies.
In this case the UDP source port for a tunnel would be a fixed value
and not set to be flow entropy as described in section 5.11
<https://tools.ietf.org/html/draft-ietf-nvo3-gue-04#section-5.11>.
The selection of whether to make the UDP source port fixed or set to
a flow entropy value for each packet sent should be configurable for
a tunnel.
S. 5.6.2. NAT
In
the case of stateful NAT, connection semantics must be applied to a
GUE tunnel as described in section 5.6.1
<https://tools.ietf.org/html/draft-ietf-nvo3-gue-04#section-5.6.1>.
[BTW, I suggest changing the final sentence of the first para in
S.5.6.1. (quoted above) to:
Therefore, in the ingress direction, the destination UDP port would
provide flow entropy, while the source port would take the fixed
value of 6080 (the converse of the case in section 5.11
<https://tools.ietf.org/html/draft-ietf-nvo3-gue-04#section-5.11>).
]
The text quoted from both sections 5.6.1 & 5.6.2 above implies
a) that the operator of tunnel endpoint(s) can somehow know whether
there are any middleboxes within the tunnel.
b) that applying connection semantics is feasible.
Connection semantics feasibility:
* transport encap: relatively easy - it was simple to implement
connection semantics in GUT (see code
<http://www.netlab.tkk.fi/%7Ejmanner/gut.html> or example in Figure 4 in
draft-manner-tsvwg-gut-02
<https://tools.ietf.org/html/draft-manner-tsvwg-gut-02#section-3.4>, or
see description later under A3.2/ "Transport encap with Connection
Semantics: Flow state management"). Nonetheless, without connection
semantics, GUE/GUT is even simpler, because it can be stateless.
* network encap: harder (see separate email for my proposed design: C1/
"Stateless Connection Semantics", but until there's a working
implementation we have to allow for the possibility that it's not feasible).
Regarding the first question - whether middleboxes (such as firewalls)
exist on a path:
* most operators of tunnel endpoints don't know for sure, but they do
know that firewalls, etc. are very likely, so they would have to turn on
the "middleboxes exist" parameter.
* in one or two important (but private) data centres, the admin might
know that there are no firewalls (and certainly no NATs), so she can
turn off the "middleboxes exist" parameter. However, that is the
exception not the rule.
In summary, connection semantics are essential wherever there might be
middleboxes. This implies:
* transport encap: connection semantics are relatively simple, so why
not solely standardize this case? The few cases where the operator knows
for certain that there are no middleboxes don't need to use connection
semantics, but they are in private networks, so they shouldn't be the
primary use-case for standardization.
* network encap: Will connection semantics work? Two possibilities:
a) if no, the GUE network encap will be pretty useless, given nearly
all real networks contain firewalls, etc. There will be no point
standardizing the network encap just for a few special private networks
that have no middleboxes.
b) if yes, they will be needed in most real networks, so it should be
the default case that is standardized. Then the IETF has to ask, is
there any point standardizing a GUE network encap without connection
semantics, just for a few controlled environments where the operator
knows for sure that there are no middleboxes?
Corollary of all this: A packet is a "GUE packet" if either src or dst
port = 6080.
*1.2/ A Firewall or NAT in front of both ends*
Most firewalls / NATs only allow an incoming UDP datagram in response to
a recent outgoing datagram. If there are two such middleboxes each
"protecting" a different endpoint of a GUE tunnel (network or transport
encap), then neither end can send an initial GUE datagram.
To operate in such an environment, GUE endpoints will need to support
STUN [RFC5389].
*1.3/ Multiple GUE servers (transport encap) not possible behind a
NAT-PT with one external IP*
Two cases:
* For transport encap: every GUE server has to have its own public IP
address.
Reason: if a NAT-PT with one external IP address (A) sits in front of
multiple GUE servers, only one can be reached on the well-known GUE port
(6080). Because there will be only one address:port combination to
address packets to (A:6080). (Dan Wing pointed out this same problem
with GUT on the tsvwg ML
<https://www.ietf.org/mail-archive/web/tsvwg/current/msg09851.html>).
It's not a killer, but it is a limitation to applicability that has to
be understood and documented.
* With network encap: Non-issue.
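The transport encap limitation can be illustrated with a toy port-forwarding table. This is a sketch only: the class `PortForwardingNat` and all the addresses are hypothetical, chosen from documentation and private ranges.

```python
GUE_PORT = 6080  # well-known GUE port named in the draft


class PortForwardingNat:
    """Toy NAT with one external address: each external port maps to one inside host."""

    def __init__(self, external_ip):
        self.external_ip = external_ip
        self.forward = {}  # external port -> (inside IP, inside port)

    def publish(self, ext_port, inside_ip, inside_port):
        if ext_port in self.forward:
            return False  # the external address:port pair is already taken
        self.forward[ext_port] = (inside_ip, inside_port)
        return True


nat = PortForwardingNat("192.0.2.1")
first = nat.publish(GUE_PORT, "10.0.0.10", GUE_PORT)   # first GUE server: reachable
second = nat.publish(GUE_PORT, "10.0.0.11", GUE_PORT)  # second GUE server: no slot left
```

Only one inside server can own A:6080. A second server would need either its own public address or a non-standard external port, and a remote encap has no way to learn the latter from the well-known port alone.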
*1.4/ Network decap and transport decap problematic on the same (IP)
interface*
A consequence of using the same well-known port for GUE transport and
network encap is that both decaps cannot be deployed at the same IP
address.
Thought experiment: This might work by implementing a combined
transport/network decap that checked whether there was another IP header
in the header chain and:
* if there was, removed the outer IP and the outer UDP+GUE+option headers
* if not, removed solely the outer UDP+GUE+option headers, but not the
outer IP.
However, there is nothing to say that a GUE transport encap should not
encapsulate a packet that has already been tunnelled in an IP outer
(e.g. IPsec AH or ESP). That is, the transport encap would insert a UDP
and GUE header between the outer IP and the inner IP, without adding
another IP outer.
It would be safer to use two different well-known ports for transport
and network encap. However, I think deploying transport and network
encap on the same IP is a corner case we just need to rule as
inadmissible. Nonetheless, a sys-admin would get weird behaviour if this
did happen, with lots of head-scratching before she realised what had
happened. I'm not sure how to mitigate this.
_*2/ WIRE PROTOCOL*_
*2.1/ HLEN too small*
S3.1
The 5-bit Hlen field (multiplied in 4B units making max header length
128B) worries me a lot.
Let's not make a similar mistake to when we limited TCP option space to
40B, which has caused enormous grief.
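The arithmetic behind both limits is the same and can be checked in a few lines. This assumes that Hlen counts 4B units beyond the first 4B of the GUE header, which is consistent with the 128B maximum stated above; the function name is just for illustration.

```python
def max_len_bytes(field_bits, unit_bytes, fixed_bytes=0):
    """Largest length expressible by a length field counted in fixed-size units."""
    return fixed_bytes + (2**field_bits - 1) * unit_bytes


# GUE: 5-bit Hlen in 4B units, assumed to exclude the first 4B of the header.
gue_max_header = max_len_bytes(5, 4, fixed_bytes=4)  # 4 + 31*4 = 128
gue_max_optional = gue_max_header - 4                # 124

# TCP: 4-bit Data Offset in 4B units covers the whole header; 20B are fixed.
tcp_max_header = max_len_bytes(4, 4)                 # 15*4 = 60
tcp_max_options = tcp_max_header - 20                # 40
```

One extra bit in the length field gives GUE roughly three times TCP's option space, but that is still a hard ceiling fixed in the base header.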
*2.2/ GUE versions*
S3.1
The hack in GUE v1 to compress out the GUE header for direct
encapsulation of IP (v4 or v6) seems neat, but it is also /extremely
dangerous/. If GUE becomes successful, it would prevent incremental
deployment of any new version of IP starting 0b10, 0b11 or 0b00. Because:
* S.5.4 says drop an unknown version field, so IP cannot be upgraded
independently from GUE code.
* A version of IP starting 0b00 would be mistaken for GUE.
The latter might sound unlikely, but bear in mind that:
* you don't know what ideas might come up in future for using multiple
versions of IP - the IP version field could become important.
* a future version of IP might wrap the version field, because 0x0-0x3
are no longer used (a version only has to be a unique tag, it doesn't
have to increase).
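A short sketch makes the collision concrete. It assumes a decap that looks only at the top two bits of the first byte after the UDP header, as the 2-bit GUE version field implies; the function names and the future IP version numbers are hypothetical.

```python
def classify_first_byte(first_byte):
    """Classify by the top two bits, as a decap reading a 2-bit version field would."""
    top2 = first_byte >> 6
    if top2 == 0b00:
        return "GUE v0 header"
    if top2 == 0b01:
        return "GUE v1: directly encapsulated IP"
    return "drop: unknown GUE version"


def ip_first_byte(ip_version):
    # The IP version is the top nibble of the first byte; the low nibble is zeroed here.
    return (ip_version & 0xF) << 4


ipv4 = classify_first_byte(ip_first_byte(4))     # 0b0100 -> top bits 01
ipv6 = classify_first_byte(ip_first_byte(6))     # 0b0110 -> top bits 01
future_8 = classify_first_byte(ip_first_byte(8))  # 0b1000 -> top bits 10
future_2 = classify_first_byte(ip_first_byte(2))  # 0b0010 -> top bits 00
```

In this model IPv4 and IPv6 both pass as GUE v1, a hypothetical IP version 8 is dropped as an unknown GUE version, and a hypothetical IP version 2 is parsed as if it were a GUE v0 header.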
[Aside: If you prefer an equally dangerous hack (perhaps because you
don't believe there will ever be a version of IP beyond v6), you could
have reduced the Ver field to the first single bit by making GUEv0 the
one without a GUE header, and GUEv1 the one with. This would have given
more space for the Hlen field (see my concern in A2.1/ "HLEN too small"
above and my idea in a separate email to remove the C flag).]
In the separate email about redesign, I'll describe an alternative
approach that always fits the base GUE protocol into 4B, or even within
the 8B UDP header (see C6/ Wire Protocol; it comes from an idea to
develop GUT into what I called Gutless
<https://www.ietf.org/mail-archive/web/tsvwg/current/msg09854.html>,
back in Feb 2010).
*2.3/ No need to interpret the protocol field relative to IPv4*
S3.2.1:
The protocol number is interpreted relative
to the IP protocol that encapsulates the UDP packet (i.e. protocol of
the outer IP header).
IPv6 [RFC2460] defines the Next Header field to use the same protocol
identifier space as IPv4. There are no IPv4 protocol numbers that are
inappropriate for IPv6 (see the IANA protocol number registry
<http://www.iana.org/assignments/protocol-numbers/protocol-numbers.xhtml>).
Therefore, this should simply say that the protocol number is
interpreted as an IPv6 protocol number (and therefore the field would be
more appropriately called "Next Header").
*2.4/ No need to restrict interpretation of the protocol field*
S3.2.1:
This draft should not state any restrictions (e.g. those in the second
and third paragraphs quoted below) that preclude certain protocol
numbers in combination with either an IPv4 or IPv6 outer.
For an IPv4 header the protocol may be set to any number except for
those that refer to IPv6 extension headers or ICMPv6 options (number
58). [...]
For an IPv6 header the protocol may be set to any defined protocol
number except Hop-by-hop options (number 0). [...]
Various implementations are capable of understanding an IPv6 extension
or v6-ICMP within an IPv4 header (e.g. [RFC6145
<https://tools.ietf.org/html/rfc6145#section-5.2>]). And any list of
restricted header combinations can never deal with newly defined
headers. So the only test needed is "Does your code for this combination
and order of headers have the logic for the next header?" GUE then only
needs to refer to the appropriate action already specified in RFC2460
(quoted below) rather than making up its own rules:
The Option Type identifiers are internally encoded such that their
highest-order two bits specify the action that must be taken if the
processing IPv6 node does not recognize the Option Type:
[...]
If, as a result of processing a header, a node is required to proceed
to the next header but the Next Header value in the current header is
unrecognized by the node, it should discard the packet and send an
ICMP Parameter Problem message to the source of the packet, with an
ICMP Code value of 1 ("unrecognized Next Header type encountered")
and the ICMP Pointer field containing the offset of the unrecognized
value within the original packet. The same action should be taken if
a node encounters a Next Header value of zero in any header other
than an IPv6 header.
There is a sentence at the end of S.3.6 (quoted below) that repeats
these unnecessary restrictions. If you agree with me, please also remove it.
[...] In this case next
header must refer to a valid IP protocol for IPv4. No other extension
headers or destination options are permitted with IPv4.
*2.5/ Missed opportunity to liberalise interpretation of the protocol
field*
I believe that GUE offers the opportunity to liberalise, rather than
restrict, protocol field interpretation. In particular, GUE could allow
encapsulation of hop-by-hop options (next header number 0). You might
wonder what a HbH option could possibly mean within a GUE header - see
C2.4/ "GUE: a potential solution to the IPv6 extension header discard
problem" in my separate email about how to use GUE to solve the problem
where IPv6 packets with header extensions are highly prone to discard
[RFC7872 <https://tools.ietf.org/html/rfc7872>].
*2.6/ Positioning GUE with respect to existing IPv6 extension headers*
The draft needs to state rules for where GUE encapsulation fits in the
order of a chain of any IPv6 extension headers already present in an
arriving IPv6 packet. Below, this question is considered for both types
of encapsulation, and in both cases it can be seen that the UDP/GUE
header would not necessarily be the first header after an IPv6 outer.
* Network encap:
According to my reading of RFC2473, certain IPv6 extension headers in an
arriving IPv6 should (theoretically) be copied as extension headers for
the outer:
a) a Hop-by-Hop Options header (depending on the encap configuration,
but a jumbogram option would have to be copied)
b) a Routing header (depending on the encap configuration)
c) The Tunnel Encapsulation Limit Option (within a Destination
Options Extension Header)
- HbH options are pretty academic these days, given they cause about
39-54% discard [RFC7872 <https://tools.ietf.org/html/rfc7872>]. However,
if there is one on the inner, I guess we should still say that a GUE
network encap should copy it to the outer before UDP/GUE is added.
- I believe RFC2473 was wrong to say a routing header could be copied
to the outer. Imagine a packet gets tunnelled that has a routing header
listing addresses D2, D1 & D0 still left to visit. Although it is
unclear what it means to copy a routing header to the outer, it must
mean that these addresses would be visited by the tunnelled packet, then
visited again after decapsulation.
- I believe the Tunnel Encapsulation Limit Option is also pretty
academic these days, but again, if one arrived, a GUE network encap
ought to check the value, decrement it, and copy the header to the outer.
* Transport encap:
In this case, I have suggested where the UDP/GUE header should fit in
the following order of extension headers (copied from RFC2460):
IPv6 header
Hop-by-Hop Options header
+UDP
+GUE
Destination Options header (note 1)
Routing header
Fragment header
Authentication header (note 2)
Encapsulating Security Payload header (note 2)
Destination Options header (note 3)
upper-layer header
The draft ought to mention that if AH has been applied to a packet which
is then encapsulated by GUE in transport mode, the AH header is not
recalculated, so it does not cover the UDP/GUE headers. Decapsulation
works because the UDP/GUE headers are inserted before the authentication
header, so they will be removed (by a GUE decapsulator in transport
mode) before AH is verified.
Personally I don't know enough about routing headers to make the
decision on whether they should be above or below the GUE header in the
transport encap. I believe they are only processed when a packet reaches
the destination address in the main header, but I am not familiar with
all the different routing types (I know some are deprecated, and frankly
I couldn't be bothered to read the others).
*2.7/ Reliable delivery of control messages*
The examples of potential control messages (those with the 'C' flag)
given in S.3.5.1. (echo request/reply for testing) aim to mimic the data
channel, so unreliable delivery as a GUE datagram is appropriate.
The draft doesn't define any other tunnel control messages. However, if
it did, many/most would need to be delivered reliably and in order (e.g.
key agreement, any necessary configuration agreement, consistent
application of connection semantics, etc).
Therefore, reliable ordered delivery for control messages will need to
be defined (see C3.2/ "Reliable delivery of control messages" in
separate email for a suggested design).
*2.8/ Extensibility of the flags and optional fields scheme: doesn't work*
S3.3:
This is meant to be "the primary mechanism of extensibility in GUE".
However, for extensibility to work, GUE needs to distinguish between:
* options: the base set of flags+options defined from the start and
required in all GUE code
* "extensions" (my term): future extensions to the flags and options.
The current GUE flags scheme only works for options, but it inherently
puts extensions into a chicken-and-egg stand-off, because:
a) S5.4 says an implementation MUST drop a packet with an unknown flag.
So, if the IETF later defines bit 7, until a very large proportion of
GUE decap implementations have been upgraded with logic that understands
bit 7, the packet is going to be dropped with high probability. So no
encap is going to want to set bit 7 on a packet, so there is no
motivation for a decap to implement the code for bit 7.
b) For such unknown flags, we cannot change "MUST drop" to "MUST
ignore", because the lengths of the fields are not self-describing -
they have to be hard-coded into an implementation. So if one GUE
implementation only has logic about the flags up to bit 6, but a packet
arrives with bit 8 set, the implementation doesn't know how large the
"Fields" field is, so it doesn't know where the private data starts.
For proper extensibility, each new GUE flagged option needs to be
self-describing, i.e. with additional fields to say:
a) Whether nodes that do not have the logic to understand the option
should drop or ignore the packet, separately for:
- nodes on the path
- nodes at the dest. (decap) of the GUE datagram.
b) Whether the option is intended to change on path (in which case it
should not be covered by integrity or authentication codes).
c) Whether the option should be copied or not by a GUE-in-GUE tunnel
encap (see A4.4/ "Tunnels in Tunnels" later).
d) The length of the option
e) Additionally you might want to borrow the IPv6 idea of controlling
whether there needs to be an error message or not, but personally I
believe that is overkill (the intention was for silent failure to be
impossible for critical features, but it is very hard to deliver error
messages reliably anyway).
The above shows that attempting to invent a new extensibility scheme
usually ends in tears. The IETF and others have developed
tried-and-tested extensibility approaches like TLV, CBOR. Even then,
they still have problems. The above points draw lessons from all this,
particularly:
* action codes and change codes in the initial bits of IPv6 HbH & DO
options [RFC2460]
* TRILL extension word flags: critical and non-critical separately for
hop-by-hop and ingress-to-egress (see [RFC7179] updated by [RFC7780]).
* 'Self-describing objects', including type and size, is listed as
'Architectural Principle of the Internet' number 3.12 in [RFC1958]
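To illustrate what "self-describing" buys, here is a minimal sketch of a type-length-value walk in which a decap can step over an option it has no logic for. The 2-byte option header and the two action codes are hypothetical, loosely modelled on the action bits of IPv6 options; this is not the GUE wire format and not a design proposal.

```python
import struct

# Hypothetical option header: 1 byte type, 1 byte length of the value in bytes.
# The top two bits of the type tell a decap without the logic what to do.
# Only IGNORE is treated specially here; every other code means drop the packet.
IGNORE = 0b00


def parse_options(buf, known_types):
    """Walk the options; return (understood, skipped), or None if the packet must be dropped."""
    understood, skipped = [], []
    offset = 0
    while offset < len(buf):
        if offset + 2 > len(buf):
            return None  # truncated option header
        opt_type, length = struct.unpack_from("!BB", buf, offset)
        value = buf[offset + 2 : offset + 2 + length]
        if len(value) != length:
            return None  # truncated option value
        if opt_type in known_types:
            understood.append((opt_type, value))
        elif opt_type >> 6 == IGNORE:
            skipped.append(opt_type)  # the length field says how far to skip
        else:
            return None  # unknown option marked as critical
        offset += 2 + length
    return understood, skipped


known = {0x01}
ok = parse_options(bytes([0x01, 2, 0xAA, 0xBB, 0x05, 1, 0xCC]), known)
critical = parse_options(bytes([0x45, 0]), known)
```

Because the length travels in the packet, the unknown type 0x05 is skipped without the decap knowing anything else about it, while the unknown type 0x45 (top bits 01) still causes a drop.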
*2.9/ Hard-coded option lengths do not scale*
By hard-coding the length of each option in an RFC and in the GUE code
(rather than self-describing in the packet), you are stuck with a
certain size option for ever. Experience has proven that fields such as
message authentication codes (MACs), fragment IDs, etc. have to scale.
Admittedly, we could define flags for larger fields later, but I have
shown above that new flags would be undeployable.
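The dependency on hard-coded lengths can be sketched as follows. The flag numbering and the field lengths in the table are hypothetical and are not the values in the GUE draft; the point is only that the length of the optional fields is computable solely from knowledge compiled into the decap.

```python
# Hypothetical table compiled into a decap: flag bit -> length of its field in bytes.
KNOWN_FIELD_LEN = {0: 4, 1: 8, 2: 4}


def fields_length(flags, known=KNOWN_FIELD_LEN, nbits=16):
    """Return the total length of the optional fields, or None if it cannot be determined."""
    total = 0
    for bit in range(nbits):
        if flags & (1 << bit):
            if bit not in known:
                return None  # unknown flag: no way to find where the fields end
            total += known[bit]
    return total


old_decap_known = fields_length(0b011)               # both flags known
old_decap_unknown = fields_length(0b101 | (1 << 8))  # bit 8 set, not in the table
new_decap = fields_length(0b101 | (1 << 8), known={**KNOWN_FIELD_LEN, 8: 16})
```

The old decap cannot even ignore the new option safely, because without its length it cannot find the start of whatever follows. Changing the size of an existing field has the same effect as adding a new flag.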
*2.10/ Random access to options needs motivating*
Quoting S3.3:
Flags allow random access, for instance [...]
There might be a case for GUE to use a protocol heap rather than a stack
[Braden03]. If so, please motivate it.
[Braden03] Braden, R., Faber, T. & Handley, M., "From Protocol Stack to
Protocol Heap: Role-Based Architecture
<http://doi.acm.org/10.1145/774763.774765>," ACM SIGCOMM Computer
Communication Review 33:17--22 ACM (January 2003)
_*3/ STATE*_
*3.1/ Per-connection state vs. stateless connections but per-tunnel
state*
The GUE draft does not suggest a mechanism for GUE endpoints to apply
connection semantics.
* For transport encap the GUT draft suggests an approach that uses
per-flow state (see the example given in Figure 4 in
draft-manner-tsvwg-gut-02
<https://tools.ietf.org/html/draft-manner-tsvwg-gut-02#section-3.4>).
* For network encap a stateless approach is proposed in my separate
email (see C1/ "Stateless Connection Semantics"). Statelessness is
important to simplify migration during load-balancing, failures etc.
The 'shared fate' resilience principle [Clark88] maintains that a system
should avoid reliance on flow-state held on the path, preferring to hold
state solely at the endpoints. One could argue that, in transport encap
mode, the GUE endpoints are on the end hosts, and therefore, the
communication path is resilient because if GUE flow state is lost
because an end host fails, the communication will have failed anyway.
However, strictly, a GUE endpoint process is likely to be separate
(perhaps even in NIC hardware) so it could fail independently of the
true endpoint process of the connection.
So it would be ideal to use a stateless approach for both network and
transport encap. However, the best stateless approach I could come up
with (if it works at all) requires some coordination and hence one-off
set-up latency between the GUE endpoints. Therefore, stateless
connections will be:
* more appropriate for network encap (usually long-lived tunnels); and
* less useful for transport encap (opportunistic per connection).
To summarize, it is likely that the stateful approach will be used, at
least for some GUE encapsulators in transport mode. Therefore, for the
transport encap mode at least, the draft needs to consider per-flow
state and its management (see following section).
[Clark88] Clark, D.D., "The design philosophy of the DARPA internet
protocols," Proc. ACM SIGCOMM'88, Computer Communication Review
18(4):106--114 (August 1988)
*3.2/ Transport encap with Connection Semantics: Flow state management*
Hosts already maintain flow-state for each connection in progress. To
support GUE in transport encap mode, it is trivial for the hosts at each
end to associate a little extra state with the existing state of each
inner flow:
* At the initiator end, it needs no flow-state to receive GUE packets,
but in order to send GUE packets, it associates the original (inner)
flow's ID with the source port it will use in the UDP outer to send
every GUE packet.
* At the responder end, it has to associate the inner flow ID with the
source port in arriving GUE UDP outer headers. It needs this so that,
when the inner flow sends out packets, the GUE encapsulator can
intercept them and encapsulate them with a GUE header, using the stored
source port as the destination port.
* Any error messages returned from the responder also need to be
encapsulated in the same way.
Also, the draft needs to specify:
* that a GUE transport decap ought to protect itself against DDoS by not
storing flow state if no associated socket is open;
* how long to time out unused flow state;
* what to do with a packet if the necessary flow state is not present;
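The per-flow state described above might look like the following. This is a sketch, not the GUT code: the class names are hypothetical, the flow ID is assumed to be a direction-independent tuple, ports are allocated from 49152 upwards purely for illustration, and state timeout is left out because it is one of the open points listed above.

```python
GUE_PORT = 6080  # well-known GUE port named in the draft


class InitiatorState:
    """Initiator end: maps each inner flow to the outer UDP source port it sends from."""

    def __init__(self):
        self.src_port_for_flow = {}  # inner flow ID -> outer UDP source port
        self._next_port = 49152      # illustrative allocation only

    def outer_ports(self, flow_id):
        if flow_id not in self.src_port_for_flow:
            self.src_port_for_flow[flow_id] = self._next_port
            self._next_port += 1
        return self.src_port_for_flow[flow_id], GUE_PORT  # (src, dst)


class ResponderState:
    """Responder end: remembers the outer source port seen for each inner flow."""

    def __init__(self):
        self.peer_port_for_flow = {}  # inner flow ID -> initiator's outer source port

    def on_receive(self, flow_id, outer_src_port, socket_open):
        if not socket_open:
            return False  # no listening socket: store nothing (DDoS protection)
        self.peer_port_for_flow[flow_id] = outer_src_port
        return True

    def outer_ports(self, flow_id):
        peer = self.peer_port_for_flow.get(flow_id)
        if peer is None:
            return None  # no state: the draft would need to say what happens here
        return GUE_PORT, peer  # (src, dst): the stored port becomes the destination


flow = ("10.0.0.1", 40000, "10.0.0.2", 80, "tcp")
init, resp = InitiatorState(), ResponderState()
sent = init.outer_ports(flow)
before = resp.outer_ports(flow)
refused = resp.on_receive(flow, sent[0], socket_open=False)
accepted = resp.on_receive(flow, sent[0], socket_open=True)
reply = resp.outer_ports(flow)
```

The reply ports are the mirror image of the ports the initiator sent with, which is what a stateful middlebox on the path expects to see.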
*3.3/ Keepalives for middlebox flow state*
Middleboxes, such as firewalls and NATs, time out the pin-hole associated
with UDP flow-state fairly rapidly, but rarely in less than 15s [RFC5405].
RFC5405 rightly says that an application that uses UDP should be
responsible for recovering a timed out connection, rather than the stack
sending keepalives to hold open a connection, when it doesn't actually
know whether the application still wants the connection open.
Nonetheless, an inner flow will not be aware that it is being tunnelled
using UDP/GUE. Therefore it seems less inappropriate for the GUE encap
to keep state alive on behalf of the application, so it ought to send
keepalive GUE datagrams to hold any pin-hole open. However, if the
application has not sent anything for some time (whatever that means),
the GUE encap should time out the connection, rather than holding
middlebox flow-state (and its own flow-state) open for ever.
If you agree, it might be necessary to specify a keepalive control
message that a GUE encap can send to the remote end of the GUE tunnel
(which would also keep any flow-state at the remote end alive). These
would only be necessary in one direction, and would not need to be
reliably delivered.
See Section 3.1 of draft-manner-tsvwg-gut-02
<https://tools.ietf.org/html/draft-manner-tsvwg-gut-02#section-3.1> for
the keepalive control message defined for GUT.
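The behaviour I have in mind can be sketched as a simple decision function. This is only an illustration under assumed values: the function name, the 10s keepalive interval (chosen to sit below the 15s figure above) and the 300s application idle timeout are hypothetical, and neither GUE nor GUT specifies them.

```python
def keepalive_action(now: float,
                     last_app_activity: float,
                     last_outer_sent: float,
                     keepalive_interval: float = 10.0,
                     app_idle_timeout: float = 300.0) -> str:
    """Decide what the GUE encap should do for one flow at time `now`.

    Returns "expire" if the application has been silent for too long,
    "send_keepalive" if the middlebox pin-hole is at risk of timing out,
    and "wait" otherwise. All thresholds are assumed values.
    """
    if now - last_app_activity >= app_idle_timeout:
        # Stop holding middlebox state (and our own) open for ever.
        return "expire"
    if now - last_outer_sent >= keepalive_interval:
        # Nothing has crossed the pin-hole recently, so refresh it.
        return "send_keepalive"
    return "wait"
```

Note that `last_outer_sent` covers both data and keepalives, so a busy flow never generates keepalives at all.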
_*4/ OPERATION*_
*4.1/ Transport encap: to GUE or not to GUE?*
For transport encap, the draft needs to say how the host decides when to
use GUE and when not.
There's text on this in S.4 of the GUT draft
<https://tools.ietf.org/html/draft-manner-tsvwg-gut-02#section-4>, if
you want to use it.
*4.2/ Hop limit / TTL processing*
I couldn't find any text about this. Perhaps you intended this sentence
in S.5.3 to cover it:
    it should follow standard conventions for tunneling of
    one IP protocol over another
I think it would be best to spell out Hop limit processing. There's text
on this in S.3.2 of the GUT draft
<https://tools.ietf.org/html/draft-manner-tsvwg-gut-02#section-3.2>, if
you want to use it.
*4.3/ Error messages*
S5.4
    No error message is returned
    back to the encapsulator.
Please go through every type of error and in each case justify why no
error message to the encap is necessary.
*4.4/ Tunnels in tunnels*
S5.5 2nd para
    It
    may encapsulate a GUE packet in another GUE packet, for instance to
    implement a network tunnel (i.e. by encapsulating an IP packet with a
    GUE payload in another IP packet as a GUE payload).
A number of problems here:
1) A "GUE packet" has not been defined. I assume it means any UDP datagram
with either src or dst UDP port = 6080 (see A1.1/ "Inferring Connection
Semantics: the rule not the exception").
2) There is an incremental deployment problem here. Existing tunnels
won't check within the outer IP for whether a UDP port is a GUE port.
They will just add a new outer IP header without the UDP or GUE.
3) Whatever, if a tunnel is GUE-aware, this para needs to be clear
exactly which headers it should copy with the outer IP:
* Do you intend this to mean that all the following should be copied to
the outer IP header:
- the outer UDP,
- any v0 GUE header
- plus any GUE options or private data.
* Is it appropriate to copy all the options and private data? I think
only some (e.g. perhaps the VNID in certain circumstances?). Others
would not have the correct semantics if blindly copied (e.g. fragment
options, coverage of MACs, etc).
* How does a GUE-in-GUE encapsulator know which to copy?
Also, should any extension headers on an arriving IPv6 outer also be
copied to be associated with the new outer? If so, which ones, and how
does the encapsulator know? Do the same rules apply whether using
transport or network encapsulation?
I have been arguing since about 2009 that, when adding a new IP outer,
each IP (at least IPv6) extension header should self-describe which
headers should be copied to the outer on encap. At present RFC2473 lists
some extension headers that might be copied and says it depends on the
configuration of the encapsulator. But a hard-coded list precludes
introduction of any new extension that needs to be copied. And certainly
it doesn't work for extensions like GUE that don't fit into the original
mould of what an IPv6 extension looks like. The behaviour needs to be
somehow self-declared in each header, not in a standard.
It is tough to solve this problem in a way that will work with existing
tunnels. It needs solving more generally, not just for GUE. However, as
long as GUE encapsulators address this problem from day-1, GUE presents
an opportunity to solve the general problem in environments where all
encapsulations are GUE-based (see my proposed solution in C4.1/
"Ensuring certain GUE headers are copied when a GUE packet is tunnelled"
within my separate email on redesign). Then other encapsulation
approaches might follow.
*4.5/ SHOULD adjust MTU?*
    An operator may set MTU to account for encapsulation overhead
    and reduce the likelihood of fragmentation.
I would expect "SHOULD" here.
You might want to refer to draft-ietf-tram-stun-pmtud for a way to do
PMTUD with UDP (for STUN, but I think it would be similar for GUE).
*4.6/ Is orig-proto field necessary in the fragmentation option?*
S4.3 of draft-herbert-gue-extensions-00
Why does the original protocol of a fragmented packet need to be visible
before reassembly by declaring it in the GUE fragmentation option of
each fragment? The GUE protocol field will be available once the
fragments are reassembled, and I can't see why it would be needed before
that.
It is not good security practice to create multiple fields that are all
intended to be set to the same value. Even if the implementation uses
these orig-proto fields before reassembling the fragments, it will still
have to check that they all match the GUE protocol field when the packet
has been reassembled. And if any are not the same, it will raise
security concerns about any action that had previously been taken based
on an inconsistent value.
*4.7/ Congestion Control: reductio ad absurdum*
S5.9
I suggest you remove the para about DCCP being appropriate for tunnel
congestion control. I appreciate you are trying to comply with RFC5405,
but it is impossible for tunnel specs to do so without looking absurd.
The more you try, the more it will look like you are the ones that are
absurd. RFC5405 gives no guidance on how to comply with its requirement
about congestion control of non-IP traffic across a tunnel... because
there is no running code for tunnel congestion control, or for a network
circuit breaker.
It has been suggested in the past that DCCP should be used across
tunnels. DCCP is intended for a single flow and all the DCCP profiles
defined so far ensure a DCCP "flow" will consume about as much capacity
as a TCP flow. If DCCP were to be applied across a GUE tunnel it would
reduce the rate of the aggregate of all flows across the tunnel to
roughly the same as a /single/ TCP flow (see the intro of RFC7893
"Pseudowire Congestion Considerations").
One might imagine that RFC5405 means that a tunnel protocol designer
would have to detect roughly how many flows a tunnel aggregate consisted
of at any one time (say N flows) and attempt to design a congestion
control (e.g. a DCCP profile) to consume roughly as much capacity as N
TCP flows. However, this would probably cause horror for some in the
transport area at the thought of the IETF endorsing a congestion control
that can be N times as greedy as TCP.
To further reduce the idea of a tunnel encap applying congestion control
to absurdity, it would need:
a) a huge buffer to absorb incoming packets whenever they arrived faster
than the tunnel rate. All packets (in small and large flows) would back
up behind this huge queue, which would be called buffer bloat, which
would cause horror for most people in the transport area.
b) ideally, a time machine (a negative buffer) to bring packets forward
in time whenever the arrival rate of all the flows was insufficient to
satisfy the desired aggregate rate of the tunnel.
c) the addition of feedback channel(s) and a huge amount of extra
processing.
[As you can see, I don't support the idea in RFC5405 that a tunnel
becomes responsible for congestion control of traffic that it
encapsulates. Otherwise, to be consistent, an Ethernet link would become
responsible for congestion control of traffic it encapsulates. However,
I accept that consistency with RFC5405 is currently a hurdle your draft
has to cross before it can be approved. If you feel you have to suggest
a mechanism, IMO a policer makes sense - either a rate policer or a
congestion-rate policer.]
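For what it's worth, a rate policer needs none of the machinery in (a) to (c): no queue, no feedback channel, and only a couple of variables per tunnel. A minimal token-bucket sketch follows. The class name and parameters are hypothetical, and this is just one common way to build a rate policer, not something the GUE draft or RFC5405 specifies.

```python
class TokenBucketPolicer:
    """Token-bucket rate policer for a tunnel aggregate (illustrative sketch).

    Packets that exceed the configured rate are dropped, never queued.
    """

    def __init__(self, rate_bytes_per_s: float, burst_bytes: float) -> None:
        self.rate = rate_bytes_per_s
        self.burst = burst_bytes
        self.tokens = burst_bytes  # start with a full bucket
        self.last = 0.0            # time of the previous call to allow()

    def allow(self, size_bytes: int, now: float) -> bool:
        """Return True to forward the packet, False to drop it."""
        elapsed = max(0.0, now - self.last)
        # Refill at the configured rate, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True
        return False
```

A congestion-rate policer would have the same shape, but would count congestion-marked bytes rather than all bytes.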
*4.8/ Multicast outer -> Implosion on inner destination*
S.5.10
Consider an inner flow of unicast packets, src-IP A, dst-IP B. Consider
the encap adds an outer addressed to multicast address M, and consider n
decapsulators subscribe to group M. This will cause the network to
duplicate each packet n times. As each decap forwards the inner, n
duplicates of each packet will converge on B.
This might make sense with unicast inner packets for a small number of
decaps (e.g. two for redundancy). And a multicast overlay could make
sense for multicast inner packets as long as the multicast routing was
aware of the P2MP tunnel (with suitable grouping of multicast groups).
I think the text should say that a multicast outer is not precluded,
because it is a theoretical possibility, but it should not be attempted
without a safety harness and an empty bladder.
*4.9/ Deriving flow entropy from the inner is contrary to "GUE permits
encap of arbitrary IP protocols" claim*
S.5.11.1
The general idea for creating flow entropy seems to be for the GUE encap
to map inner flows of possibly "atypical IP protocols" to individual UDP
outer flows, on the assumption that switches or routers that implement
ECMP etc. will understand UDP but not "atypical IP protocols". Let's
examine this claim by taking network encap and transport encap separately.
1) Network encap
Imagine that a GUE encap has been implemented that understands TCP, UDP,
SCTP, DCCP, ICMP, RSVP, IPsec and ESP.
Then researchers implement NewSexyTP, with a new IP protocol number.
Every GUE encap in the world doesn't have any logic to understand or
locate the flow ID fields of NewSexyTP. So GUE does not "permit encap of
arbitrary IP protocols" as claimed in the motivation section.
Further, why will GUE implementations be updated with logic to
understand NewSexyTP any faster than the ECMP code in general-purpose
switches and routers? One GUE implementation might be updated, but other
developers might not so diligently track the latest transport protocols.
One cannot even really argue that the ECMP code in switches and routers
is implemented in hardware, so it will be harder to change than GUE
code. The forwarding performance of GUE tunnel encap will need to be no
different to the performance of forwarding in general switches and
routers, so if hardware is necessary for one it will be necessary for
the other.
2) Transport encap.
If GUE encap is implemented as a centralized daemon process on a host or
centralized in a NIC, it will suffer from the same lack of forward
compatibility with new transport protocols as the network encap -
particularly if it is implemented in NIC hardware. I.e., if an operator
installs NewSexyTP in their OS, they will also have to wait for a GUE
update that supports NewSexyTP. This is the case with or without
connection semantics.
However, it might be possible to implement GUE transport encap
(including with connection semantics) so that each instance of a
protocol stack is associated with an instance of GUE (warning: I have no
idea yet whether this will be possible). In this case, each GUE instance
would consistently add the same outer port number to the inner protocol
instance it was associated with, without needing to understand how to
identify a flow ID in any particular protocol.
In summary, certainly for net encap, but possibly not for transport
encap, GUE only helps "atypical IP protocols" that a particular GUE
encap implementation already understands.
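The forward-compatibility problem is easy to see in a sketch of what a network encap would have to do. This is a hypothetical illustration, not code from any GUE implementation: the function name and the table of known protocols are my own, and only protocols whose header starts with 16-bit source and destination ports are covered.

```python
import struct
from typing import Optional, Tuple

# IP protocol numbers whose transport header begins with two 16-bit ports.
# Anything not listed here is opaque to this (hypothetical) encap.
PORT_BASED_PROTOCOLS = {6: "TCP", 17: "UDP", 33: "DCCP", 132: "SCTP"}


def inner_flow_id(ip_proto: int,
                  transport_header: bytes) -> Optional[Tuple[int, int, int]]:
    """Return (protocol, src port, dst port) for protocols this encap knows,
    or None if it cannot locate any flow ID fields."""
    if ip_proto not in PORT_BASED_PROTOCOLS or len(transport_header) < 4:
        return None
    src_port, dst_port = struct.unpack("!HH", transport_header[:4])
    return (ip_proto, src_port, dst_port)
```

For a new protocol number the function returns `None`, so every flow of that protocol gets the same (or no) entropy until the table is updated, which is the same update problem that ECMP code in switches and routers has.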
*4.10/ Flow entropy from encrypted data could weaken the crypto?*
S.5.11.1
    o If a node is encrypting a packet using ESP tunnel mode and GUE
      encapsulation, the flow entropy could be based on the contents
      of clear-text packet. For instance, a canonical five-tuple hash
      for a TCP/IP packet could be used.
I'm not a crypto expert, but it sounds dangerous to take some clear-text
from a known position in the data, hash it with a function that is not
strongly one-way, then send this hash along with the cipher text.
I think the SPI can be used as a unique consistent per-flow value, can't
it? The SPI has been suitably randomised so that it reveals nothing
about the flow ID.
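As a minimal sketch of that alternative: the SPI is the first 32-bit field of the ESP header, so the encap can derive the outer source port from it without touching any clear-text. The function name and the choice of the 49152-65535 port range are my assumptions, and the sketch takes it on trust that SPI values are well enough spread for this purpose.

```python
import struct


def source_port_from_spi(esp_header: bytes,
                         lo: int = 49152, hi: int = 65535) -> int:
    """Map the ESP SPI (first 4 octets of the ESP header, network byte
    order) onto a UDP source port in the range [lo, hi]."""
    if len(esp_header) < 4:
        raise ValueError("ESP header too short to contain an SPI")
    (spi,) = struct.unpack("!I", esp_header[:4])
    return lo + spi % (hi - lo + 1)
```

Every packet of one security association then carries the same outer source port, which is all that ECMP needs.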
*4.11/ No need to constrain flow entropy distribution*
S.5.11.2
    o The flow entropy should have a uniform distribution across
      encapsulated flows.
Equal distribution of flows is not necessarily appropriate for all
scenarios. Flows have a distribution of sizes, and altho ECMP is
generally done randomly, an operator might want to (somehow) bias the
hash algorithm to allow for the flows with the highest rate, which might
otherwise unbalance the load. See for instance:
"Engineered Elephant Flows for Boosting Application Performance in
Large-Scale CLOS Networks
<https://www.broadcom.com/collateral/wp/OF-DPA-WP102-R.pdf>" Broadcom
White Paper (March 2014)
*4.12/ No need to constrain flow entropy interpretation*
    Decapsulators, or any networking devices, should not attempt to
    interpret flow entropy as anything more than an opaque value.
This seems unnecessarily constraining. This might not be a good idea,
but if someone finds a use for it, there's no need to stop them - if
it's useful they'll ignore you anyway, so why bother saying it? Perhaps
you intended to explain why doing this could be problematic, rather than
precluding it?
_*5/ SECURITY*_
*5.1/ Addresses that are both visible and hidden? Have your GUE and eat
it too?*
S.7. In the following sentence,
    Existing network security
    mechanisms, such as address spoofing detection, DDOS mitigation, and
    transparent encrypted tunnels can be applied to GUE packets.
This should point out that an existing set of address spoofing detection
rules would not work with GUE. I think you meant that existing rules and
mechanisms could be modified to check the packets encapsulated by GUE
without using radically new techniques.
However, if GUE is in network encap mode and it encrypts the IP headers
of the inner packets, address spoofing detection and DDoS mitigation
will not be possible over the length of the GUE tunnel. You cannot both
claim that GUE can hide information, and that GUE allows existing
security techniques to work that rely on access to the hidden information.
*5.2/ How can the Security option protect a UDP/GUE header from being
moved or removed?*
The Security option is "used to provide integrity and authentication of
the GUE header."
I assume you envisage this would be complemented by other authentication
techniques such as IPsec AH to provide integrity and authentication of
the rest of the packet.
However, it occurs to me that the two together do not protect the
integrity of the /structure/ of the packet as a whole (whether network
or transport encap). An on-path attacker could still move the UDP/GUE
header within the packet (it might be possible to construct a valid
packet with altered semantics), or remove the UDP/GUE header completely.
I can't immediately think whether any damage could be done with such an
attack, or how to prevent it. However, I'm sure there will be a crypto
expert for whom this is not a new problem.
Also, the 32B max length of the security option is insufficient. The
first MAC protocol I checked already needs a larger field: RFC4383
"TESLA in Secure RTP" requires 34B, and that's just for the default
sizes, not even the maximum. I picked TESLA because I knew each datagram
needs a lot of
authentication space. TESLA provides multicast message authentication,
so as well as a key index and a MAC, each packet reveals a continually
changing key.
*5.3/ What happens when a port scan sends a datagram to port 6080?*
When a port scan (that doesn't necessarily know about GUE) sends a
datagram to port 6080, if the datagram has a body, and the body starts
with a zero bit, the GUE daemon will start processing it.
If the first 4 octets happen (randomly) to be set to values that would
be a valid GUE header (see S.5.4), it will be decapsulated and forwarded
to a protocol handler.
Not a show-stopper, but worth documenting?
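To illustrate how little stands between a random datagram and a protocol handler, here is a sketch of a base-header check. It assumes my reading of the 4-octet GUE base header (2-bit version, 1-bit C flag, 5-bit Hlen in 32-bit words, 8-bit proto/ctype, 16-bit flags), and the function name and the particular checks are hypothetical. A real decap would apply whatever rules S.5.4 ends up specifying.

```python
import struct
from typing import Dict, Optional


def parse_gue_base_header(payload: bytes) -> Optional[Dict[str, int]]:
    """Parse the assumed 4-octet GUE base header from a UDP payload.

    Returns the fields as a dict, or None if the payload is too short,
    the version is not 0, or Hlen runs past the end of the payload.
    """
    if len(payload) < 4:
        return None
    first, proto_ctype, flags = struct.unpack("!BBH", payload[:4])
    version = first >> 6        # top 2 bits
    control = (first >> 5) & 1  # C flag
    hlen_words = first & 0x1F   # optional header length in 32-bit words
    if version != 0:
        return None
    if len(payload) < 4 + 4 * hlen_words:
        return None
    return {"version": version, "control": control, "hlen_words": hlen_words,
            "proto_ctype": proto_ctype, "flags": flags}
```

With only these checks, a sufficiently long payload of random octets passes whenever its top two bits are zero and its Hlen fits within the payload.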
*5.4/ Firewalls will still block new/atypical protocols*
Few firewalls allow incoming UDP. So GUE will not enable deployment of
servers using atypical/new protocols, which will still face a deployment
problem.
If a firewall opens a pin-hole to allow incoming UDP to access the
well-known GUE port it would allow attackers to reach servers of any
protocol while bypassing the firewall. E.g. an attacker could access a
TCP-server by encapsulating TCP in GUE in order to bypass the firewall.
Therefore, a firewall will only open a pin-hole to a GUE server, if it
also inspects the packet encapsulated by GUE and applies all its normal
rules to that as well.
This is why I have said elsewhere that the draft should state that
firewall bypass by new/atypical protocols is a non-goal of GUE.
*5.5/ Transport Encap: Two Passes through a Local Firewall?*
GUE in transport mode resubmits the encapsulated packet to the host's IP
stack. But it needs to make sure it re-injects the packet at the correct
point in relation to any local firewall.
* If the firewall includes rules to inspect the packet encapsulated with
GUE (as discussed in the previous point), it would make sense to
re-submit the packet above the local firewall.
* If not, GUE should resubmit the packet so that it passes through the
local firewall again.
The latter mode would make more sense if GUE was also decrypting the
inner packet. So, rather than have two options, a local firewall could
work co-operatively with GUE in transport mode, so it doesn't have to
inspect the inner in both passes.
*6/ Implementation*
*6.1/ Practical Large Receive Offload Requirements*
Appendix A.4 says:
    The conservative approach to supporting LRO for GUE would be to
    assign packets to the same flow only if they have identical five-
    tuple and were encapsulated the same way. That is the outer IP
    addresses, the outer UDP ports, GUE protocol, GUE flags and fields,
    and inner five tuple are all identical.
Rant: It is sad if such a conservative approach to LRO is still
necessary. Any API to LRO hardware needs to be able to be given the
locations of certain header fields that are deliberately intended to
vary, so it can offer the facility to separately report these for each
packet. A MAC of the encapsulating headers is a good case in point. ECN
is an even better example of a varying field, because it has been a
standard part of the IP header since 2001, long before LRO hardware was
designed.
--
________________________________________________________________
Bob Briscoe http://bobbriscoe.net/