[TLS] Prototype of TLS 1.3 records, padding, and optional headerless records

Bryan A Ford Sat, 12 Dec 2015 02:29:39 -0800

Since a lot of the skepticism toward my encrypted-headers proposal
constituted worries whether the benefits are worth the implementation
cost/complexity, I decided to implement it and find out what that cost
and complexity actually are.  See this github repo:


        https://github.com/bford/nss

Here you'll find a snapshot of Mozilla NSS/NSPR, with three branches:
(a) current baseline version supporting only TLS 1.2 records; (b) a
version that adds support for the currently-specified TLS 1.3 record
format with the encrypted content-type trailer and padding within the
AEAD; and (c) a further extension to support optional "headerless"
records.  This doesn't implement all the other new stuff in TLS 1.3 such
as negotiation, 0-RTT, etc. - only the record-layer changes.

---
Observations from implementing TLS 1.3 records:

The baseline...master diff (https://github.com/bford/nss/pull/1/files)
represents the changes that appear needed to get from TLS 1.2 records to
TLS 1.3 record format, with padding of records to a constant size for
traffic analysis protection.  I set the fixed padded record length
(TLS13_PAD_FRAGMENT_LENGTH) to 256 bytes "just because", but change it
to whatever you like; I don't have any fish to fry regarding the
specific "best" value.  512 and 1024 also seem like reasonable choices,
with obvious tradeoffs between wasted bytes padding small messages
versus the bandwidth and processing overheads of fewer bytes per record.

Measuring implementation complexity simplistically by line count, this
diff constitutes +92-14=78 lines (92 added, 14 removed) - nothing too
serious.  Subjectively, the main implementation complexity came from
adding the 1-byte internal content-type trailer in the first place,
which means having to copy the input cleartext into the write buffer in
order to fiddle with it before passing it to the AEAD encryptor.  Thus, because
of this 1-byte content-type trailer (even with no padding), we can no
longer just encrypt directly from the caller-provided input into the
ciphertext buffer but have to copy the cleartext first, unless the AEAD
API supports scatter/gather (which NSS's doesn't and I expect most
probably won't).
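
To make the copy-then-pad step concrete, here is a minimal Python sketch
(not the NSS code from the repo).  It assumes the inner plaintext is laid
out as content, then the 1-byte content type, then zero padding, and that
the fixed length applies to the plaintext handed to the AEAD; the
prototype's exact layout may differ.  The function names are made up for
illustration.

```python
from typing import Tuple

TLS13_PAD_FRAGMENT_LENGTH = 256  # value mentioned in the text; adjust freely


def pad_inner_plaintext(content: bytes, content_type: int,
                        padded_len: int = TLS13_PAD_FRAGMENT_LENGTH) -> bytes:
    """Copy content into a fresh buffer, append the type byte, zero-pad."""
    if not 1 <= content_type <= 255:
        raise ValueError("content type must be a nonzero byte")
    if len(content) + 1 > padded_len:
        raise ValueError("content does not fit in one padded record")
    buf = bytearray(padded_len)         # zero-filled "write buffer"
    buf[:len(content)] = content        # the copy that the trailer forces
    buf[len(content)] = content_type    # 1-byte content-type trailer
    return bytes(buf)                   # remaining bytes stay zero (padding)


def unpad_inner_plaintext(inner: bytes) -> Tuple[bytes, int]:
    """Strip trailing zero padding; the last nonzero byte is the type."""
    end = len(inner)
    while end > 0 and inner[end - 1] == 0:
        end -= 1
    if end == 0:
        raise ValueError("no content type found")
    return inner[:end - 1], inner[end - 1]
```

The buffer returned by pad_inner_plaintext() is what would then be
encrypted in place, which is the copy-A-to-B-then-encrypt-at-B pattern.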

This doesn't seem like a big problem to me and I definitely consider the
benefits of the encrypted content type and record padding to be worth
this minor cost.  And in practice I doubt it'll cause any actual
noticeable performance degradation because a
copy-A-to-B-then-encrypt-in-place-at-B is going to have a cache
footprint pretty similar to that of an encrypt-from-A-to-B.

TLS 1.3 also already changes the way the pseudo-header is calculated for
MAC purposes; I didn't yet fully implement those changes, but did
already need to move the pseudo-header calculation on the send side
until a bit later when the length of the ciphertext is known.

---
Headerless records extension:

The other pull request (https://github.com/bford/nss/pull/2/files)
represents the further delta needed to implement a further simplified
version of my last "encrypted headers" proposal, which in this
incarnation becomes "headerless records".  Since TLS 1.3 is already
adding a 1-byte mandatory encrypted trailer within each record (the
encrypted content-type), I simply extended this to a 3-byte trailer, in
which the first two bytes indicate:

- If zero, the next record (following this one) has the usual 5-byte TLS
header and its length is defined by that header as usual.
- If nonzero, the next record has *no* TLS header at all, and its length
is defined by this value.

By combining the "next-record-length" into the encrypted trailer that
TLS 1.3 is adding anyway, the changes required are pretty minimal.  By
my count, this is a delta of +40-8=32 lines, which at least to me seems
pretty insignificant in terms of implementation complexity.  And
implementations could be even simpler by not implementing the send-side
logic at all and simply setting all next-record-length fields in the
trailers to zero, attaching headers to all records as before.  The only
minor point of "implementation pain" is the need to add the appropriate
inter-record state variables (writeNextLength and readNextLength), but
the handling of these is not fundamentally any different from the state
needed to keep track of sequence numbers across records for example.
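
The framing logic can be sketched in a few lines of Python.  This is an
illustration only, not the code in the pull request: encryption and
padding are left out, the trailer is assumed to be a 2-byte big-endian
next-record-length followed by the 1-byte content type, and the length is
assumed to count the whole next record body including its trailer.  The
function names are hypothetical; the state variables mirror
writeNextLength and readNextLength.

```python
import struct
from typing import List, Tuple

HEADER_LEN = 5    # type (1) + version (2) + length (2)
TRAILER_LEN = 3   # next-record-length (2) + content type (1)

Record = Tuple[int, bytes]  # (content_type, payload)


def encode_stream(records: List[Record], headerless: bool = True) -> bytes:
    out = bytearray()
    write_next_length = 0  # what the previous trailer promised
    for i, (ctype, payload) in enumerate(records):
        body_len = len(payload) + TRAILER_LEN
        if write_next_length == 0:
            # No promise made: attach the usual 5-byte header.
            out += struct.pack("!BHH", 23, 0x0301, body_len)
        if headerless and i + 1 < len(records):
            write_next_length = len(records[i + 1][1]) + TRAILER_LEN
        else:
            write_next_length = 0  # next record will carry a header
        out += payload + struct.pack("!HB", write_next_length, ctype)
    return bytes(out)


def decode_stream(data: bytes) -> List[Record]:
    pos = 0
    read_next_length = 0
    records = []
    while pos < len(data):
        if read_next_length == 0:
            _, _, body_len = struct.unpack_from("!BHH", data, pos)
            pos += HEADER_LEN
        else:
            body_len = read_next_length  # headerless record
        body = data[pos:pos + body_len]
        if len(body) < max(body_len, TRAILER_LEN):
            raise ValueError("truncated record")
        pos += body_len
        read_next_length, ctype = struct.unpack("!HB", body[-TRAILER_LEN:])
        records.append((ctype, body[:-TRAILER_LEN]))
    return records
```

Calling encode_stream(records, headerless=False) gives the "always send
headers" behavior: every trailer carries zero and the receiver path is
unchanged.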

Making headerless records optional, selected by the trailer in the prior
record as defined above, offers several nice benefits:

- It addresses the concerns that have been raised (though unsubstantiated
so far with any concrete evidence) about breaking middleboxes that want
to parse traditional TLS record streams.  TLS implementations that are
paranoid to this degree about breaking middleboxes can simply always set
the next-record-length field in the trailer to 0 and send cleartext
record headers for every TLS record just as before.

- We don't need to do anything special to handle the "first record"
case: i.e., we neither need to specify a standard "first record length"
nor add a first-record-length field to one of the negotiation packets.
Instead, the first AEAD-encrypted record is simply transmitted with a
5-byte header as usual, but the sender can omit the headers from
subsequent records if it chooses to.

- A TLS 1.3 implementation that doesn't want to bother "predicting" or
"committing to" a next-record-length beyond a SSL_Write() boundary can
simply set the next-record-length field to zero in the last record of
the current write, so the first record in the next SSL_Write() will
include a header determining its size as before.  This isn't ideal in
terms of traffic analysis protection, but it's an implementation option.

- FWIW, when the sender transmits headerless records, the encoding saves
2 bytes per record with respect to TLS 1.2 (saving the 5-byte cleartext
header but adding the 3-byte encrypted trailer).

---
Implications on padded record transmission:

Finally, while implementing this extension I realized a further general
benefit of headerless transmission: it can make the use of padding for
traffic analysis protection more bandwidth-efficient.

With cleartext headers, to achieve the traffic analysis benefits of
padding we must ensure that every transmitted record has *exactly* the
same ciphertext length, since any variation will produce a readily
fingerprintable pattern in cleartext.  This creates tradeoffs, as noted
above.  Padding to a smaller fixed size (e.g., 256 bytes) adds less
bandwidth overhead to small messages such as typical HTTP requests but
incurs the cost of a MAC tag, nonce, TLS headers/trailers etc once every
256 bytes - for AES-GCM this is about an 11% bandwidth overhead with
256-byte records.  Padding to a larger fixed size (e.g., 1024) reduces
the bandwidth overhead for bulk data transmission such as large HTTP
responses (e.g., to about 3% overhead with 1024-byte records), at a cost
of adding a lot of bandwidth overhead to tiny HTTP requests or
status-indication responses, which are also common.

With headerless records, in contrast, we can pick a relatively small
ciphertext length padding granularity (e.g., 256 bytes), but then the
sender can transmit records whose sizes are any *multiple* of this
granularity, because N consecutive 256-byte ciphertexts are now
cryptographically indistinguishable from a single N*256-byte ciphertext.
 Thus, we get the bandwidth-efficiency benefits of small records by
avoiding the need to add too much padding to small messages, plus the
bandwidth-efficiency benefits of larger records for bulk transmission.
For example, we can send a large message mostly consisting of
~16384-byte records, each containing only one MAC tag and internal TLS
1.3 trailer (though an eavesdropper won't know that), reducing the
bandwidth overhead in the AES-GCM case from 11% to 0.2%.
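
For anyone who wants to play with the numbers, here is a rough Python
sketch of the arithmetic.  The per-record byte counts are assumptions
rather than figures taken from the implementation: a 16-byte GCM tag, an
8-byte explicit nonce, and a 5-byte header, i.e. 29 bytes per record,
applied uniformly to every record size as a simplification.  With those
assumptions the results land close to the percentages quoted above.

```python
GCM_TAG = 16         # assumed AES-GCM authentication tag size
EXPLICIT_NONCE = 8   # assumed explicit nonce, as in TLS 1.2 AES-GCM
TLS_HEADER = 5       # cleartext record header

PER_RECORD = GCM_TAG + EXPLICIT_NONCE + TLS_HEADER  # 29 bytes


def overhead(record_len: int, per_record: int = PER_RECORD) -> float:
    """Fraction of each record_len-byte record spent on fixed costs."""
    return per_record / record_len


def padded_size(n: int, granularity: int = 256) -> int:
    """Round n up to the next multiple of the padding granularity."""
    return (n + granularity - 1) // granularity * granularity


if __name__ == "__main__":
    for size in (256, 1024, 16384):
        print(f"{size:>5}-byte records: {overhead(size):.2%} overhead")
    # A 300-byte message pads to 512 bytes with 256-byte granularity,
    # versus 1024 bytes if every record had to be exactly 1024 bytes.
    print(padded_size(300), padded_size(300, 1024))
```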

There are two minor caveats to this:

- In the specific case of AES-GCM, it looks like TLS 1.2 uses AES-GCM
with explicitly transmitted nonces, which are clearly distinguishable
from random values and hence will break the above benefits.  The
solution is simple, however: simply make TLS 1.3 use AES-GCM (and other
AEAD schemes) without explicit nonces.  These nonces can be calculated
implicitly by sender and receiver just as easily without explicit
transmission, and all the necessary integrity-checking happens via the
"additional_data" pseudo-header anyway.  (In fact perhaps TLS 1.3
already eliminates the explicit nonce - I don't see any mention of it in
the new record spec anyway, the only question is whether it's still in
the AES-GCM-specific specs, which I haven't looked at closely and don't
know if they'll be updated wrt TLS 1.3 or not.)

- Even if N 256-byte ciphertexts in a TLS 1.3 stream are
cryptographically indistinguishable from one N*256-byte ciphertext, if
the TLS sender implementation transmits these in a single OS write(),
the OS's TCP stack may (or may not, depending on circumstances) still
reveal this difference by segmenting the transmitted data
differently.  There are several obvious solutions to this, however:

        (1) The TLS implementation could prepare the N*256-byte ciphertext but
then transmit it using N separate 256-byte write() calls to the
underlying TCP stack.  This might (again depending on the TCP stack)
increase the TCP-level overhead by sending many smaller-than-necessary
TCP segments, but is a fully OS-independent solution, and we still save
the TLS-level bandwidth overhead (one MAC tag etc rather than N).

        (2) Better, at least on Linux systems, the TLS implementation could
simply enable the TCP_CORK socket option, causing the kernel to delay
the transmission of incomplete (less-than-MTU-sized) TCP segments
slightly in hopes of "filling" them.  This way, the TLS sender will
always produce continuous streams of MTU-sized TCP segments for all
"bursts" of transmission of any number of TLS records.  The downside is
that the kernel-imposed delay can add a bit of latency (I understand
around 200ms) to the transmission of the very last, incomplete segment
in a burst, thus perhaps adding up to 200ms to the total
request/response latency of a round-trip interactive exchange.

        (3) Still better, again on Linux systems, the TLS implementation could
send records using send() instead of write(), and set the MSG_MORE flag
on all records except for the last record just before the connection
goes idle (no more data to transmit).  Figuring out when a connection
is "going idle" may require some heuristics or hints from the
application, but can ensure that bursts are transmitted as a fully
uniform series of MTU-size segments without incurring any added delay at
the end.
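
As a rough illustration of option (3), here is a Python sketch that sends
a burst of already-encrypted records with MSG_MORE set on all but the
last one.  The send_burst() helper is hypothetical, and the "connection
is going idle" decision is reduced to "this is the last record the caller
handed us", which is exactly the heuristic a real implementation would
need to refine.  On platforms without MSG_MORE the flag falls back to 0,
so the code still runs but without the coalescing hint.

```python
import socket
from typing import Iterable

# MSG_MORE is Linux-specific; fall back to 0 (no hint) elsewhere.
MSG_MORE = getattr(socket, "MSG_MORE", 0)


def send_burst(sock, records: Iterable[bytes]) -> None:
    """Send records back to back, flagging all but the last with MSG_MORE."""
    pending = None
    for record in records:
        if pending is not None:
            # More data follows, so ask the kernel to hold partial segments.
            sock.sendall(pending, MSG_MORE)
        pending = record
    if pending is not None:
        # Last record of the burst: no flag, so the kernel flushes now.
        sock.sendall(pending, 0)
```

The one-record lookahead means the helper works with any iterable,
including a generator that produces records as the application writes
them.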

---
OK, congratulations and thanks to anyone who persisted through all that.
 I hope this will help people understand the implementation complexity
and tradeoffs both of the currently-specified TLS 1.3 record layer and
the proposed headerless records feature.  Comments?

Thanks
Bryan

_______________________________________________
TLS mailing list
TLS@ietf.org
https://www.ietf.org/mailman/listinfo/tls
