As a follow-up to my talk at the TRON workshop, I have opened the following issue 
proposing to make unencrypted headers optional, as per my earlier emails of 
Dec 1 and 12:

        https://github.com/tlswg/tls13-spec/issues/422
        Add encrypted NextRecordLength field to make next record's unencrypted header optional

The precise (1- or 2-byte) encoding I leave for further discussion.

Thanks
Bryan

> On Dec 12, 2015, at 2:29 AM, Bryan A Ford <brynosau...@gmail.com> wrote:
> 
> Since a lot of the skepticism toward my encrypted-headers proposal
> constituted worries whether the benefits are worth the implementation
> cost/complexity, I decided to implement it and find out what that cost
> and complexity actually is.  See this github repo:
> 
>       https://github.com/bford/nss
> 
> Here you'll find a snapshot of Mozilla NSS/NSPR, with three branches:
> (a) current baseline version supporting only TLS 1.2 records; (b) a
> version that adds support for the currently-specified TLS 1.3 record
> format with the encrypted content-type trailer and padding within the
> AEAD; and (c) a further extension to support optional "headerless"
> records.  This doesn't implement all the other new stuff in TLS 1.3 such
> as negotiation, 0-RTT, etc. - only the record-layer changes.
> 
> ---
> Observations from implementing TLS 1.3 records:
> 
> The baseline...master diff (https://github.com/bford/nss/pull/1/files)
> represents the changes that appear needed to get from TLS 1.2 records to
> TLS 1.3 record format, with padding of records to a constant size for
> traffic analysis protection.  I set the fixed padded record length
> (TLS13_PAD_FRAGMENT_LENGTH) to 256 bytes "just because", but change it
> to whatever you like; I don't have any fish to fry regarding the
> specific "best" value.  512 and 1024 also seem like reasonable choices,
> with obvious tradeoffs between wasted bytes padding small messages
> versus the bandwidth and processing overheads of fewer bytes per record.
> 
> Measuring implementation complexity simplistically by line count, this
> diff constitutes +92-14=78 lines (92 added, 14 removed) - nothing too
> serious.  Subjectively, the main implementation cost came from adding the
> 1-byte internal content-type trailer in the first place, which means
> having to copy the input cleartext into the write buffer in order to
> fiddle with it before passing it to the AEAD encryptor.  Thus, because
> of this 1-byte content-type trailer (even with no padding), we can no
> longer just encrypt directly from the caller-provided input into the
> ciphertext buffer but have to copy the cleartext first, unless the AEAD
> API supports scatter/gather (which NSS's doesn't and I expect most
> probably won't).
> 
> This doesn't seem like a big problem to me and I definitely consider the
> benefits of the encrypted content type and record padding to be worth
> this minor cost.  And in practice I doubt it'll cause any actual
> noticeable performance degradation because a
> copy-A-to-B-then-encrypt-in-place-at-B is going to have a pretty similar
> cache footprint as an encrypt-from-A-to-B.
> 
> TLS 1.3 also already changes the way the pseudo-header is calculated for
> MAC purposes; I didn't yet fully implement those changes, but did
> already need to move the pseudo-header calculation on the send side
> until a bit later when the length of the ciphertext is known.
> 
> ---
> Headerless records extension:
> 
> The other pull request (https://github.com/bford/nss/pull/2/files)
> represents the further delta needed to implement a further simplified
> version of my last "encrypted headers" proposal, which in this
> incarnation becomes "headerless records".  Since TLS 1.3 is already
> adding a 1-byte mandatory encrypted trailer within each record (the
> encrypted content-type), I simply extended this to a 3-byte trailer, in
> which the first two bytes indicate:
> 
> - If zero, the next record (following this one) has the usual 5-byte TLS
> header and its length is defined by that header as usual.
> - If nonzero, the next record has *no* TLS header at all, and its length
> is defined by this value.
> 
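A minimal sketch of the 3-byte trailer described above, for illustration only. The function names are hypothetical, the big-endian byte order is an assumption (the precise encoding is left open), and padding and AEAD encryption are omitted.

```python
import struct

CONTENT_TYPE_APPLICATION_DATA = 23  # TLS application_data content type


def add_trailer(content: bytes, next_record_length: int, content_type: int) -> bytes:
    """Append the 3-byte trailer: 2-byte next-record-length, then 1-byte content type.

    next_record_length == 0 means the next record carries the usual 5-byte header;
    a nonzero value means the next record is headerless and has this length.
    """
    if not 0 <= next_record_length <= 0xFFFF:
        raise ValueError("next_record_length must fit in two bytes")
    # "!HB" = network byte order (assumed), unsigned 16-bit, unsigned 8-bit
    return content + struct.pack("!HB", next_record_length, content_type)


def split_trailer(plaintext: bytes):
    """Return (content, next_record_length, content_type) from a decrypted record."""
    if len(plaintext) < 3:
        raise ValueError("plaintext shorter than the 3-byte trailer")
    next_record_length, content_type = struct.unpack("!HB", plaintext[-3:])
    return plaintext[:-3], next_record_length, content_type
```

A sender that never wants headerless records would always pass next_record_length=0.
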
> By combining the "next-record-length" into the encrypted trailer that
> TLS 1.3 is adding anyway, the changes required are pretty minimal.  By
> my count, this is a delta of +40-8=32 lines, which at least to me seems
> pretty insignificant in terms of implementation complexity.  And
> implementations could be even simpler by not implementing the send-side
> logic at all and simply setting all next-record-length fields in the
> trailers to zero, attaching headers to all records as before.  The only
> minor point of "implementation pain" is the need to add the appropriate
> inter-record state variables (writeNextLength and readNextLength), but
> the handling of these is not fundamentally any different from the state
> needed to keep track of sequence numbers across records for example.
> 
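The inter-record state can be sketched as a toy framing loop. This is not the NSS code: encryption is replaced by the identity function, the record version 0x0301 is a placeholder, the byte order is assumed, and build_stream/read_records are hypothetical names. Only read_next_length mirrors the readNextLength variable mentioned above.

```python
import struct

HEADER_LEN = 5  # content type (1) + version (2) + length (2)
APPLICATION_DATA = 23


def build_stream(contents, headerless=True):
    """Frame each content as a record body: content + 3-byte trailer (no real AEAD)."""
    body_lengths = [len(c) + 3 for c in contents]
    out = bytearray()
    for i, content in enumerate(contents):
        is_last = i + 1 == len(contents)
        # Commit to the next record's length only when sending it headerless.
        next_len = 0 if (is_last or not headerless) else body_lengths[i + 1]
        body = content + struct.pack("!HB", next_len, APPLICATION_DATA)
        # The first record always has a header; later ones only in header mode.
        if i == 0 or not headerless:
            out += struct.pack("!BHH", APPLICATION_DATA, 0x0301, len(body))
        out += body
    return bytes(out)


def read_records(stream):
    """Recover the contents, tracking the next-record-length across records."""
    read_next_length = 0  # zero: expect a normal 5-byte header next
    pos = 0
    contents = []
    while pos < len(stream):
        if read_next_length == 0:
            _ctype, _version, length = struct.unpack_from("!BHH", stream, pos)
            pos += HEADER_LEN
        else:
            length = read_next_length  # headerless: length came from prior trailer
        body = stream[pos:pos + length]
        pos += length
        read_next_length, _inner_type = struct.unpack("!HB", body[-3:])
        contents.append(body[:-3])
    return contents
```

The receiver needs one extra integer of state per direction, comparable to the per-direction sequence number.
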
> Making headerless records optional, selected by the trailer in the prior
> record as defined above, offers several nice benefits:
> 
> - It addresses the concerns that have been raised (though unsubstantiated
> so far with any concrete evidence) about breaking middleboxes that want
> to parse traditional TLS record streams.  TLS implementations that are
> paranoid to this degree about breaking middleboxes can simply always set
> the next-record-length field in the trailer to 0 and send cleartext
> record headers for every TLS record just as before.
> 
> - We don't need to do anything special to handle the "first record"
> case: i.e., we neither need to specify a standard "first record length"
> nor add a first-record-length field to one of the negotiation packets.
> Instead, the first AEAD-encrypted record is simply transmitted with a
> 5-byte header as usual, but the sender can omit the headers from
> subsequent records if it chooses to.
> 
> - A TLS 1.3 implementation that doesn't want to bother "predicting" or
> "committing to" a next-record-length beyond a SSL_Write() boundary can
> simply set the next-record-length field to zero in the last record of
> the current write, so the first record in the next SSL_Write() will
> including a header determining its size as before.  This isn't ideal in
> terms of traffic analysis protection, but it's an implementation option.
> 
> - FWIW, when the sender transmits headerless records, the encoding saves
> 2 bytes per record with respect to TLS 1.2 (saving the 5-byte cleartext
> header but adding the 3-byte encrypted trailer).
> 
> ---
> Implications on padded record transmission:
> 
> Finally, while implementing this extension I realized a further general
> benefit of headerless transmission: it can make the use of padding for
> traffic analysis protection more bandwidth-efficient.
> 
> With cleartext headers, to achieve the traffic analysis benefits of
> padding we must ensure that every transmitted record has *exactly* the
> same ciphertext length, since any variation will produce a readily
> fingerprintable pattern in cleartext.  This creates tradeoffs, as noted
> above.  Padding to a smaller fixed size (e.g., 256 bytes) adds less
> bandwidth overhead to small messages such as typical HTTP requests but
> incurs the cost of a MAC tag, nonce, TLS headers/trailers etc once every
> 256 bytes - for AES-GCM this is about an 11% bandwidth overhead with
> 256-byte records.  Padding to a larger fixed size (e.g., 1024) reduces
> the bandwidth overhead for bulk data transmission such as large HTTP
> responses (e.g., to about 3% overhead with 1024-byte records), at a cost
> of adding a lot of bandwidth overhead to tiny HTTP requests or
> status-indication responses, which are also common.
> 
> With headerless records, in contrast, we can pick a relatively small
> ciphertext length padding granularity (e.g., 256 bytes), but then the
> sender can transmit records whose sizes are any *multiple* of this
> granularity, because N consecutive 256-byte ciphertexts are now
> cryptographically indistinguishable from a single N*256-byte ciphertext.
> Thus, we get the bandwidth-efficiency benefits of small records by
> avoiding the need to add too much padding to small messages, plus the
> bandwidth-efficiency benefits of larger records for bulk transmission.
> For example, we can send a large message mostly consisting of
> ~16384-byte records, each containing only one MAC tag and internal TLS
> 1.3 trailer (though an eavesdropper won't know that), reducing the
> bandwidth overhead in the AES-GCM case from 11% to 0.2%.
> 
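The overhead percentages above can be approximated with a back-of-the-envelope calculation. The 29-byte per-record figure (5-byte header, 8-byte explicit nonce, 16-byte GCM tag) is an assumption made here to roughly reproduce the quoted numbers, not a breakdown given in the message. The function names are hypothetical.

```python
# Assumed per-record cost: 5-byte header + 8-byte explicit nonce + 16-byte tag.
PER_RECORD_OVERHEAD = 5 + 8 + 16


def overhead_fraction(record_len, per_record=PER_RECORD_OVERHEAD):
    """Fraction of each fixed-size record spent on per-record overhead."""
    return per_record / record_len


def padded_length(payload_len, granularity=256):
    """Round a length up to the next multiple of the padding granularity."""
    return -(-payload_len // granularity) * granularity


if __name__ == "__main__":
    for size in (256, 1024, 16384):
        print(size, f"{overhead_fraction(size):.1%}")
```

With headerless records only the total has to be a multiple of the granularity, so a bulk transfer can pay the per-record cost once per large record instead of once per 256 bytes.
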
> There are two minor caveats to this:
> 
> - In the specific case of AES-GCM, it looks like TLS 1.2 uses AES-GCM
> with explicitly transmitted nonces, which are clearly distinguishable
> from random values and hence will break the above benefits.  The
> solution is simple, however: simply make TLS 1.3 use AES-GCM (and other
> AEAD schemes) without explicit nonces.  These nonces can be calculated
> implicitly by sender and receiver just as easily without explicit
> transmission, and all the necessary integrity-checking happens via the
> "additional_data" pseudo-header anyway.  (In fact perhaps TLS 1.3
> already eliminates the explicit nonce - I don't see any mention of it in
> the new record spec.  The only question is whether it's still in
> the AES-GCM-specific specs, which I haven't looked at closely and don't
> know if they'll be updated wrt TLS 1.3 or not.)
> 
> - Even if N 256-byte ciphertexts in a TLS 1.3 stream are
> cryptographically indistinguishable from one N*256-byte ciphertext, if
> the TLS sender implementation transmits these in a single OS write(),
> the OS's TCP stack may (or may not, depending on circumstances) still
> reveal this difference by segmenting transmitted TCP segments
> differently.  There are several obvious solutions to this, however:
> 
>       (1) The TLS implementation could prepare the N*256-byte ciphertext but
> then transmit it using N separate 256-byte write() calls to the
> underlying TCP stack.  This might (again depending on the TCP stack)
> increase the TCP-level overhead by sending many smaller-than-necessary
> TCP segments, but is a fully OS-independent solution, and we still save
> the TLS-level bandwidth overhead (one MAC tag etc rather than N).
> 
>       (2) Better, at least on Linux systems, the TLS implementation could
> simply enable the TCP_CORK socket option, causing the kernel to delay
> the transmission of incomplete (less-than-MTU-sized) TCP segments
> slightly in hopes of "filling" them.  This way, the TLS sender will
> always produce continuous streams of MTU-sized TCP segments for all
> "bursts" of transmission of any number of TLS records.  The downside is
> that the kernel-imposed delay can add a bit of latency (I understand
> around 200ms) to the transmission of the very last, incomplete segment
> in a burst, thus perhaps adding up to 200ms to the total
> request/response latency of a round-trip interactive exchange.
> 
>       (3) Still better, again on Linux systems, the TLS implementation could
> send records using send() instead of write(), and set the MSG_MORE flag
> on all records except for the last record just before the connection
> goes idle (no more data to transmit).  Figuring out when a connection
> is "going idle" may require some heuristics or hints from the
> application, but can ensure that bursts are transmitted as a fully
> uniform series of MTU-size segments without incurring any added delay at
> the end.
> 
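Options (2) and (3) can be sketched as follows, assuming a Linux TCP socket. The helper names are hypothetical, and the fallback constant 0x8000 is the Linux value of MSG_MORE, used only if the Python socket module does not expose it. Deciding which record is the last one before the connection goes idle is left to the caller.

```python
import socket

# Linux-specific flag; 0x8000 is the Linux value, used as a fallback assumption.
MSG_MORE = getattr(socket, "MSG_MORE", 0x8000)


def send_burst(sock, records):
    """Option (3): send every record with MSG_MORE except the last in the burst."""
    records = list(records)
    for i, record in enumerate(records):
        is_last = i == len(records) - 1
        # MSG_MORE tells the kernel more data follows, so it can fill segments.
        sock.sendall(record, 0 if is_last else MSG_MORE)


def enable_cork(sock):
    """Option (2): let the kernel hold back partial segments (Linux TCP_CORK)."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 1)
```

send_burst only needs an object with a sendall(data, flags) method, so it can be exercised without a network connection.
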
> ---
> OK, congratulations and thanks to anyone who persisted through all that.
> I hope this will help understand the implementation complexity and
> tradeoffs both of the currently-specified TLS 1.3 record layer and the
> proposed headerless records features.  Comments?
> 
> Thanks
> Bryan
> 


_______________________________________________
TLS mailing list
TLS@ietf.org
https://www.ietf.org/mailman/listinfo/tls
