As a follow-up to my talk at the TRON workshop, I have opened the following issue proposing to make unencrypted headers optional, as per my earlier emails of Dec 1 and 12:
https://github.com/tlswg/tls13-spec/issues/422
Add encrypted NextRecordLength field to make next record's unencrypted header optional

The precise (1- or 2-byte) encoding I leave for further discussion.

Thanks
Bryan

> On Dec 12, 2015, at 2:29 AM, Bryan A Ford <brynosau...@gmail.com> wrote:
>
> Since a lot of the skepticism toward my encrypted-headers proposal
> constituted worries whether the benefits are worth the implementation
> cost/complexity, I decided to implement it and find out what that cost
> and complexity actually is.  See this github repo:
>
>       https://github.com/bford/nss
>
> Here you'll find a snapshot of Mozilla NSS/NSPR, with three branches:
> (a) current baseline version supporting only TLS 1.2 records; (b) a
> version that adds support for the currently-specified TLS 1.3 record
> format with the encrypted content-type trailer and padding within the
> AEAD; and (c) a further extension to support optional "headerless"
> records.  This doesn't implement all the other new stuff in TLS 1.3 such
> as negotiation, 0-RTT, etc. - only the record-layer changes.
>
> ---
> Observations from implementing TLS 1.3 records:
>
> The baseline...master diff (https://github.com/bford/nss/pull/1/files)
> represents the changes that appear needed to get from TLS 1.2 records to
> TLS 1.3 record format, with padding of records to a constant size for
> traffic analysis protection.  I set the fixed padded record length
> (TLS13_PAD_FRAGMENT_LENGTH) to 256 bytes "just because", but change it
> to whatever you like; I don't have any fish to fry regarding the
> specific "best" value.  512 and 1024 also seem like reasonable choices,
> with obvious tradeoffs between wasted bytes padding small messages
> versus the bandwidth and processing overheads of fewer bytes per record.
>
> Measuring implementation complexity simplistically by line count, this
> diff constitutes +92-14=78 lines (92 added, 14 removed) - nothing too
> serious.
> Subjectively, the main implementation complexity came from adding the
> 1-byte internal content-type trailer in the first place, which means
> having to copy the input cleartext into the write buffer in order to
> fiddle with it before passing it to the AEAD encryptor.  Thus, because
> of this 1-byte content-type trailer (even with no padding), we can no
> longer just encrypt directly from the caller-provided input into the
> ciphertext buffer but have to copy the cleartext first, unless the AEAD
> API supports scatter/gather (which NSS's doesn't and I expect most
> probably won't).
>
> This doesn't seem like a big problem to me and I definitely consider the
> benefits of the encrypted content type and record padding to be worth
> this minor cost.  And in practice I doubt it'll cause any actual
> noticeable performance degradation because a
> copy-A-to-B-then-encrypt-in-place-at-B is going to have a pretty similar
> cache footprint to an encrypt-from-A-to-B.
>
> TLS 1.3 also already changes the way the pseudo-header is calculated for
> MAC purposes; I didn't yet fully implement those changes, but did
> already need to move the pseudo-header calculation on the send side
> until a bit later when the length of the ciphertext is known.
>
> ---
> Headerless records extension:
>
> The other pull request (https://github.com/bford/nss/pull/2/files)
> represents the further delta needed to implement a further simplified
> version of my last "encrypted headers" proposal, which in this
> incarnation becomes "headerless records".  Since TLS 1.3 is already
> adding a 1-byte mandatory encrypted trailer within each record (the
> encrypted content-type), I simply extended this to a 3-byte trailer, in
> which the first two bytes indicate:
>
> - If zero, the next record (following this one) has the usual 5-byte TLS
> header and its length is defined by that header as usual.
> - If nonzero, the next record has *no* TLS header at all, and its length
> is defined by this value.
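
The trailer rule above can be illustrated with a small sketch. This is not taken from the NSS branches: the function names, the big-endian byte order, the placement of the content type as the third trailer byte, and the placeholder values in the cleartext header are all assumptions made for illustration, and AEAD encryption and padding are left out entirely so that only the framing logic is visible.

```python
import struct

HEADER_LEN = 5   # cleartext TLS record header: type (1), version (2), length (2)
TRAILER_LEN = 3  # ASSUMED layout: next-record-length (2, big-endian) + content type (1)


def seal(payload: bytes, content_type: int, next_len: int) -> bytes:
    """Append the 3-byte trailer. A real sender would AEAD-encrypt the result."""
    return payload + struct.pack("!HB", next_len, content_type)


def write_stream(items, headerless=True) -> bytes:
    """Serialize (payload, content_type) pairs into one record stream."""
    out = bytearray()
    announced = 0  # length promised by the previous record's trailer (0 = none)
    for i, (payload, ctype) in enumerate(items):
        body_len = len(payload) + TRAILER_LEN
        has_next = i + 1 < len(items)
        # Announce the next record's length only if we want it headerless.
        next_len = len(items[i + 1][0]) + TRAILER_LEN if (headerless and has_next) else 0
        if announced == 0:
            # Header values 23 and 0x0301 are placeholders, not a spec claim.
            out += struct.pack("!BHH", 23, 0x0301, body_len)
        out += seal(payload, ctype, next_len)
        announced = next_len
    return bytes(out)


def read_stream(stream: bytes):
    """Parse a stream written by write_stream back into (payload, content_type) pairs."""
    records, pos, next_len = [], 0, 0  # the first record always carries a header
    while pos < len(stream):
        if next_len == 0:
            _ctype, _version, length = struct.unpack_from("!BHH", stream, pos)
            pos += HEADER_LEN
        else:
            length = next_len  # headerless: length came from the previous trailer
        body = stream[pos:pos + length]
        pos += length
        next_len, ctype = struct.unpack("!HB", body[-TRAILER_LEN:])
        records.append((body[:-TRAILER_LEN], ctype))
    return records
```

In this toy framing, a sender that always passes `headerless=False` produces a stream in which every record keeps its 5-byte header, and the same reader handles both cases without any negotiation.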
>
> By combining the "next-record-length" into the encrypted trailer that
> TLS 1.3 is adding anyway, the changes required are pretty minimal.  By
> my count, this is a delta of +40-8=32 lines, which at least to me seems
> pretty insignificant in terms of implementation complexity.  And
> implementations could be even simpler by not implementing the send-side
> logic at all and simply setting all next-record-length fields in the
> trailers to zero, attaching headers to all records as before.  The only
> minor point of "implementation pain" is the need to add the appropriate
> inter-record state variables (writeNextLength and readNextLength), but
> the handling of these is not fundamentally any different from the state
> needed to keep track of sequence numbers across records for example.
>
> Making headerless records optional, selected by the trailer in the prior
> record as defined above, offers several nice benefits:
>
> - It addresses the concerns that have been raised (though unsubstantiated
> so far with any concrete evidence) about breaking middleboxes that want
> to parse traditional TLS record streams.  TLS implementations that are
> paranoid to this degree about breaking middleboxes can simply always set
> the next-record-length field in the trailer to 0 and send cleartext
> record headers for every TLS record just as before.
>
> - We don't need to do anything special to handle the "first record"
> case: i.e., we neither need to specify a standard "first record length"
> nor add a first-record-length field to one of the negotiation packets.
> Instead, the first AEAD-encrypted record is simply transmitted with a
> 5-byte header as usual, but the sender can omit the headers from
> subsequent records if it chooses to.
>
> - A TLS 1.3 implementation that doesn't want to bother "predicting" or
> "committing to" a next-record-length beyond a SSL_Write() boundary can
> simply set the next-record-length field to zero in the last record of
> the current write, so the first record in the next SSL_Write() will
> include a header determining its size as before.  This isn't ideal in
> terms of traffic analysis protection, but it's an implementation option.
>
> - FWIW, when the sender transmits headerless records, the encoding saves
> 2 bytes per record with respect to TLS 1.2 (saving the 5-byte cleartext
> header but adding the 3-byte encrypted trailer).
>
> ---
> Implications on padded record transmission:
>
> Finally, while implementing this extension I realized a further general
> benefit of headerless transmission: it can make the use of padding for
> traffic analysis protection more bandwidth-efficient.
>
> With cleartext headers, to achieve the traffic analysis benefits of
> padding we must ensure that every transmitted record has *exactly* the
> same ciphertext length, since any variation will produce a readily
> fingerprintable pattern in cleartext.  This creates tradeoffs, as noted
> above.  Padding to a smaller fixed size (e.g., 256 bytes) adds less
> bandwidth overhead to small messages such as typical HTTP requests but
> incurs the cost of a MAC tag, nonce, TLS headers/trailers etc once every
> 256 bytes - for AES-GCM this is about an 11% bandwidth overhead with
> 256-byte records.  Padding to a larger fixed size (e.g., 1024) reduces
> the bandwidth overhead for bulk data transmission such as large HTTP
> responses (e.g., to about 3% overhead with 1024-byte records), at a cost
> of adding a lot of bandwidth overhead to tiny HTTP requests or
> status-indication responses, which are also common.
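
As a rough check on the percentages quoted above, the arithmetic can be written out. The message does not give a byte-level breakdown, so the one below is an assumption: a 5-byte cleartext header, the 8-byte explicit nonce and 16-byte tag used by AES-GCM in TLS 1.2, with the overhead expressed as a fraction of the fixed record size. Other breakdowns (for example, counting the trailer byte) shift the result by a few tenths of a percent.

```python
HEADER = 5          # cleartext TLS record header
EXPLICIT_NONCE = 8  # explicit per-record nonce, as in TLS 1.2 AES-GCM
TAG = 16            # AES-GCM authentication tag


def overhead_fraction(record_len, per_record=HEADER + EXPLICIT_NONCE + TAG):
    """Fixed per-record cost as a fraction of the padded record size (assumed metric)."""
    return per_record / record_len


for size in (256, 1024):
    print(f"{size}-byte records: {overhead_fraction(size):.1%}")
# 256-byte records: 11.3%
# 1024-byte records: 2.8%
```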
>
> With headerless records, in contrast, we can pick a relatively small
> ciphertext length padding granularity (e.g., 256 bytes), but then the
> sender can transmit records whose sizes are any *multiple* of this
> granularity, because N consecutive 256-byte ciphertexts are now
> cryptographically indistinguishable from a single N*256-byte ciphertext.
> Thus, we get the bandwidth-efficiency benefits of small records by
> avoiding the need to add too much padding to small messages, plus the
> bandwidth-efficiency benefits of larger records for bulk transmission.
> For example, we can send a large message mostly consisting of
> ~16384-byte records, each containing only one MAC tag and internal TLS
> 1.3 trailer (though an eavesdropper won't know that), reducing the
> bandwidth overhead in the AES-GCM case from 11% to 0.2%.
>
> There are two minor caveats to this:
>
> - In the specific case of AES-GCM, it looks like TLS 1.2 uses AES-GCM
> with explicitly transmitted nonces, which are clearly distinguishable
> from random values and hence will break the above benefits.  The
> solution is simple, however: simply make TLS 1.3 use AES-GCM (and other
> AEAD schemes) without explicit nonces.  These nonces can be calculated
> implicitly by sender and receiver just as easily without explicit
> transmission, and all the necessary integrity-checking happens via the
> "additional_data" pseudo-header anyway.  (In fact perhaps TLS 1.3
> already eliminates the explicit nonce - I don't see any mention of it in
> the new record spec anyway; the only question is whether it's still in
> the AES-GCM-specific specs, which I haven't looked at closely and don't
> know if they'll be updated wrt TLS 1.3 or not.)
>
> - Even if N 256-byte ciphertexts in a TLS 1.3 stream are
> cryptographically indistinguishable from one N*256-byte ciphertext, if
> the TLS sender implementation transmits these in a single OS write(),
> the OS's TCP stack may (or may not, depending on circumstances) still
> reveal this difference by segmenting transmitted TCP segments
> differently.  There are several obvious solutions to this, however:
>
> (1) The TLS implementation could prepare the N*256-byte ciphertext but
> then transmit it using N separate 256-byte write() calls to the
> underlying TCP stack.  This might (again depending on the TCP stack)
> increase the TCP-level overhead by sending many smaller-than-necessary
> TCP segments, but is a fully OS-independent solution, and we still save
> the TLS-level bandwidth overhead (one MAC tag etc rather than N).
>
> (2) Better, at least on Linux systems, the TLS implementation could
> simply enable the TCP_CORK socket option, causing the kernel to delay
> the transmission of incomplete (less-than-MTU-sized) TCP segments
> slightly in hopes of "filling" them.  This way, the TLS sender will
> always produce continuous streams of MTU-sized TCP segments for all
> "bursts" of transmission of any number of TLS records.  The downside is
> that the kernel-imposed delay can add a bit of latency (I understand
> around 200ms) to the transmission of the very last, incomplete segment
> in a burst, thus perhaps adding up to 200ms to the total
> request/response latency of a round-trip interactive exchange.
>
> (3) Still better, again on Linux systems, the TLS implementation could
> send records using send() instead of write(), and set the MSG_MORE flag
> on all records except for the last record just before the connection
> goes idle (no more data to transmit).
> Figuring out when a connection
> is "going idle" may require some heuristics or hints from the
> application, but can ensure that bursts are transmitted as a fully
> uniform series of MTU-size segments without incurring any added delay at
> the end.
>
> ---
> OK, congratulations and thanks to anyone who persisted through all that.
> I hope this will help understand the implementation complexity and
> tradeoffs both of the currently-specified TLS 1.3 record layer and the
> proposed headerless records features.  Comments?
>
> Thanks
> Bryan
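
The padding-granularity idea in the quoted message (record sizes that are any multiple of a small granularity) can be sketched as a length calculation. The names are hypothetical and the fixed per-record cost is an assumption: a 16-byte AES-GCM tag plus the 3-byte encrypted trailer, with no explicit nonce and no cleartext header. The sketch also ignores the protocol's maximum record size, which a real sender would have to respect by splitting large writes.

```python
GRANULARITY = 256  # padding granularity used as the example in the message
FIXED = 16 + 3     # ASSUMED per-record cost: AES-GCM tag + 3-byte trailer


def padded_record_len(payload_len, granularity=GRANULARITY, fixed=FIXED):
    """Smallest multiple of the granularity that fits the payload plus fixed cost."""
    needed = payload_len + fixed
    return -(-needed // granularity) * granularity  # ceiling division


def padding_bytes(payload_len, granularity=GRANULARITY, fixed=FIXED):
    """Number of padding bytes the sender adds to reach the padded length."""
    return padded_record_len(payload_len, granularity, fixed) - payload_len - fixed
```

Under these assumptions a 10-byte payload occupies one 256-byte unit, while a 16000-byte payload fits in a single 16128-byte record and pays the fixed cost only once.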
_______________________________________________
TLS mailing list
TLS@ietf.org
https://www.ietf.org/mailman/listinfo/tls