Since a lot of the skepticism toward my encrypted-headers proposal came down to worries about whether the benefits are worth the implementation cost/complexity, I decided to implement it and find out what that cost and complexity actually are. See this GitHub repo:
https://github.com/bford/nss

Here you'll find a snapshot of Mozilla NSS/NSPR, with three branches: (a) the current baseline version supporting only TLS 1.2 records; (b) a version that adds support for the currently-specified TLS 1.3 record format with the encrypted content-type trailer and padding within the AEAD; and (c) a further extension to support optional "headerless" records. This doesn't implement all the other new stuff in TLS 1.3 such as negotiation, 0-RTT, etc. - only the record-layer changes.

---

Observations from implementing TLS 1.3 records:

The baseline...master diff (https://github.com/bford/nss/pull/1/files) represents the changes that appear to be needed to get from TLS 1.2 records to the TLS 1.3 record format, with padding of records to a constant size for traffic analysis protection. I set the fixed padded record length (TLS13_PAD_FRAGMENT_LENGTH) to 256 bytes "just because", but change it to whatever you like; I don't have any fish to fry regarding the specific "best" value. 512 and 1024 also seem like reasonable choices, with the obvious tradeoff between bytes wasted padding small messages and the bandwidth and processing overhead of carrying fewer bytes per record.

Measuring implementation complexity simplistically by line count, this diff comes to +92-14, a net of 78 lines (92 added, 14 removed) - nothing too serious.

Subjectively, the main implementation complexity came from adding the 1-byte internal content-type trailer in the first place, which means having to copy the input cleartext into the write buffer in order to fiddle with it before passing it to the AEAD encryptor. Thus, because of this 1-byte content-type trailer (even with no padding), we can no longer just encrypt directly from the caller-provided input into the ciphertext buffer but have to copy the cleartext first, unless the AEAD API supports scatter/gather (which NSS's doesn't and I expect most probably won't).
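To make the copy step concrete, here is a minimal Python sketch of building the padded cleartext before it goes to the AEAD. It is not taken from the NSS branch: the function names are hypothetical, the layout (content, then the type byte, then zero padding) is assumed from the TLS 1.3 drafts, and PAD_FRAGMENT_LENGTH is assumed to be the padded cleartext size, which may not be exactly how TLS13_PAD_FRAGMENT_LENGTH is applied in the actual code.

```python
CONTENT_TYPE_APPLICATION_DATA = 23  # standard TLS application_data type
PAD_FRAGMENT_LENGTH = 256           # assumption: padded cleartext size


def build_inner_plaintext(data, content_type,
                          padded_len=PAD_FRAGMENT_LENGTH):
    """Copy caller data into a fresh buffer, append the content-type
    byte, and leave the rest of the buffer as zero padding."""
    if content_type == 0:
        raise ValueError("content type must be nonzero")
    if len(data) + 1 > padded_len:
        raise ValueError("fragment too large for one padded record")
    buf = bytearray(padded_len)        # zero-filled: padding comes for free
    buf[:len(data)] = data             # the unavoidable copy
    buf[len(data)] = content_type      # 1-byte trailer right after the data
    return buf                         # would be encrypted in place


def strip_inner_plaintext(buf):
    """Receiver side: drop trailing zeros, then split off the type byte."""
    end = len(buf)
    while end > 0 and buf[end - 1] == 0:
        end -= 1
    if end == 0:
        raise ValueError("record contains no content-type byte")
    return bytes(buf[:end - 1]), buf[end - 1]
```

Because the type byte is nonzero, the receiver can find it by scanning back over the zero padding, even when the application data itself ends in zero bytes.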
That extra copy doesn't seem like a big problem to me, and I definitely consider the benefits of the encrypted content type and record padding to be worth this minor cost. In practice I doubt it'll cause any noticeable performance degradation, because a copy-A-to-B-then-encrypt-in-place-at-B is going to have a pretty similar cache footprint to an encrypt-from-A-to-B.

TLS 1.3 also already changes the way the pseudo-header is calculated for MAC purposes; I didn't yet fully implement those changes, but did already need to move the pseudo-header calculation on the send side to a bit later, once the length of the ciphertext is known.

---

Headerless records extension:

The other pull request (https://github.com/bford/nss/pull/2/files) represents the further delta needed to implement a further simplified version of my last "encrypted headers" proposal, which in this incarnation becomes "headerless records". Since TLS 1.3 is already adding a 1-byte mandatory encrypted trailer within each record (the encrypted content-type), I simply extended this to a 3-byte trailer, in which the first two bytes hold a next-record-length value:

- If zero, the next record (following this one) has the usual 5-byte TLS header and its length is defined by that header as usual.

- If nonzero, the next record has *no* TLS header at all, and its length is defined by this value.

By combining the "next-record-length" into the encrypted trailer that TLS 1.3 is adding anyway, the changes required are pretty minimal. By my count, this is a delta of +40-8, a net of 32 lines, which at least to me seems pretty insignificant in terms of implementation complexity. And implementations could be even simpler by not implementing the send-side logic at all and simply setting all next-record-length fields in the trailers to zero, attaching headers to all records as before.
The only minor point of "implementation pain" is the need to add the appropriate inter-record state variables (writeNextLength and readNextLength), but the handling of these is not fundamentally any different from the state needed to keep track of sequence numbers across records, for example.

Making headerless records optional, selected by the trailer in the prior record as defined above, offers several nice benefits:

- It addresses the concerns that have been raised (though unsubstantiated so far with any concrete evidence) about breaking middleboxes that want to parse traditional TLS record streams. TLS implementations that are paranoid to this degree about breaking middleboxes can simply always set the next-record-length field in the trailer to 0 and send cleartext record headers for every TLS record just as before.

- We don't need to do anything special to handle the "first record" case: i.e., we neither need to specify a standard "first record length" nor add a first-record-length field to one of the negotiation packets. Instead, the first AEAD-encrypted record is simply transmitted with a 5-byte header as usual, but the sender can omit the headers from subsequent records if it chooses to.

- A TLS 1.3 implementation that doesn't want to bother "predicting" or "committing to" a next-record-length beyond an SSL_Write() boundary can simply set the next-record-length field to zero in the last record of the current write, so the first record in the next SSL_Write() will include a header determining its size as before. This isn't ideal in terms of traffic analysis protection, but it's an implementation option.

- FWIW, when the sender transmits headerless records, the encoding saves 2 bytes per record with respect to TLS 1.2 (saving the 5-byte cleartext header but adding the 3-byte encrypted trailer).
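For anyone who wants to see the framing logic without reading the diff, here is a rough Python sketch of the send and receive sides. It is an illustration under several assumptions, not the NSS code: encryption and padding are left out entirely, the trailer is assumed to be a big-endian 2-byte next-record-length followed by the 1-byte content type, the length is assumed to count the whole next record body, and the names frame, parse, write_next_length and read_next_length are made up to echo the state variables mentioned above.

```python
import struct

HEADER = struct.Struct("!BHH")   # usual 5-byte header: type, version, length
TRAILER = struct.Struct("!HB")   # assumed: next-record-length, content type
APPLICATION_DATA = 23


def frame(payloads, headerless=True):
    """Serialize payloads as records; with headerless=True only the
    first record carries a cleartext header."""
    body_lens = [len(p) + TRAILER.size for p in payloads]
    out = bytearray()
    write_next_length = 0            # 0 means: next record has a header
    for i, payload in enumerate(payloads):
        if write_next_length == 0:
            out += HEADER.pack(APPLICATION_DATA, 0x0303, body_lens[i])
        # Commit to the next record's length only if we know it.
        has_next = i + 1 < len(payloads)
        nxt = body_lens[i + 1] if (headerless and has_next) else 0
        out += payload + TRAILER.pack(nxt, APPLICATION_DATA)
        write_next_length = nxt
    return bytes(out)


def parse(stream):
    """Walk a stream produced by frame(); assumes it is complete."""
    pos, read_next_length, records = 0, 0, []
    while pos < len(stream):
        if read_next_length == 0:    # length comes from a cleartext header
            _type, _version, length = HEADER.unpack_from(stream, pos)
            pos += HEADER.size
        else:                        # length came from the prior trailer
            length = read_next_length
        body = stream[pos:pos + length]
        pos += length
        read_next_length, ctype = TRAILER.unpack(body[-TRAILER.size:])
        records.append((ctype, body[:-TRAILER.size]))
    return records
```

In this sketch a record body is never shorter than the 3-byte trailer, so a next-record-length of zero can't be confused with a real length. Calling frame(payloads, headerless=False) corresponds to the conservative sender that always zeroes the field and attaches a header to every record.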
---

Implications on padded record transmission:

Finally, while implementing this extension I realized a further general benefit of headerless transmission: it can make the use of padding for traffic analysis protection more bandwidth-efficient.

With cleartext headers, to achieve the traffic analysis benefits of padding we must ensure that every transmitted record has *exactly* the same ciphertext length, since any variation will produce a readily fingerprintable pattern in cleartext. This creates tradeoffs, as noted above. Padding to a smaller fixed size (e.g., 256 bytes) adds less bandwidth overhead to small messages such as typical HTTP requests but incurs the cost of a MAC tag, nonce, TLS headers/trailers etc. once every 256 bytes - for AES-GCM this is about an 11% bandwidth overhead with 256-byte records. Padding to a larger fixed size (e.g., 1024) reduces the bandwidth overhead for bulk data transmission such as large HTTP responses (e.g., to about 3% overhead with 1024-byte records), at a cost of adding a lot of bandwidth overhead to tiny HTTP requests or status-indication responses, which are also common.

With headerless records, in contrast, we can pick a relatively small ciphertext length padding granularity (e.g., 256 bytes), but then the sender can transmit records whose sizes are any *multiple* of this granularity, because N consecutive 256-byte ciphertexts are now cryptographically indistinguishable from a single N*256-byte ciphertext. Thus, we get the bandwidth-efficiency benefits of small records by avoiding the need to add too much padding to small messages, plus the bandwidth-efficiency benefits of larger records for bulk transmission. For example, we can send a large message mostly consisting of ~16384-byte records, each containing only one MAC tag and internal TLS 1.3 trailer (though an eavesdropper won't know that), reducing the bandwidth overhead in the AES-GCM case from 11% to 0.2%.
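As a quick sanity check on those percentages, here is the arithmetic in Python. The 29-byte per-record figure is my assumption, not something stated above: 5-byte header + 8-byte explicit nonce + 16-byte GCM tag, as in TLS 1.2 AES-GCM. It roughly reproduces the numbers quoted, but the exact accounting behind them (in particular for headerless TLS 1.3 records) may differ.

```python
HEADER_LEN = 5           # cleartext TLS record header
EXPLICIT_NONCE_LEN = 8   # TLS 1.2 AES-GCM explicit nonce
GCM_TAG_LEN = 16         # AES-GCM authentication tag

# Assumed fixed cost paid once per record, whatever its size.
PER_RECORD_OVERHEAD = HEADER_LEN + EXPLICIT_NONCE_LEN + GCM_TAG_LEN


def overhead_percent(record_len, per_record=PER_RECORD_OVERHEAD):
    """Per-record fixed cost as a percentage of the record length."""
    return 100.0 * per_record / record_len


if __name__ == "__main__":
    for size in (256, 1024, 16384):
        print(f"{size:6d}-byte records: {overhead_percent(size):.1f}%")
```

With these assumptions the three sizes come out at about 11.3%, 2.8% and 0.2% respectively.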
There are two minor caveats to this approach:

- In the specific case of AES-GCM, it looks like TLS 1.2 uses AES-GCM with explicitly transmitted nonces, which are clearly distinguishable from random values and hence will break the above benefits. The solution is simple, however: make TLS 1.3 use AES-GCM (and other AEAD schemes) without explicit nonces. These nonces can be calculated implicitly by sender and receiver just as easily without explicit transmission, and all the necessary integrity-checking happens via the "additional_data" pseudo-header anyway. (In fact perhaps TLS 1.3 already eliminates the explicit nonce - I don't see any mention of it in the new record spec. The only question is whether it's still in the AES-GCM-specific specs, which I haven't looked at closely and don't know whether they'll be updated wrt TLS 1.3 or not.)

- Even if N 256-byte ciphertexts in a TLS 1.3 stream are cryptographically indistinguishable from one N*256-byte ciphertext, if the TLS sender implementation transmits these in a single OS write(), the OS's TCP stack may (or may not, depending on circumstances) still reveal this difference by splitting the data into TCP segments differently. There are several obvious solutions to this, however:

(1) The TLS implementation could prepare the N*256-byte ciphertext but then transmit it using N separate 256-byte write() calls to the underlying TCP stack. This might (again depending on the TCP stack) increase the TCP-level overhead by sending many smaller-than-necessary TCP segments, but it is a fully OS-independent solution, and we still save the TLS-level bandwidth overhead (one MAC tag etc. rather than N).

(2) Better, at least on Linux systems, the TLS implementation could simply enable the TCP_CORK socket option, causing the kernel to delay the transmission of incomplete (less-than-MTU-sized) TCP segments slightly in hopes of "filling" them. This way, the TLS sender will always produce continuous streams of MTU-sized TCP segments for all "bursts" of transmission of any number of TLS records. The downside is that the kernel-imposed delay can add a bit of latency (I understand around 200ms) to the transmission of the very last, incomplete segment in a burst, thus perhaps adding up to 200ms to the total request/response latency of a round-trip interactive exchange.

(3) Still better, again on Linux systems, the TLS implementation could send records using send() instead of write(), and set the MSG_MORE flag on all records except for the last record just before the connection goes idle (no more data to transmit). Figuring out when a connection is "going idle" may require some heuristics or hints from the application, but this can ensure that bursts are transmitted as a fully uniform series of MTU-size segments without incurring any added delay at the end.

---

OK, congratulations and thanks to anyone who persisted through all that. I hope this will help in understanding the implementation complexity and tradeoffs of both the currently-specified TLS 1.3 record layer and the proposed headerless records feature. Comments?

Thanks
Bryan
_______________________________________________ TLS mailing list TLS@ietf.org https://www.ietf.org/mailman/listinfo/tls