Hi Jakub, Thanks for all the fixes. This version is much better.
On 5/22/2019 4:57 AM, Jakub Kicinski wrote: > Describe existing kernel TLS offload (added back in Linux 4.19) - > the mechanism, the expected behavior and the notable corner cases. > > This documentation is mostly targeting hardware vendors who want > to implement offload, to ensure consistency between implementations. > > v2: > - add emphasis around TLS_SW/TLS_HW/TLS_HW_RECORD; > - remove mentions of ongoing work (Boris); > - split the flow of data in SW vs. HW cases in TX overview > (Boris); > - call out which fields are updated by the device and which > are filled by the stack (Boris); > - move error handling into it's own section (Boris); > - add more words about fallback (Boris); > - note that checksum validation is required (Alexei); > - note that drivers shouldn't pay attention to the TLS > device features. > > Signed-off-by: Jakub Kicinski <jakub.kicin...@netronome.com> > Acked-by: Dave Watson <davejwat...@fb.com> > Acked-by: Alexei Starovoitov <a...@kernel.org> > --- > Documentation/networking/index.rst | 1 + > .../networking/tls-offload-layers.svg | 1 + > .../networking/tls-offload-reorder-bad.svg | 1 + > .../networking/tls-offload-reorder-good.svg | 1 + > Documentation/networking/tls-offload.rst | 482 ++++++++++++++++++ > Documentation/networking/tls.rst | 2 + > 6 files changed, 488 insertions(+) > create mode 100644 Documentation/networking/tls-offload-layers.svg > create mode 100644 Documentation/networking/tls-offload-reorder-bad.svg > create mode 100644 Documentation/networking/tls-offload-reorder-good.svg > create mode 100644 Documentation/networking/tls-offload.rst > > +Error handling > +============== > + > +TX > +-- > + > +Packets may be redirected or rerouted by the stack to a different > +device than the selected TLS offload device. The stack will handle > +such condition using the :c:func:`sk_validate_xmit_skb` helper > +(TLS offload code installs :c:func:`tls_validate_xmit_skb` at this hook). > +Offload maintains information about all records until the data is > +fully acknowledged, so if skbs reach the wrong device they can be handled > +by software fallback. > + > +Any device TLS offload handling error on the transmission side must result > +in the packet being dropped. For example if a packet got out of order > +due to a bug in the stack or the device, reached the device and can't > +be encrypted such packet must be dropped. > + > +RX > +-- > + > +If the device encounters any problems with TLS offload on the receive > +side it should pass the packet to the host's networking stack as it was > +received on the wire. > + > +For example authentication failure for any record in the segment should > +result in passing the unmodified packet to the software fallback. This means > +packets should not be modified "in place". Splitting segments to handle > partial > +decryption is not advised. In other words either all records in the packet > +had been handled successfully and authenticated or the packet has to be > passed > +to the host's stack as it was on the wire (recovering original packet in the > +driver if device provides precise error is sufficient). > + > +The Linux networking stack does not provide a way of reporting per-packet > +decryption and authentication errors, packets with errors must simply not > +have the :c:member:`decrypted` mark set. > + > +A packet should also not be handled by the TLS offload if it contains > +incorrect checksums. > + In the future, we might introduce some changes here to avoid the store-and-forward required to comply with providing the original packet on error. > +Performance metrics > +=================== > + > +TLS offload can be characterized by the following basic metrics: > + > + * max connection count > + * connection installation rate > + * connection installation latency > + * total cryptographic performance > + > +Note that each TCP connection requires a TLS session in both directions, > +the performance may be reported treating each direction separately. > + > +Max connection count > +-------------------- > + > +The number of connections device can support can be exposed via > +``devlink resource`` API. This is future changes, let's document when we implement this. > + > +Total cryptographic performance > +------------------------------- > + > +Offload performance may depend on segment and record size. > + > +Overload of the cryptographic subsystem of the device should not have > +significant performance impact on non-offloaded streams. > + > +Statistics > +========== > + > +Following minimum set of TLS-related statistics should be reported > +by the driver: > + > + * ``rx_tls_decrypted`` - number of successfully decrypted TLS segments > + * ``tx_tls_encrypted`` - number of in-order TLS segments passed to device > + for encryption > + * ``tx_tls_ooo`` - number of TX packets which were part of a TLS stream > + but did not arrive in the expected order > + * ``tx_tls_drop_no_sync_data`` - number of TX packets dropped because > + they arrived out of order and associated record could not be found > + (see also :ref:`pre_tls_data`) > + > +Notable corner cases, exceptions and additional requirements > +============================================================ > + > +.. _5tuple_problems: > + > +5-tuple matching limitations > +---------------------------- > + > +The device can only recognize received packets based on the 5-tuple > +of the socket. Current ``ktls`` implementation will not offload sockets > +routed through software interfaces such as those used for tunneling > +or virtual networking. However, many packet transformations performed > +by the networking stack (most notably any BPF logic) do not require > +any intermediate software device, therefore a 5-tuple match may > +consistently miss at the device level. In such cases the device > +should still be able to perform TX offload (encryption) and should > +fallback cleanly to software decryption (RX). > + > +Out of order > +------------ > + > +Introducing extra processing in NICs should not cause packets to be > +transmitted or received out of order, for example pure ACK packets > +should not be reordered with respect to data segments. > + > +Ingress reorder > +--------------- > + > +A device is permitted to perform packet reordering for consecutive > +TCP segments (i.e. placing packets in the correct order) but any form > +of additional buffering is disallowed. > + > +Coexistence with standard networking offload features > +----------------------------------------------------- > + > +Offloaded ``ktls`` sockets should support standard TCP stack features > +transparently. Enabling device TLS offload should not cause any difference > +in packets as seen on the wire. > + > +Transport layer transparency > +---------------------------- > + > +The device should not modify any packet headers for the purpose > +of the simplifying TLS offload. > + > +The device should not depend on any packet headers beyond what is strictly > +necessary for TLS offload. > + > +Segment drops > +------------- > + > +Dropping packets is acceptable only in the event of catastrophic > +system errors and should never be used as an error handling mechanism > +in cases arising from normal operation. In other words, reliance > +on TCP retransmissions to handle corner cases is not acceptable. > + > +TLS device features > +------------------- > + > +Drivers should ignore the changes to TLS the device feature flags. > +These flags will be acted upon accordingly by the core ``ktls`` code. > +TLS device feature flags only control adding of new TLS connection > +offloads, old connections will remain active after flags are cleared. > + > +Known bugs > +========== > + > +skb_orphan() leaks clear text > +----------------------------- > + > +Currently drivers depend on the :c:member:`sk` member of > +:c:type:`struct sk_buff <sk_buff>` to identify segments requiring > +encryption. Any operation which removes or does not preserve the socket > +association such as :c:func:`skb_orphan` or :c:func:`skb_clone` > +will cause the driver to miss the packets and lead to clear text leaks. > + > +Redirects leak clear text > +------------------------- > + > +In the RX direction, if segment has already been decrypted by the device > +and it gets redirected or mirrored - clear text will be transmitted out. Not sure if it is a bug or a feature as it needs administrator intervention ;) Overall, looks good to me. Acked-by: Boris Pismenny <bor...@mellanox.com>