On Wed, Aug 7, 2019 at 2:06 AM Jakub Kicinski <jakub.kicin...@netronome.com> wrote: > > sk_validate_xmit_skb() and drivers depend on the sk member of > struct sk_buff to identify segments requiring encryption. > Any operation which removes or does not preserve the original TLS > socket such as skb_orphan() or skb_clone() will cause clear text > leaks. > > Make the TCP socket underlying an offloaded TLS connection > mark all skbs as decrypted, if TLS TX is in offload mode. > Then in sk_validate_xmit_skb() catch skbs which have no socket > (or a socket with no validation) and decrypted flag set. > > Note that CONFIG_SOCK_VALIDATE_XMIT, CONFIG_TLS_DEVICE and > sk->sk_validate_xmit_skb are slightly interchangeable right now, > they all imply TLS offload. The new checks are guarded by > CONFIG_TLS_DEVICE because that's the option guarding the > sk_buff->decrypted member. > > Second, smaller issue with orphaning is that it breaks > the guarantee that packets will be delivered to device > queues in-order. All TLS offload drivers depend on that > scheduling property. This means skb_orphan_partial()'s > trick of preserving partial socket references will cause > issues in the drivers. We need a full orphan, and as a > result netem delay/throttling will cause all TLS offload > skbs to be dropped. > > Reusing the sk_buff->decrypted flag also protects from > leaking clear text when incoming, decrypted skb is redirected > (e.g. by TC). > > See commit 0608c69c9a80 ("bpf: sk_msg, sock{map|hash} redirect > through ULP") for justification why the internal flag is safe. > > v2: > - remove superfluous decrypted mark copy (Willem); > - remove the stale doc entry (Boris); > - rely entirely on EOR marking to prevent coalescing (Boris); > - use an internal sendpages flag instead of marking the socket > (Boris). > > Signed-off-by: Jakub Kicinski <jakub.kicin...@netronome.com> > --- > Documentation/networking/tls-offload.rst | 18 ------------------ > include/linux/skbuff.h | 8 ++++++++ > include/linux/socket.h | 3 +++ > include/net/sock.h | 10 +++++++++- > net/core/sock.c | 20 +++++++++++++++----- > net/ipv4/tcp.c | 3 +++ > net/ipv4/tcp_output.c | 3 +++ > net/tls/tls_device.c | 9 +++++++-- > 8 files changed, 48 insertions(+), 26 deletions(-)
> diff --git a/net/core/sock.c b/net/core/sock.c > index d57b0cc995a0..0f9619b0892f 100644 > --- a/net/core/sock.c > +++ b/net/core/sock.c > @@ -1992,6 +1992,20 @@ void skb_set_owner_w(struct sk_buff *skb, struct sock > *sk) > } > EXPORT_SYMBOL(skb_set_owner_w); > > +static bool can_skb_orphan_partial(const struct sk_buff *skb) > +{ > +#ifdef CONFIG_TLS_DEVICE > + /* Drivers depend on in-order delivery for crypto offload, > + * partial orphan breaks out-of-order-OK logic. > + */ > + if (skb->decrypted) > + return false; > +#endif > + return (IS_ENABLED(CONFIG_INET) && > + skb->destructor == tcp_wfree) || Please add parentheses around IS_ENABLED(CONFIG_INET) && skb->destructor == tcp_wfree I was also surprised that this works when tcp_wfree is not defined if !CONFIG_INET. But apparently it does (at -O2?) :) > @@ -984,6 +984,9 @@ ssize_t do_tcp_sendpages(struct sock *sk, struct page > *page, int offset, > if (!skb) > goto wait_for_memory; > > +#ifdef CONFIG_TLS_DEVICE > + skb->decrypted = !!(flags & MSG_SENDPAGE_DECRYPTED); > +#endif Nothing is stopping userspace from passing this new flag. In send (tcp_sendmsg_locked) it is ignored. But can it reach do_tcp_sendpages through tcp_bpf_sendmsg? > skb_entail(sk, skb); > copy = size_goal; > } > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c > index 6e4afc48d7bb..979520e46e33 100644 > --- a/net/ipv4/tcp_output.c > +++ b/net/ipv4/tcp_output.c > @@ -1320,6 +1320,7 @@ int tcp_fragment(struct sock *sk, enum tcp_queue > tcp_queue, > buff = sk_stream_alloc_skb(sk, nsize, gfp, true); > if (!buff) > return -ENOMEM; /* We'll just try again later. */ > + skb_copy_decrypted(buff, skb); This code has to copy timestamps, tx_flags, zerocopy state and now this in three locations. Eventually we'll want a single helper for all of them..