Hi Dave, Jakub

This is a re-spin of the mlx5 RX TLS submission from the previous kernel
cycle, from Tariq and Boris.

Changes from previous iteration:
1) Better handling of error flows in the resync procedure.
2) An improved TLS device API for asynchronous resync, replacing
   "force resync". For this, Tariq and Boris revert the old "force resync"
   API and then add the new one; see patch:
   ('Revert "net/tls: Add force_resync for driver resync"')
   Since the "force resync" API has no users, it might be a good idea
   to also take this patch to net.

For more information please see tag log below.

Please pull and let me know if there is any problem.

Please note that the series starts with a merge of the mlx5-next branch,
to resolve and avoid a dependency on the rdma tree.

Thanks,
Saeed.

---
The following changes since commit e396eccf0f1a6621d235340260f4d1f292de74f9:

  Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux (2020-06-27 14:00:13 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-tls-2020-06-26

for you to fetch changes up to a29074367b347af9e19d36522f7ad9a7db4b9c28:

  net/mlx5e: kTLS, Improve rx handler function call (2020-06-27 14:00:25 -0700)

----------------------------------------------------------------
mlx5-tls-2020-06-26

1) Improve hardware layouts and structure for kTLS support

2) Generalize ICOSQ (Internal Channel Operations Send Queue)
Due to the asynchronous nature of adding new kTLS flows and handling
HW asynchronous kTLS resync requests, the XSK ICOSQ was extended to
support generic async operations, such as kTLS add flow and resync, in
addition to the existing XSK usages.

3) kTLS hardware flow steering and classification:
The driver already has the means to classify TCP IPv4/IPv6 flows and send
them to the corresponding RSS HW engine. As reflected in patches 3
through 5, the series adds a steering layer that hooks into the driver's
TCP classifiers and matches on known kTLS connections. On a match,
traffic is redirected to the kTLS decryption engine; otherwise, traffic
continues flowing normally to the TCP RSS engine.

4) kTLS add flow RX HW offload support
New offload contexts post their static/progress params WQEs
(Work Queue Element) to communicate the newly added kTLS contexts
over the per-channel async ICOSQ.

The Channel/RQ is selected according to the socket's rxq index.

A new TLS-RX workqueue is used to allow asynchronous addition of
steering rules, out of the NAPI context.
It will also be used in a downstream patch in the resync procedure.

Feature is OFF by default. Can be turned on by:
$ ethtool -K <if> tls-hw-rx-offload on

5) Added mlx5 kTLS SW stats; the new counters are documented in
Documentation/networking/tls-offload.rst
rx_tls_ctx - number of TLS RX HW offload contexts added to device for
decryption.

rx_tls_ooo - number of RX packets which were part of a TLS stream
but did not arrive in the expected order and triggered the resync
procedure.

rx_tls_del - number of TLS RX HW offload contexts deleted from device
(connection has finished).

rx_tls_err - number of RX packets which were part of a TLS stream
but were not decrypted due to unexpected error in the state machine.

6) Asynchronous RX resync

a. The NIC driver indicates that it would like to resync on some TLS
record within the received packet (P), but the driver does not
know (yet) which of the TLS records within the packet.
At this stage, the NIC driver will query the device to find the exact
TCP sequence for resync (tcpsn). However, the driver does not wait
for the device to provide the response.

b. Eventually, the device responds, and the driver provides the tcpsn
within the resync packet to kTLS. Now, kTLS can check the tcpsn against
any processed TLS records within packet P, and also against any record
that is processed in the future within packet P.

The asynchronous resync path simplifies the device driver, as it can
save bits on the packet completion (32-bit TCP sequence), and pass this
information on an asynchronous command instead.

Performance:
CPU: Intel(R) Xeon(R) CPU E5-2687W v4 @ 3.00GHz, 24 cores, HT off
NIC: ConnectX-6 Dx 100GbE dual port

Goodput (app-layer throughput) comparison:
+---------------+-------+-------+---------+
| # connections |   1   |   4   |    8    |
+---------------+-------+-------+---------+
| SW (Gbps)     |  7.26 | 24.70 |   50.30 |
+---------------+-------+-------+---------+
| HW (Gbps)     | 18.50 | 64.30 |   92.90 |
+---------------+-------+-------+---------+
| Speedup       | 2.55x | 2.56x | 1.85x * |
+---------------+-------+-------+---------+

* After linerate is reached, diff is observed in CPU util

----------------------------------------------------------------
Boris Pismenny (3):
      net/mlx5e: Receive flow steering framework for accelerated TCP flows
      Revert "net/tls: Add force_resync for driver resync"
      net/tls: Add asynchronous resync

Saeed Mahameed (1):
      net/mlx5e: API to manipulate TTC rules destinations

Tariq Toukan (11):
      net/mlx5e: Turn XSK ICOSQ into a general asynchronous one
      net/mlx5e: Refactor build channel params
      net/mlx5e: Accel, Expose flow steering API for rules add/del
      net/mlx5e: kTLS, Improve TLS feature modularity
      net/mlx5e: kTLS, Use kernel API to extract private offload context
      net/mlx5e: kTLS, Add kTLS RX HW offload support
      net/mlx5e: kTLS, Add kTLS RX resync support
      net/mlx5e: kTLS, Add kTLS RX stats
      net/mlx5e: Increase Async ICO SQ size
      net/mlx5e: kTLS, Cleanup redundant capability check
      net/mlx5e: kTLS, Improve rx handler function call

 Documentation/networking/tls-offload.rst           |  18 +
 drivers/net/ethernet/mellanox/mlx5/core/Kconfig    |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   3 +-
 .../net/ethernet/mellanox/mlx5/core/accel/tls.h    |  19 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h       |  23 +-
 drivers/net/ethernet/mellanox/mlx5/core/en/fs.h    |  26 +-
 .../net/ethernet/mellanox/mlx5/core/en/params.h    |  22 +-
 drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h  |  15 +
 .../net/ethernet/mellanox/mlx5/core/en/xsk/setup.c |  53 +-
 .../net/ethernet/mellanox/mlx5/core/en/xsk/tx.c    |  12 +-
 .../mellanox/mlx5/core/en_accel/en_accel.h         |  20 +
 .../ethernet/mellanox/mlx5/core/en_accel/fs_tcp.c  | 400 ++++++++++++
 .../ethernet/mellanox/mlx5/core/en_accel/fs_tcp.h  |  27 +
 .../ethernet/mellanox/mlx5/core/en_accel/ktls.c    | 123 ++--
 .../ethernet/mellanox/mlx5/core/en_accel/ktls.h    | 114 +---
 .../ethernet/mellanox/mlx5/core/en_accel/ktls_rx.c | 670 +++++++++++++++++++++
 .../ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c | 204 ++++---
 .../mellanox/mlx5/core/en_accel/ktls_txrx.c        | 119 ++++
 .../mellanox/mlx5/core/en_accel/ktls_txrx.h        |  42 ++
 .../mellanox/mlx5/core/en_accel/ktls_utils.h       |  86 +++
 .../net/ethernet/mellanox/mlx5/core/en_accel/tls.c |  26 +-
 .../net/ethernet/mellanox/mlx5/core/en_accel/tls.h |  14 +-
 .../mellanox/mlx5/core/en_accel/tls_rxtx.c         |  32 +-
 .../mellanox/mlx5/core/en_accel/tls_rxtx.h         |  34 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c  |  34 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_fs.c    |  84 ++-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  68 ++-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c    |  41 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.c |  39 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.h |  25 +
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c    |   1 -
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c  |  12 +-
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c  |   4 +-
 drivers/net/ethernet/mellanox/mlx5/core/fw.c       |   3 +-
 include/net/tls.h                                  |  34 +-
 net/tls/tls_device.c                               |  60 +-
 36 files changed, 2054 insertions(+), 454 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_accel/fs_tcp.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_accel/fs_tcp.h
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_rx.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_txrx.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_txrx.h
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_utils.h
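
For readers who want a concrete picture of the asynchronous RX resync
flow from the tag log (steps a and b), here is a toy user-space model in
Python. It is only a sketch and not the kernel or driver code: the class
AsyncResync, its method names and the log_max bound are hypothetical and
chosen for illustration, and TCP sequence wrap-around is ignored. It shows
the idea that record start sequences are logged while the device query is
outstanding, and that the tcpsn from the device is compared against both
the logged records and records processed later.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AsyncResync:
    """Toy model of one RX connection's async resync state (hypothetical)."""

    log_max: int = 13  # arbitrary illustrative bound, not a kernel constant
    pending: bool = False
    device_tcpsn: Optional[int] = None
    record_log: List[int] = field(default_factory=list)

    def request_start(self) -> None:
        """Step a: the driver queried the device and does not wait."""
        self.pending = True
        self.device_tcpsn = None
        self.record_log.clear()

    def on_record(self, record_start_seq: int) -> bool:
        """The stack processed a TLS record starting at this TCP sequence.

        Returns True when this record confirms the resync point.
        """
        if not self.pending:
            return False
        if self.device_tcpsn is None:
            # No answer from the device yet: remember the record start.
            if len(self.record_log) < self.log_max:
                self.record_log.append(record_start_seq)
            return False
        # The answer already arrived: check this later record against it.
        if record_start_seq == self.device_tcpsn:
            self.pending = False
            return True
        return False

    def request_end(self, tcpsn: int) -> bool:
        """Step b: the device answered; check tcpsn against logged records."""
        if not self.pending:
            return False
        self.device_tcpsn = tcpsn
        if tcpsn in self.record_log:
            self.pending = False
            self.record_log.clear()
            return True
        return False


if __name__ == "__main__":
    state = AsyncResync()
    state.request_start()          # driver posts the query, keeps receiving
    state.on_record(1000)          # records processed while waiting
    state.on_record(1400)
    print(state.request_end(1400))  # device reports a logged record start
```

In this model a True return stands for the point where the stack would
confirm the resync to the device; a False return from request_end leaves
the request pending so that later records can still match.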