Re: [RFC PATCH v3 4/8] vhost: annotate async accesses
On 4/11/22 13:00, David Marchand wrote: vq->async is initialised and must be accessed under vq->access_lock. Top-level "_thread_unsafe" functions could be checked at runtime (clang provides a lock-aware assert()-like check), but they are simply skipped because those functions are not called in-tree, and as a result, their annotations would not be tested. Signed-off-by: David Marchand --- lib/vhost/vhost.c | 14 +- lib/vhost/vhost.h | 2 +- lib/vhost/vhost_user.c | 2 ++ lib/vhost/virtio_net.c | 15 +++ 4 files changed, 27 insertions(+), 6 deletions(-) Reviewed-by: Maxime Coquelin Thanks, Maxime
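For readers unfamiliar with clang's thread safety analysis, here is a minimal standalone sketch of what these annotations boil down to, written with the raw clang attributes rather than DPDK's wrapper macros; the "assert()-like check" mentioned above is the assert_capability attribute, which lets a runtime check inform the static analysis. All names below are illustrative, not the actual vhost code:

/* mini_lock.c - compile with: clang -Wthread-safety -c mini_lock.c */
#include <pthread.h>

/* Wrap the mutex in a type clang can track as a capability. */
struct __attribute__((capability("mutex"))) lock {
    pthread_mutex_t m;
};

static struct lock access_lock = { PTHREAD_MUTEX_INITIALIZER };
static int vq_async __attribute__((guarded_by(access_lock)));

static void take(struct lock *l)
    __attribute__((acquire_capability(l)))
    __attribute__((no_thread_safety_analysis));
static void give(struct lock *l)
    __attribute__((release_capability(l)))
    __attribute__((no_thread_safety_analysis));

static void take(struct lock *l) { pthread_mutex_lock(&l->m); }
static void give(struct lock *l) { pthread_mutex_unlock(&l->m); }

/* Contract: callers must hold access_lock. Calling this without the
 * lock, or touching vq_async unguarded, is a compile-time
 * -Wthread-safety warning. */
static int poll_async(void)
    __attribute__((requires_capability(access_lock)));
static int poll_async(void) { return vq_async; }

int main(void)
{
    take(&access_lock);
    int v = poll_async(); /* OK: lock held */
    give(&access_lock);
    return v;
}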
Re: [RFC PATCH v3 5/8] vhost: annotate need reply handling
On 4/11/22 13:00, David Marchand wrote: When a reply from the slave is required (VHOST_USER_NEED_REPLY flag), a spinlock is taken before sending the message. This spinlock is released if an error occurs when sending the message, and once a reply is received. A problem is that this lock is taken under a branch, and annotating conditionally held locks is not supported. The code currently seems correct and, while it may be reworked later, it is easier to simply skip checks on slave_req_lock for these helpers. Signed-off-by: David Marchand --- lib/vhost/vhost_user.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c index ee276a28f1..d101d5072f 100644 --- a/lib/vhost/vhost_user.c +++ b/lib/vhost/vhost_user.c @@ -2854,6 +2854,7 @@ send_vhost_reply(struct virtio_net *dev, int sockfd, struct vhu_msg_context *ctx static int send_vhost_slave_message(struct virtio_net *dev, struct vhu_msg_context *ctx) + __rte_no_thread_safety_analysis { int ret; @@ -3165,6 +3166,7 @@ vhost_user_msg_handler(int vid, int fd) static int process_slave_message_reply(struct virtio_net *dev, const struct vhu_msg_context *ctx) + __rte_no_thread_safety_analysis { struct vhu_msg_context msg_reply; int ret; Reviewed-by: Maxime Coquelin Thanks, Maxime
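To see why annotating this is impossible, here is a standalone sketch of the same shape, with the VHOST_USER_NEED_REPLY flag reduced to a plain bool. At the merge point after each branch, clang cannot statically decide whether the lock is held, so the function has to opt out exactly as the patch does:

#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t slave_req_lock = PTHREAD_MUTEX_INITIALIZER;

/* Same shape as send_vhost_slave_message(): the lock is taken and
 * released only on some paths. */
static int send_message(bool need_reply)
    __attribute__((no_thread_safety_analysis));

static int send_message(bool need_reply)
{
    int ret = 0;

    if (need_reply)
        pthread_mutex_lock(&slave_req_lock);

    /* ... send the message on the socket; ret < 0 on error ... */

    if (ret < 0 && need_reply)
        pthread_mutex_unlock(&slave_req_lock);
    /* on success the lock stays held until the reply is processed */
    return ret;
}

int main(void) { return send_message(false); }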
Re: [RFC PATCH v3 6/8] vhost: annotate vDPA device list accesses
On 4/11/22 13:00, David Marchand wrote: vdpa_device_list access must be protected with vdpa_device_list_lock spinlock. Signed-off-by: David Marchand --- lib/vhost/vdpa.c | 17 ++--- 1 file changed, 10 insertions(+), 7 deletions(-) Reviewed-by: Maxime Coquelin Thanks, Maxime
Re: [RFC PATCH v3 7/8] vhost: annotate IOTLB locks
On 4/11/22 13:00, David Marchand wrote: This change simply annotates existing paths of the code leading to manipulations of the IOTLB r/w locks. clang does not support conditionally held locks, so always take IOTLB locks regardless of the VIRTIO_F_IOMMU_PLATFORM feature. vdpa and vhost_crypto code are annotated though they end up not taking an IOTLB lock and have been marked with a FIXME. Signed-off-by: David Marchand --- lib/vhost/iotlb.h| 8 +++ lib/vhost/vdpa.c | 1 + lib/vhost/vhost.c| 11 + lib/vhost/vhost.h| 22 +- lib/vhost/vhost_crypto.c | 7 ++ lib/vhost/virtio_net.c | 49 ++-- 6 files changed, 75 insertions(+), 23 deletions(-) I agree with the change. I don't expect a performance impact from taking the lock unconditionally, because there won't be cache-line sharing since it is a per-vq lock, and the locking cost will be offset by removing the feature check. Reviewed-by: Maxime Coquelin Thanks, Maxime
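To make the before/after concrete, here is a sketch of the datapath change using plain pthread rwlocks and illustrative types, not the actual vhost structures:

#include <pthread.h>
#include <stdint.h>

/* Illustrative shape only. */
struct vq {
    pthread_rwlock_t iotlb_lock;
    uint64_t features;
};

#define F_IOMMU (1ULL << 33) /* VIRTIO_F_IOMMU_PLATFORM is bit 33 */

/* Before: conditional locking, which the analysis cannot express. */
static void rx_before(struct vq *vq)
{
    if (vq->features & F_IOMMU)
        pthread_rwlock_rdlock(&vq->iotlb_lock);
    /* ... translate guest addresses, process descriptors ... */
    if (vq->features & F_IOMMU)
        pthread_rwlock_unlock(&vq->iotlb_lock);
}

/* After: unconditional locking. The lock is per-vq, so there is no
 * cache-line sharing between queues, and the rdlock/unlock pair
 * replaces two feature-bit checks on the hot path. */
static void rx_after(struct vq *vq)
{
    pthread_rwlock_rdlock(&vq->iotlb_lock);
    /* ... translate guest addresses, process descriptors ... */
    pthread_rwlock_unlock(&vq->iotlb_lock);
}

int main(void)
{
    struct vq q = { .features = 0 };

    pthread_rwlock_init(&q.iotlb_lock, NULL);
    rx_before(&q);
    rx_after(&q);
    return 0;
}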
Re: [RFC PATCH v3 8/8] vhost: enable lock check
On 4/11/22 13:00, David Marchand wrote: Now that all locks in this library are annotated, we can enable the check. Signed-off-by: David Marchand --- lib/vhost/meson.build | 2 ++ 1 file changed, 2 insertions(+) diff --git a/lib/vhost/meson.build b/lib/vhost/meson.build index bc7272053b..197a51d936 100644 --- a/lib/vhost/meson.build +++ b/lib/vhost/meson.build @@ -17,6 +17,8 @@ elif (toolchain == 'icc' and cc.version().version_compare('>=16.0.0')) endif dpdk_conf.set('RTE_LIBRTE_VHOST_POSTCOPY', cc.has_header('linux/userfaultfd.h')) cflags += '-fno-strict-aliasing' + +annotate_locks = true sources = files( 'fd_man.c', 'iotlb.c', Reviewed-by: Maxime Coquelin Thanks, Maxime
Re: [PATCH 2/3] ethdev: fix memory leak when telemetry xstats
On Thu, Apr 21, 2022 at 11:04 AM Bruce Richardson wrote: > > We need some minimal testing for telemetry commands. > > > > It could be a test automatically calling all available /ethdev/ > > commands on a running testpmd. > > This test could be really simple, not even checking what is returned. > > It would just try every command sequentially with no parameter first, > > then with port 0 and finally with port 1. > > > > That seems reasonable. However, I'd go a little further and have all > available commands called as an initial sanity check. Then we can use some > heuristics to go further, with the *dev/stats commands or xstats commands > all being called with numeric parameters as you suggest. Ok, lgtm too. Just to be clear, I don't have the time to work on this, so this is open to volunteers. -- David Marchand
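For whoever volunteers: a rough standalone sketch of such a smoke test, assuming a running testpmd with the default runtime directory and file prefix ("rte"), and the v2 telemetry socket protocol (a JSON banner on connect, then one "command,parameters" string per query). It tries each command with no parameter first, then with port 0 and port 1, without checking what is returned:

/* telemetry_smoke.c */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

static int query(int fd, const char *cmd)
{
    char buf[8192];
    ssize_t n;

    if (write(fd, cmd, strlen(cmd)) < 0)
        return -1;
    n = read(fd, buf, sizeof(buf) - 1);
    if (n <= 0)
        return -1; /* only check that *something* comes back */
    buf[n] = '\0';
    printf("%s -> %zd bytes\n", cmd, n);
    return 0;
}

int main(void)
{
    struct sockaddr_un sa = { .sun_family = AF_UNIX };
    char banner[1024];
    int fd;

    fd = socket(AF_UNIX, SOCK_SEQPACKET, 0);
    strcpy(sa.sun_path, "/var/run/dpdk/rte/dpdk_telemetry.v2");
    if (fd < 0 || connect(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
        return 1;
    if (read(fd, banner, sizeof(banner)) <= 0) /* discard the banner */
        return 1;

    /* no parameter first, then port 0, then port 1 */
    query(fd, "/ethdev/xstats");
    query(fd, "/ethdev/xstats,0");
    query(fd, "/ethdev/xstats,1");

    close(fd);
    return 0;
}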
Re: [PATCH] examples/vdpa: fix disabled VirtQ statistics query
"examples/vdpa: fix disabled virtqueue statistics query" On 2/24/22 14:24, Xueming Li wrote: Quit VirtQ statistics query instead of reporting error. Fixes: 6505865aa8ed ("examples/vdpa: add statistics show command") Cc: sta...@dpdk.org Signed-off-by: Xueming Li --- examples/vdpa/main.c | 21 +++-- 1 file changed, 19 insertions(+), 2 deletions(-) diff --git a/examples/vdpa/main.c b/examples/vdpa/main.c index 5ab07655aed..bd66deca85c 100644 --- a/examples/vdpa/main.c +++ b/examples/vdpa/main.c @@ -391,7 +391,9 @@ static void cmd_device_stats_parsed(void *parsed_result, struct cmdline *cl, struct rte_vdpa_device *vdev = rte_vdpa_find_device_by_name(res->bdf); struct vdpa_port *vport = NULL; uint32_t first, last; + int vq_disabled = -1; int i; + int ret; if (!vdev) { RTE_LOG(ERR, VDPA, "Invalid device: %s.\n", @@ -449,8 +451,20 @@ static void cmd_device_stats_parsed(void *parsed_result, struct cmdline *cl, cmdline_printf(cl, "\nDevice %s:\n", res->bdf); for (; first <= last; first++) { memset(vport->stats, 0, sizeof(*vport->stats) * vport->stats_n); - if (rte_vdpa_get_stats(vport->dev, (int)first, vport->stats, - vport->stats_n) <= 0) { + ret = rte_vdpa_get_stats(vport->dev, (int)first, vport->stats, + vport->stats_n); + if (ret == 0) { + /* VQ disabled. */ + if (vq_disabled == -1) + vq_disabled = (int)first; + continue; + } + if (vq_disabled != -1) { + cmdline_printf(cl, "\tVirtq %d - %d disabled\n", + vq_disabled, (int)first - 1); + vq_disabled = -1; + } + if (ret < 0) { RTE_LOG(ERR, VDPA, "Failed to get vdpa queue statistics" " for device %s qid %d.\n", res->bdf, (int)first); @@ -464,6 +478,9 @@ static void cmd_device_stats_parsed(void *parsed_result, struct cmdline *cl, vport->stats[i].value); } } + if (vq_disabled != -1) + cmdline_printf(cl, "\tVirtq %d - %d disabled\n", + vq_disabled, (int)first - 1); } cmdline_parse_token_string_t cmd_device_stats_ = It is not clear to me how it is going to look like, could you paste some logs?
Re: [PATCH v1 18/19] timer: remove unneeded header includes
On 21/04/2022 21:08, Stephen Hemminger wrote: On Thu, 21 Apr 2022 19:08:58 + Sean Morrissey wrote: diff --git a/lib/timer/rte_timer.c b/lib/timer/rte_timer.c index c51a393e5c..f52ccc33ed 100644 --- a/lib/timer/rte_timer.c +++ b/lib/timer/rte_timer.c @@ -5,12 +5,9 @@ #include #include #include -#include #include -#include #include -#include #include #include #include This doesn't look right. rte_timer.c relies on rte_get_timer_cycles(), which is defined in rte_cycles.h. Perhaps iwyu is getting confused, or thinking that is already covered by another include file? IWYU can throw false positives. Please let me fix all build issues with this patchset and get back to you. It could re-introduce the header in question. If not, I will investigate where IWYU believes the include is coming from. Typically, the headers removed by this tool are already included via a daisy chain of includes from another header.
Re: kni: check abi version between kmod and lib
Stephen Hemminger writes: > On Thu, 21 Apr 2022 11:40:00 -0400 > Ray Kinsella wrote: > >> Stephen Hemminger writes: >> >> > On Thu, 21 Apr 2022 12:38:26 +0800 >> > Stephen Coleman wrote: >> > >> >> KNI ioctl functions copy data from the userspace lib, and this interface >> >> of the kmod is indeed not compatible. If the user uses an incompatible rte_kni.ko, >> >> bad things happen: sometimes various fields contain garbage values, >> >> sometimes it causes a kmod soft lockup. >> >> >> >> Some common distros ship their own rte_kni.ko, so this is likely to >> >> happen. >> >> >> >> This patch adds ABI version checking between the userland lib and the kmod so >> >> that: >> >> >> >> * if the kmod ioctl gets a wrong ABI magic, it refuses to go on >> >> * if the userland lib probes a wrong ABI version via the newly added ioctl, it >> >> also refuses to go on >> >> >> >> Bugzilla ID: 998 >> > >> > >> > Kernel APIs are supposed to be 99% stable. >> > If this driver was playing by the upstream kernel rules this would not >> > have happened. >> >> Well look, it is out-of-tree and never likely to be in-tree, so those >> rules don't apply. Making sure the ABI doesn't change during the ABI >> stability period should be good enough? >> > > I think if KNI changes, it should just add more ioctl numbers and > be compatible, it is not that hard. True, fair point, though I am unsure what that buys us. My thinking was that we should be doing the minimal amount of work on KNI, and directing people to use upstream alternatives where possible. For me minimizing means DPDK ABI alignment. However I see your point: letting KNI maintain its own ABI versioning independent of DPDK, with stricter kernel-like guarantees, is probably not much more work. -- Regards, Ray K
[Bug 999] memory access overflow in skeleton_rawdev
https://bugs.dpdk.org/show_bug.cgi?id=999 Bug ID: 999 Summary: memory access overflow in skeleton_rawdev Product: DPDK Version: 21.11 Hardware: All OS: All Status: UNCONFIRMED Severity: normal Priority: Normal Component: core Assignee: dev@dpdk.org Reporter: yonghaoz1...@gmail.com Target Milestone: --- Hi all, In the function "skeleton_rawdev_enqueue_bufs", the variable "q_id" is a "uint16_t", but the variable "context" (which points to it) is cast to "(int *)" and dereferenced, which causes a memory access overflow. See the following ASan report: ==3042499==ERROR: AddressSanitizer: stack-buffer-overflow on address 0xdd8d6700 at pc 0x10c57c80 bp 0xdd8d6600 sp 0xdd8d65f8 READ of size 4 at 0xdd8d6700 thread T0 /usr/local/bin/llvm-symbolizer: /usr/lib64/libtinfo.so.5: no version information available (required by /usr/local/bin/llvm-symbolizer) #0 0x10c57c7c in skeleton_rawdev_enqueue_bufs /home/baijiaju/test_dpdk/dpdk-21.11-EH/build/../drivers/raw/skeleton/skeleton_rawdev.c:424:9 #1 0x1d74dbc in rte_rawdev_enqueue_buffers /home/baijiaju/test_dpdk/dpdk-21.11-EH/build/../lib/rawdev/rte_rawdev.c:233:9 #2 0x10c5fb38 in test_rawdev_enqdeq /home/baijiaju/test_dpdk/dpdk-21.11-EH/build/../drivers/raw/skeleton/skeleton_rawdev_test.c:382:8 #3 0x10c5ac30 in skeldev_test_run /home/baijiaju/test_dpdk/dpdk-21.11-EH/build/../drivers/raw/skeleton/skeleton_rawdev_test.c:425:9 #4 0x10c5a3bc in test_rawdev_skeldev /home/baijiaju/test_dpdk/dpdk-21.11-EH/build/../drivers/raw/skeleton/skeleton_rawdev_test.c:460:2 #5 0x1d77668 in rte_rawdev_selftest /home/baijiaju/test_dpdk/dpdk-21.11-EH/build/../lib/rawdev/rte_rawdev.c:388:9 #6 0xa3ccc8 in test_rawdev_selftest_impl /home/baijiaju/test_dpdk/dpdk-21.11-EH/build/../app/test/test_rawdev.c:21:8 #7 0xa3cb08 in test_rawdev_selftest_skeleton /home/baijiaju/test_dpdk/dpdk-21.11-EH/build/../app/test/test_rawdev.c:29:9 #8 0xa3c7f4 in test_rawdev_selftests /home/baijiaju/test_dpdk/dpdk-21.11-EH/build/../app/test/test_rawdev.c:40:6 #9 0x4c6ec8 in cmd_autotest_parsed /home/baijiaju/test_dpdk/dpdk-21.11-EH/build/../app/test/commands.c:70:10 #10 0x207ef14 in cmdline_parse /home/baijiaju/test_dpdk/dpdk-21.11-EH/build/../lib/cmdline/cmdline_parse.c:290:3 #11 0x2074fbc in cmdline_valid_buffer /home/baijiaju/test_dpdk/dpdk-21.11-EH/build/../lib/cmdline/cmdline.c:26:8 #12 0x208fef4 in rdline_char_in /home/baijiaju/test_dpdk/dpdk-21.11-EH/build/../lib/cmdline/cmdline_rdline.c:446:5 #13 0x2075d50 in cmdline_in /home/baijiaju/test_dpdk/dpdk-21.11-EH/build/../lib/cmdline/cmdline.c:148:
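A standalone reduction of the report: the driver does the equivalent of the cast below, reading 4 bytes through a pointer to a 2-byte stack object.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint16_t q_id = 7;     /* 2 bytes on the stack */
    void *context = &q_id; /* passed around as an opaque pointer */

    /* skeleton_rawdev_enqueue_bufs() effectively does: */
    int queue = *(int *)context; /* 4-byte read, 2 bytes past q_id */

    printf("%d\n", queue); /* upper half is garbage; ASan flags the read */
    return 0;
}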
[Bug 1000] memory access overflow in skeleton_rawdev
https://bugs.dpdk.org/show_bug.cgi?id=1000 Bug ID: 1000 Summary: memory access overflow in skeleton_rawdev Product: DPDK Version: 21.11 Hardware: All OS: All Status: UNCONFIRMED Severity: normal Priority: Normal Component: core Assignee: dev@dpdk.org Reporter: yonghaoz1...@gmail.com Target Milestone: --- Hi all, In the function "skeleton_rawdev_enqueue_bufs", the variable "q_id" is a "uint16_t", but the variable "context" (which points to it) is cast to "(int *)" and dereferenced, which causes a memory access overflow. See the following ASan report: ==3042499==ERROR: AddressSanitizer: stack-buffer-overflow on address 0xdd8d6700 at pc 0x10c57c80 bp 0xdd8d6600 sp 0xdd8d65f8 READ of size 4 at 0xdd8d6700 thread T0 /usr/local/bin/llvm-symbolizer: /usr/lib64/libtinfo.so.5: no version information available (required by /usr/local/bin/llvm-symbolizer) #0 0x10c57c7c in skeleton_rawdev_enqueue_bufs /home/baijiaju/test_dpdk/dpdk-21.11-EH/build/../drivers/raw/skeleton/skeleton_rawdev.c:424:9 #1 0x1d74dbc in rte_rawdev_enqueue_buffers /home/baijiaju/test_dpdk/dpdk-21.11-EH/build/../lib/rawdev/rte_rawdev.c:233:9 #2 0x10c5fb38 in test_rawdev_enqdeq /home/baijiaju/test_dpdk/dpdk-21.11-EH/build/../drivers/raw/skeleton/skeleton_rawdev_test.c:382:8 #3 0x10c5ac30 in skeldev_test_run /home/baijiaju/test_dpdk/dpdk-21.11-EH/build/../drivers/raw/skeleton/skeleton_rawdev_test.c:425:9 #4 0x10c5a3bc in test_rawdev_skeldev /home/baijiaju/test_dpdk/dpdk-21.11-EH/build/../drivers/raw/skeleton/skeleton_rawdev_test.c:460:2 #5 0x1d77668 in rte_rawdev_selftest /home/baijiaju/test_dpdk/dpdk-21.11-EH/build/../lib/rawdev/rte_rawdev.c:388:9 #6 0xa3ccc8 in test_rawdev_selftest_impl /home/baijiaju/test_dpdk/dpdk-21.11-EH/build/../app/test/test_rawdev.c:21:8 #7 0xa3cb08 in test_rawdev_selftest_skeleton /home/baijiaju/test_dpdk/dpdk-21.11-EH/build/../app/test/test_rawdev.c:29:9 #8 0xa3c7f4 in test_rawdev_selftests /home/baijiaju/test_dpdk/dpdk-21.11-EH/build/../app/test/test_rawdev.c:40:6 #9 0x4c6ec8 in cmd_autotest_parsed /home/baijiaju/test_dpdk/dpdk-21.11-EH/build/../app/test/commands.c:70:10 #10 0x207ef14 in cmdline_parse /home/baijiaju/test_dpdk/dpdk-21.11-EH/build/../lib/cmdline/cmdline_parse.c:290:3 #11 0x2074fbc in cmdline_valid_buffer /home/baijiaju/test_dpdk/dpdk-21.11-EH/build/../lib/cmdline/cmdline.c:26:8 #12 0x208fef4 in rdline_char_in /home/baijiaju/test_dpdk/dpdk-21.11-EH/build/../lib/cmdline/cmdline_rdline.c:446:5 #13 0x2075d50 in cmdline_in /home/baijiaju/test_dpdk/dpdk-21.11-EH/build/../lib/cmdline/cmdline.c:14
Re: [RFC] eal: add bus cleanup to eal cleanup
On 20/04/2022 07:55, Morten Brørup wrote: From: Kevin Laatz [mailto:kevin.la...@intel.com] Sent: Tuesday, 19 April 2022 18.15 During EAL init, all buses are probed and the devices found are initialized. On eal_cleanup(), the inverse does not happen, meaning any allocated memory and other configuration will not be cleaned up appropriately on exit. Currently, in order for device cleanup to take place, applications must call the driver-relevant functions to ensure proper cleanup is done before the application exits. Since initialization occurs for all devices on the bus, not just the devices used by an application, it requires a) application awareness of all bus devices that could have been probed on the system, and b) code duplication across applications to ensure cleanup is performed. An example of this is rte_eth_dev_close() which is commonly used across the example applications. This RFC proposes adding bus cleanup to the eal_cleanup() to make EAL's init/exit more symmetrical, ensuring all bus devices are cleaned up appropriately without the application needing to be aware of all bus types that may have been probed during initialization. Contained in this RFC are the changes required to perform cleanup for devices on the PCI bus during eal_cleanup(). This can be expanded in subsequent versions if these changes are desired. There would be an ask for bus maintainers to add the relevant cleanup for their buses since they have the domain expertise. Signed-off-by: Kevin Laatz --- [...] + RTE_LOG(INFO, EAL, + "Clean up PCI driver: %s (%x:%x) device: "PCI_PRI_FMT" (socket %i)\n", + drv->driver.name, dev->id.vendor_id, dev->id.device_id, + loc->domain, loc->bus, loc->devid, loc->function, + dev->device.numa_node); I agree with Stephen, this message might as well be DEBUG level. You could argue for symmetry: If the "alloc" message during startup is INFO level, it makes sense to use INFO level for the "free" message during cleanup too. However, the message probably has far lower information value during cleanup (because this driver cleanup is expected to happen), so I would degrade it to DEBUG level. Symmetry is not always the strongest argument. I have no strong preference, so I'll leave it up to you, Kevin. Thanks for the feedback. +1, will change to debug for v2. [...] @@ -263,6 +275,7 @@ struct rte_bus { const char *name;/**< Name of the bus */ rte_bus_scan_t scan; /**< Scan for devices attached to bus */ rte_bus_probe_t probe; /**< Probe devices on bus */ + rte_bus_cleanup_t cleanup; /**< Cleanup devices on bus */ rte_bus_find_device_t find_device; /**< Find a device on the bus */ rte_bus_plug_t plug; /**< Probe single device for drivers */ rte_bus_unplug_t unplug; /**< Remove single device from driver */ Have you considered whether modifying the rte_bus structure in /lib/eal/include/rte_bus.h breaks the ABI? I've looked into this and have run test-meson-builds with ABI checks enabled. The output of those checks flagged some potential breaks; however, I believe these are false positives. The output indicated 2 potential breaks (in multiple places, but the root is the same): 1. A member has been added to the rte_bus struct. This is flagged as a sub-type change; however, since rte_bus is only ever referenced by pointer, it is not a break. 2. The offset of members changes in the 'rte_pci_bus' and 'rte_vmbus_bus' structs. These structs are only used internally, so they also do not break the ABI. Since the ABI checks do flag the addition, I will add an entry to the abignore for the v2.
Overall, this patch is certainly a good idea! On the condition that modifying the rte_bus structure does not break the ABI... Acked-by: Morten Brørup
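For reference, a simplified sketch of the shape of the proposal; the names are illustrative, not the actual patch. A PCI implementation of the callback would walk the bus's device list and invoke each bound driver's remove path:

#include <stddef.h>

typedef int (*bus_cleanup_t)(void);

struct bus {
    const char *name;
    int (*probe)(void);    /* called from rte_eal_init() */
    bus_cleanup_t cleanup; /* new, symmetric with probe */
};

static int eal_bus_cleanup(struct bus **buses, int n)
{
    int i, ret = 0;

    for (i = 0; i < n; i++) {
        if (buses[i]->cleanup == NULL)
            continue;  /* bus does not implement cleanup yet */
        if (buses[i]->cleanup() != 0)
            ret = -1;  /* record failure but keep going: best-effort teardown */
    }
    return ret;
}

int main(void)
{
    return eal_bus_cleanup(NULL, 0);
}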
RE: [PATCH] net/virtio: unmap PCI device in secondary process
> -----Original Message----- > From: Yuan Wang > Sent: Thursday, April 21, 2022 7:16 PM > To: maxime.coque...@redhat.com; Xia, Chenbo > Cc: dev@dpdk.org; Hu, Jiayu ; He, Xingguang > ; Wang, YuanX > Subject: [PATCH] net/virtio: unmap PCI device in secondary process > Tested-by: Wei Ling
[PATCH v2 0/2] crypto/qat: add secp384r1 curve support
This patchset adds secp384r1 (P-384) elliptic curve to Intel QuickAssist Technology crypto PMD. v2: - added release notes Arek Kusztal (2): crypto/qat: refactor asym algorithm macros and logs crypto/qat: add secp384r1 curve doc/guides/rel_notes/release_22_07.rst | 4 + drivers/common/qat/qat_adf/qat_pke.h | 12 ++ drivers/crypto/qat/qat_asym.c | 230 ++--- drivers/crypto/qat/qat_asym.h | 3 +- drivers/crypto/qat/qat_ec.h| 77 ++- 5 files changed, 193 insertions(+), 133 deletions(-) -- 2.13.6
[PATCH v2 1/2] crypto/qat: refactor asym algorithm macros and logs
This commit unifies macros for asymmetric parameters, therefore making code easier to maintain. It additionally changes some of PMD output logs that right now can only be seen in debug mode. Signed-off-by: Arek Kusztal --- drivers/crypto/qat/qat_asym.c | 230 ++ drivers/crypto/qat/qat_asym.h | 3 +- drivers/crypto/qat/qat_ec.h | 1 - 3 files changed, 101 insertions(+), 133 deletions(-) diff --git a/drivers/crypto/qat/qat_asym.c b/drivers/crypto/qat/qat_asym.c index 479d5308cf..d2041b2efa 100644 --- a/drivers/crypto/qat/qat_asym.c +++ b/drivers/crypto/qat/qat_asym.c @@ -34,7 +34,7 @@ static const struct rte_driver cryptodev_qat_asym_driver = { /* * Macros with suffix _F are used with some of predefinded identifiers: * - cookie->input_buffer - * - qat_alg_bytesize + * - qat_func_alignsize */ #if RTE_LOG_DP_LEVEL >= RTE_LOG_DEBUG #define HEXDUMP(name, where, size) QAT_DP_HEXDUMP_LOG(DEBUG, name, \ @@ -43,8 +43,8 @@ static const struct rte_driver cryptodev_qat_asym_driver = { &where[idx * size], size) #define HEXDUMP_OFF_F(name, idx) QAT_DP_HEXDUMP_LOG(DEBUG, name, \ - &cookie->input_buffer[idx * qat_alg_bytesize], \ - qat_alg_bytesize) + &cookie->input_buffer[idx * qat_func_alignsize], \ + qat_func_alignsize) #else #define HEXDUMP(name, where, size) #define HEXDUMP_OFF(name, where, size, idx) @@ -69,36 +69,28 @@ static const struct rte_driver cryptodev_qat_asym_driver = { } \ } while (0) -#define SET_PKE_LN(where, what, how, idx) \ - rte_memcpy(where[idx] + how - \ - what.length, \ - what.data, \ - what.length) - -#define SET_PKE_LN_9A(where, what, how, idx) \ - rte_memcpy(&where[idx * RTE_ALIGN_CEIL(how, 8)] + \ - RTE_ALIGN_CEIL(how, 8) - \ +#define SET_PKE_LN(what, how, idx) \ + rte_memcpy(cookie->input_array[idx] + how - \ what.length, \ what.data, \ what.length) -#define SET_PKE_LN_EC(where, what, how, idx) \ - rte_memcpy(where[idx] + \ - RTE_ALIGN_CEIL(how, 8) - \ - how, \ - what.data, \ - how) +#define SET_PKE_LN_EC(curve, p, idx) \ + rte_memcpy(cookie->input_array[idx] + \ + qat_func_alignsize - curve.bytesize, \ + curve.p.data, curve.bytesize) -#define SET_PKE_LN_9A_F(what, idx) \ - rte_memcpy(&cookie->input_buffer[idx * qat_alg_bytesize] + \ - qat_alg_bytesize - what.length, \ +#define SET_PKE_9A_IN(what, idx) \ + rte_memcpy(&cookie->input_buffer[idx * \ + qat_func_alignsize] + \ + qat_func_alignsize - what.length, \ what.data, what.length) -#define SET_PKE_LN_EC_F(what, how, idx) \ +#define SET_PKE_9A_EC(curve, p, idx) \ rte_memcpy(&cookie->input_buffer[idx * \ - RTE_ALIGN_CEIL(how, 8)] + \ - RTE_ALIGN_CEIL(how, 8) - how, \ - what.data, how) + qat_func_alignsize] + \ + qat_func_alignsize - curve.bytesize, \ + curve.p.data, curve.bytesize) static void request_init(struct icp_qat_fw_pke_request *qat_req) @@ -231,12 +223,9 @@ modexp_set_input(struct rte_crypto_asym_op *asym_op, } alg_bytesize = qat_function.bytesize; - SET_PKE_LN(cookie->input_array, asym_op->modex.base, - alg_bytesize, 0); - SET_PKE_LN(cookie->input_array, xform->modex.exponent, - alg_bytesize, 1); - SET_PKE_LN(cookie->input_array, xform->modex.modulus, - alg_bytesize, 2); + SET_PKE_LN(asym_op->modex.base, alg_bytesize, 0); + SET_PKE_LN(xform->modex.exponent, alg_bytesize, 1); + SET_PKE_LN(xform->modex.modulus, alg_bytesize, 2); cookie->alg_bytesize = alg_bytesize; qat_req->pke_hdr.cd_pars.func_id = func_id; @@ -290,10 +279,8 @@ modinv_set_input(struct rte_crypto_asym_op *asym_op, } alg_bytesize = qat_function.bytesize; - SET_PKE_LN(cookie->input_array, asym_op->modinv.base, - alg_bytesize, 0); - 
SET_PKE_LN(cookie->input_array, xform->modinv.modulus, - alg_bytesize, 1); + SET_PKE_LN(asym_op->modinv.base, alg_bytesize, 0); + SET_PKE_LN(xform->modinv.modulus, alg_bytesize, 1); cookie->alg_bytesize = alg_bytesize; qat_req->pke_hdr.cd_pars.func_id = func_id; @@ -347,8 +334,7 @@ rsa_set_pub_input(struct rte_crypto_asym_op *asym_op, if (asym_op->rsa.op_type == RTE_CRYPTO_ASYM_OP_ENCRYPT) {
[PATCH v2 2/2] crypto/qat: add secp384r1 curve
This commit adds secp384r1 (P-384) elliptic curve to Intel QuickAssist Technology crypto PMD. Signed-off-by: Arek Kusztal --- doc/guides/rel_notes/release_22_07.rst | 4 ++ drivers/common/qat/qat_adf/qat_pke.h | 12 ++ drivers/crypto/qat/qat_ec.h| 76 ++ 3 files changed, 92 insertions(+) diff --git a/doc/guides/rel_notes/release_22_07.rst b/doc/guides/rel_notes/release_22_07.rst index 42a5f2d990..7f44d363b5 100644 --- a/doc/guides/rel_notes/release_22_07.rst +++ b/doc/guides/rel_notes/release_22_07.rst @@ -55,6 +55,10 @@ New Features Also, make sure to start the actual text at the margin. === +* **Updated Intel QuickAssist Technology (QAT) crypto PMD.** + + * Added support for secp384r1 elliptic curve. + Removed Items - diff --git a/drivers/common/qat/qat_adf/qat_pke.h b/drivers/common/qat/qat_adf/qat_pke.h index b5fb2a020c..6c12bfd989 100644 --- a/drivers/common/qat/qat_adf/qat_pke.h +++ b/drivers/common/qat/qat_adf/qat_pke.h @@ -228,6 +228,10 @@ get_ecdsa_verify_function(struct rte_crypto_asym_xform *xform) qat_function.func_id = PKE_ECDSA_VERIFY_GFP_L256; qat_function.bytesize = 32; break; + case RTE_CRYPTO_EC_GROUP_SECP384R1: + qat_function.func_id = PKE_ECDSA_VERIFY_GFP_L512; + qat_function.bytesize = 64; + break; case RTE_CRYPTO_EC_GROUP_SECP521R1: qat_function.func_id = PKE_ECDSA_VERIFY_GFP_521; qat_function.bytesize = 66; @@ -248,6 +252,10 @@ get_ecdsa_function(struct rte_crypto_asym_xform *xform) qat_function.func_id = PKE_ECDSA_SIGN_RS_GFP_L256; qat_function.bytesize = 32; break; + case RTE_CRYPTO_EC_GROUP_SECP384R1: + qat_function.func_id = PKE_ECDSA_SIGN_RS_GFP_L512; + qat_function.bytesize = 64; + break; case RTE_CRYPTO_EC_GROUP_SECP521R1: qat_function.func_id = PKE_ECDSA_SIGN_RS_GFP_521; qat_function.bytesize = 66; @@ -268,6 +276,10 @@ get_ecpm_function(struct rte_crypto_asym_xform *xform) qat_function.func_id = MATHS_POINT_MULTIPLICATION_GFP_L256; qat_function.bytesize = 32; break; + case RTE_CRYPTO_EC_GROUP_SECP384R1: + qat_function.func_id = MATHS_POINT_MULTIPLICATION_GFP_L512; + qat_function.bytesize = 64; + break; case RTE_CRYPTO_EC_GROUP_SECP521R1: qat_function.func_id = MATHS_POINT_MULTIPLICATION_GFP_521; qat_function.bytesize = 66; diff --git a/drivers/crypto/qat/qat_ec.h b/drivers/crypto/qat/qat_ec.h index 1bcd7d1408..bbd0b31949 100644 --- a/drivers/crypto/qat/qat_ec.h +++ b/drivers/crypto/qat/qat_ec.h @@ -92,6 +92,80 @@ static struct elliptic_curve curve[] = { }, }, }, + [SECP384R1] = { + .name = "secp384r1", + .bytesize = 48, + .x = { + .data = { + 0xAA, 0x87, 0xCA, 0x22, 0xBE, 0x8B, 0x05, 0x37, + 0x8E, 0xB1, 0xC7, 0x1E, 0xF3, 0x20, 0xAD, 0x74, + 0x6E, 0x1D, 0x3B, 0x62, 0x8B, 0xA7, 0x9B, 0x98, + 0x59, 0xF7, 0x41, 0xE0, 0x82, 0x54, 0x2A, 0x38, + 0x55, 0x02, 0xF2, 0x5D, 0xBF, 0x55, 0x29, 0x6C, + 0x3A, 0x54, 0x5E, 0x38, 0x72, 0x76, 0x0A, 0xB7 + }, + }, + .y = { + .data = { + 0x36, 0x17, 0xDE, 0x4A, 0x96, 0x26, 0x2C, 0x6F, + 0x5D, 0x9E, 0x98, 0xBF, 0x92, 0x92, 0xDC, 0x29, + 0xF8, 0xF4, 0x1D, 0xBD, 0x28, 0x9A, 0x14, 0x7C, + 0xE9, 0xDA, 0x31, 0x13, 0xB5, 0xF0, 0xB8, 0xC0, + 0x0A, 0x60, 0xB1, 0xCE, 0x1D, 0x7E, 0x81, 0x9D, + 0x7A, 0x43, 0x1D, 0x7C, 0x90, 0xEA, 0x0E, 0x5F + }, + }, + .n = { + .data = { + 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, + 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, + 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, + 0xC7, 0x63, 0x4D, 0x81, 0xF4, 0x37, 0x2D, 0xDF, + 0x58, 0x1A, 0x0D, 0xB2, 0x48, 0xB0, 0xA7, 0x7A, + 0xEC, 0xEC, 0x19, 0x6A, 0xCC, 0xC5, 0x29, 0x73, + }, + }, + .p = { + .data = { + 0xFF,
[PATCH] net/vhost: fix TSO feature default disablement
By default, the TSO feature should be disabled because it requires the application's support to be functional, as mentioned in the documentation. However, if the "tso" devarg was not specified, the feature did not get disabled. This patch fixes this issue, so that TSO is disabled even if "tso=0" is not passed as a devarg. Fixes: e289400669d5 ("net/vhost: support TSO disabling") Cc: sta...@dpdk.org Signed-off-by: Maxime Coquelin --- drivers/net/vhost/rte_eth_vhost.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/net/vhost/rte_eth_vhost.c b/drivers/net/vhost/rte_eth_vhost.c index 070f0e6dfd..19c80044c8 100644 --- a/drivers/net/vhost/rte_eth_vhost.c +++ b/drivers/net/vhost/rte_eth_vhost.c @@ -1643,11 +1643,11 @@ rte_pmd_vhost_probe(struct rte_vdev_device *dev) &open_int, &tso); if (ret < 0) goto out_free; + } - if (tso == 0) { - disable_flags |= (1ULL << VIRTIO_NET_F_HOST_TSO4); - disable_flags |= (1ULL << VIRTIO_NET_F_HOST_TSO6); - } + if (tso == 0) { + disable_flags |= (1ULL << VIRTIO_NET_F_HOST_TSO4); + disable_flags |= (1ULL << VIRTIO_NET_F_HOST_TSO6); } if (rte_kvargs_count(kvlist, ETH_VHOST_LINEAR_BUF) == 1) { -- 2.35.1
Re: [RFC PATCH v3 2/8] vhost: annotate virtqueue access lock
On Thu, Apr 21, 2022 at 5:25 PM Maxime Coquelin wrote: > On 4/11/22 13:00, David Marchand wrote: > > This change simply annotates existing paths of the code leading to > > manipulations of the vq->access_lock. > > > > One small change is required: vhost_poll_enqueue_completed was getting > > a queue_id to get hold of the vq, while its callers already knew of > > the vq. For the annotation sake, vq is now directly passed. > > It is anyway more consistent with the rest of the code to pass the vq > directly in internal APIs when the queue ID is not needed. > > > vhost_user_lock_all_queue_pairs and vhost_user_unlock_all_queue_pairs > > are skipped since vq->access_lock are conditionally held. > > As discussed off-list, I wonder whether it could be possible to rework > the conditional lock holding using the static array and some macros so > that we could statically specify for each request if the lock is > required. We did discuss some ideas off-list, but in the end, since we have multiple locks being dynamically taken in vhost_user_lock_all_queue_pairs, I see no way to statically annotate the code. We could rework the code to have message handlers in a consolidated static array, but that would not help with annotations. I had some patches going in that direction (related to some fd fixes I sent before), but they need more work. I'll see if I can send them later in this release; otherwise they will go to the next release. -- David Marchand
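To make the annotation point concrete: the lock contract is written as an expression over the function's arguments, which is only possible if the vq itself is a parameter. A sketch with raw clang attributes and simplified signatures (the real functions take more arguments):

#include <pthread.h>
#include <stdint.h>

struct __attribute__((capability("mutex"))) vq_lock {
    pthread_mutex_t m;
};

struct virtqueue {
    struct vq_lock access_lock;
    /* ... */
};

/* Annotatable: the contract can name vq->access_lock. */
void vhost_poll_enqueue_completed(struct virtqueue *vq)
    __attribute__((requires_capability(&vq->access_lock)));

/* Not annotatable: there is no lock expression to write down, since
 * the vq is only looked up from the ids inside the body. */
void vhost_poll_enqueue_completed_by_id(int vid, uint16_t queue_id);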
[PATCH v6] examples/l3fwd: merge l3fwd-acl into l3fwd
l3fwd-acl contains duplicate functions to l3fwd. For this reason we merge l3fwd-acl code into l3fwd with '--lookup acl' cmdline option to run ACL. Signed-off-by: Sean Morrissey Acked-by: Konstantin Ananyev --- V6: * fix ipv6 rule parsing V5: * remove undefined functions * remove unused struct members V4: * update maintainers * fix doc changes V3: * remove unnecessary declarations * move functions to correct files V2: * add doc changes * minor code cleanup --- MAINTAINERS |2 - doc/guides/rel_notes/release_22_07.rst|5 + doc/guides/sample_app_ug/index.rst|1 - doc/guides/sample_app_ug/l3_forward.rst | 63 +- .../sample_app_ug/l3_forward_access_ctrl.rst | 340 --- examples/l3fwd-acl/Makefile | 51 - examples/l3fwd-acl/main.c | 2272 - examples/l3fwd-acl/meson.build| 13 - examples/l3fwd/Makefile |2 +- examples/l3fwd/l3fwd.h| 31 +- examples/l3fwd/l3fwd_acl.c| 1112 examples/l3fwd/l3fwd_acl.h| 57 + examples/l3fwd/l3fwd_acl_scalar.h | 112 + examples/l3fwd/l3fwd_route.h | 16 + examples/l3fwd/main.c | 65 +- examples/l3fwd/meson.build|3 +- examples/meson.build |1 - 17 files changed, 1446 insertions(+), 2700 deletions(-) delete mode 100644 doc/guides/sample_app_ug/l3_forward_access_ctrl.rst delete mode 100644 examples/l3fwd-acl/Makefile delete mode 100644 examples/l3fwd-acl/main.c delete mode 100644 examples/l3fwd-acl/meson.build create mode 100644 examples/l3fwd/l3fwd_acl.c create mode 100644 examples/l3fwd/l3fwd_acl.h create mode 100644 examples/l3fwd/l3fwd_acl_scalar.h diff --git a/MAINTAINERS b/MAINTAINERS index 7c4f541dba..dc4fa86c8d 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1505,8 +1505,6 @@ F: lib/acl/ F: doc/guides/prog_guide/packet_classif_access_ctrl.rst F: app/test-acl/ F: app/test/test_acl.* -F: examples/l3fwd-acl/ -F: doc/guides/sample_app_ug/l3_forward_access_ctrl.rst EFD M: Byron Marohn diff --git a/doc/guides/rel_notes/release_22_07.rst b/doc/guides/rel_notes/release_22_07.rst index 42a5f2d990..204dd028da 100644 --- a/doc/guides/rel_notes/release_22_07.rst +++ b/doc/guides/rel_notes/release_22_07.rst @@ -55,6 +55,11 @@ New Features Also, make sure to start the actual text at the margin. === +* **Merged l3fwd-acl into l3fwd.** + + Merged l3fwd-acl code into l3fwd as l3fwd-acl contains duplicate and + common functions to l3fwd. + Removed Items - diff --git a/doc/guides/sample_app_ug/index.rst b/doc/guides/sample_app_ug/index.rst index 853e338778..cc9fae1e8c 100644 --- a/doc/guides/sample_app_ug/index.rst +++ b/doc/guides/sample_app_ug/index.rst @@ -31,7 +31,6 @@ Sample Applications User Guides l3_forward l3_forward_graph l3_forward_power_man -l3_forward_access_ctrl link_status_intr server_node_efd service_cores diff --git a/doc/guides/sample_app_ug/l3_forward.rst b/doc/guides/sample_app_ug/l3_forward.rst index 01d86db95d..e82168710d 100644 --- a/doc/guides/sample_app_ug/l3_forward.rst +++ b/doc/guides/sample_app_ug/l3_forward.rst @@ -11,7 +11,7 @@ The application performs L3 forwarding. Overview -The application demonstrates the use of the hash, LPM and FIB libraries in DPDK +The application demonstrates the use of the hash, LPM, FIB and ACL libraries in DPDK to implement packet forwarding using poll or event mode PMDs for packet I/O. The initialization and run-time paths are very similar to those of the :doc:`l2_forward_real_virtual` and :doc:`l2_forward_event`. @@ -22,7 +22,7 @@ decision is made based on information read from the input packet. Eventdev can optionally use S/W or H/W (if supported by platform) scheduler implementation for packet I/O based on run time parameters. 
-The lookup method is hash-based, LPM-based or FIB-based +The lookup method is hash-based, LPM-based, FIB-based or ACL-based and is selected at run time. When the selected lookup method is hash-based, a hash object is used to emulate the flow classification stage. @@ -44,7 +44,15 @@ returned by the LPM or FIB lookup. The set of LPM and FIB rules used by the application is statically configured and loaded into the LPM or FIB object at initialization time. -In the sample application, hash-based and FIB-based forwarding supports +For ACL the ACL library is used to perform both ACL and route entry lookup. +When packets are received from a port, +the application extracts the necessary information from the TCP/IP header of the received packet and +performs a lookup in the rule database to figure out whether the packets should be dropped (in the ACL range) +or forwarded to desired ports. +For AC
[PATCH v2] crypto/qat: add diffie hellman algorithm
This commit adds the Diffie-Hellman key exchange algorithm to the Intel QuickAssist Technology PMD. Signed-off-by: Arek Kusztal --- Depends-on: series-22621 ("crypto/qat: add secp384r1 curve support") v2: - updated release notes - updated qat documentation doc/guides/cryptodevs/qat.rst | 1 + doc/guides/rel_notes/release_22_07.rst | 1 + drivers/common/qat/qat_adf/qat_pke.h | 36 +++ drivers/crypto/qat/qat_asym.c | 168 + 4 files changed, 206 insertions(+) diff --git a/doc/guides/cryptodevs/qat.rst b/doc/guides/cryptodevs/qat.rst index 785e041324..37fd554ca1 100644 --- a/doc/guides/cryptodevs/qat.rst +++ b/doc/guides/cryptodevs/qat.rst @@ -177,6 +177,7 @@ The QAT ASYM PMD has support for: * ``RTE_CRYPTO_ASYM_XFORM_RSA`` * ``RTE_CRYPTO_ASYM_XFORM_ECDSA`` * ``RTE_CRYPTO_ASYM_XFORM_ECPM`` +* ``RTE_CRYPTO_ASYM_XFORM_DH`` Limitations ~~~ diff --git a/doc/guides/rel_notes/release_22_07.rst b/doc/guides/rel_notes/release_22_07.rst index 7f44d363b5..ac701645b1 100644 --- a/doc/guides/rel_notes/release_22_07.rst +++ b/doc/guides/rel_notes/release_22_07.rst @@ -58,6 +58,7 @@ New Features * **Updated Intel QuickAssist Technology (QAT) crypto PMD.** * Added support for secp384r1 elliptic curve. + * Added support for Diffie-Hellman (FFDH) algorithm . Removed Items diff --git a/drivers/common/qat/qat_adf/qat_pke.h b/drivers/common/qat/qat_adf/qat_pke.h index 6c12bfd989..c727e4e1af 100644 --- a/drivers/common/qat/qat_adf/qat_pke.h +++ b/drivers/common/qat/qat_adf/qat_pke.h @@ -137,6 +137,42 @@ get_modinv_function(struct rte_crypto_asym_xform *xform) } static struct qat_asym_function +get_dh_g2_function(uint32_t bytesize) +{ + struct qat_asym_function qat_function = { }; + + if (bytesize <= 256) { + qat_function.func_id = PKE_DH_G2_2048; + qat_function.bytesize = 256; + } else if (bytesize <= 384) { + qat_function.func_id = PKE_DH_G2_3072; + qat_function.bytesize = 384; + } else if (bytesize <= 512) { + qat_function.func_id = PKE_DH_G2_4096; + qat_function.bytesize = 512; + } + return qat_function; +} + +static struct qat_asym_function +get_dh_function(uint32_t bytesize) +{ + struct qat_asym_function qat_function = { }; + + if (bytesize <= 256) { + qat_function.func_id = PKE_DH_2048; + qat_function.bytesize = 256; + } else if (bytesize <= 384) { + qat_function.func_id = PKE_DH_3072; + qat_function.bytesize = 384; + } else if (bytesize <= 512) { + qat_function.func_id = PKE_DH_4096; + qat_function.bytesize = 512; + } + return qat_function; +} + +static struct qat_asym_function get_rsa_enc_function(struct rte_crypto_asym_xform *xform) { struct qat_asym_function qat_function = { }; diff --git a/drivers/crypto/qat/qat_asym.c b/drivers/crypto/qat/qat_asym.c index d2041b2efa..c2a985b355 100644 --- a/drivers/crypto/qat/qat_asym.c +++ b/drivers/crypto/qat/qat_asym.c @@ -748,6 +748,125 @@ ecpm_collect(struct rte_crypto_asym_op *asym_op, } static int +dh_mod_g2_input(struct rte_crypto_asym_op *asym_op, + struct icp_qat_fw_pke_request *qat_req, + struct qat_asym_op_cookie *cookie, + struct rte_crypto_asym_xform *xform) +{ + struct qat_asym_function qat_function; + uint32_t alg_bytesize, func_id; + + qat_function = get_dh_g2_function(xform->dh.p.length); + func_id = qat_function.func_id; + if (qat_function.func_id == 0) { + QAT_LOG(ERR, "Cannot obtain functionality id"); + return -EINVAL; + } + alg_bytesize = qat_function.bytesize; + SET_PKE_LN(asym_op->dh.priv_key, alg_bytesize, 0); + SET_PKE_LN(xform->dh.p, alg_bytesize, 1); + cookie->alg_bytesize = alg_bytesize; + cookie->qat_func_alignsize = alg_bytesize; +
qat_req->pke_hdr.cd_pars.func_id = func_id; + qat_req->input_param_count = 2; + qat_req->output_param_count = 1; + + HEXDUMP("DH Priv", cookie->input_array[0], alg_bytesize); + HEXDUMP("DH p", cookie->input_array[1], alg_bytesize); + + return 0; +} + +static int +dh_mod_n_input(struct rte_crypto_asym_op *asym_op, + struct icp_qat_fw_pke_request *qat_req, + struct qat_asym_op_cookie *cookie, + struct rte_crypto_asym_xform *xform) +{ + struct qat_asym_function qat_function; + uint32_t alg_bytesize, func_id; + + qat_function = get_dh_function(xform->dh.p.length); + func_id = qat_function.func_id; + if (qat_function.func_id == 0) { + QAT_LOG(ERR, "Cannot obtain functionality id"); + return -EINVAL; + } + alg_bytesize = qat_function.bytesize; + if (xform->dh.type == RTE_CRYPTO_ASYM_OP_PUBLIC_KEY_GENERATE)
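For context, a sketch of how a caller might fill the xform to drive this, assuming the 22.07-era rte_cryptodev asym API and using only the fields visible in the patch (xform_type, dh.type, dh.p, dh.g); judging by the function names, a generator of 2 would route to the dedicated G2 firmware functions. The helper name is hypothetical:

#include <stdint.h>
#include <string.h>
#include <rte_crypto_asym.h>

/* Hypothetical helper, not part of the patch. */
void
fill_dh_pubkey_xform(struct rte_crypto_asym_xform *xf,
        uint8_t *p, size_t p_len, uint8_t *g, size_t g_len)
{
    memset(xf, 0, sizeof(*xf));
    xf->xform_type = RTE_CRYPTO_ASYM_XFORM_DH;
    /* derive the public key from asym_op->dh.priv_key */
    xf->dh.type = RTE_CRYPTO_ASYM_OP_PUBLIC_KEY_GENERATE;
    xf->dh.p.data = p;       /* prime modulus, big endian */
    xf->dh.p.length = p_len; /* <= 256 B selects the 2048-bit QAT path */
    xf->dh.g.data = g;       /* generator; g == 2 presumably uses the G2 path */
    xf->dh.g.length = g_len;
}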
Re: kni: check abi version between kmod and lib
Thanks for your replies. I'm aware that kernel guidelines propose ascending ioctl numbers to maximize compatibility, but this will not work with dpdk, especially in our case here. If you look into kni_net.c you'll see the module actually depends internally on the memory layout of mbuf and a few other structs; you would need to change ioctl numbers whenever those change, and that's very implicit and requires extra effort. Plus, the compatibility is almost impossible to maintain across dpdk releases, as the module won't know which version of the mbuf layout it is working with. In short, rte_kni.ko is part of dpdk rather than part of the kernel, and parts of different dpdk releases do not work together -- so we reject mismatches early, before they can make a disaster. p.s. working on v3 to fix code format issues p.p.s. forgot to 'reply all' last time, sorry for the duplication > > > Stephen Hemminger writes: > > > On Thu, 21 Apr 2022 11:40:00 -0400 > > Ray Kinsella wrote: > > > >> Stephen Hemminger writes: > >> > >> > On Thu, 21 Apr 2022 12:38:26 +0800 > >> > Stephen Coleman wrote: > >> > > >> >> KNI ioctl functions copy data from the userspace lib, and this interface > >> >> of the kmod is indeed not compatible. If the user uses an incompatible > >> >> rte_kni.ko, > >> >> bad things happen: sometimes various fields contain garbage values, > >> >> sometimes it causes a kmod soft lockup. > >> >> > >> >> Some common distros ship their own rte_kni.ko, so this is likely to > >> >> happen. > >> >> > >> >> This patch adds ABI version checking between the userland lib and the kmod so > >> >> that: > >> >> > >> >> * if the kmod ioctl gets a wrong ABI magic, it refuses to go on > >> >> * if the userland lib probes a wrong ABI version via the newly added ioctl, it > >> >> also refuses to go on > >> >> > >> >> Bugzilla ID: 998 > >> > > >> > > >> > Kernel APIs are supposed to be 99% stable. > >> > If this driver was playing by the upstream kernel rules this would not > >> > have happened. > >> > >> Well look, it is out-of-tree and never likely to be in-tree, so those > >> rules don't apply. Making sure the ABI doesn't change during the ABI > >> stability period should be good enough? > >> > > > > I think if KNI changes, it should just add more ioctl numbers and > > be compatible, it is not that hard. > > True, fair point, though I am unsure what that buys us. My thinking was > that we should be doing the minimal amount of work on KNI, and directing > people to use upstream alternatives where possible. > > For me minimizing means DPDK ABI alignment. However I see your point: > letting KNI maintain its own ABI versioning independent of DPDK, with > stricter kernel-like guarantees, is probably not much more work. > > -- > Regards, Ray K
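To make the proposal concrete, an illustrative sketch of the userland side of such a check; every name below is hypothetical, not the actual patch. The kmod would embed the ABI version it was built against and, symmetrically, refuse ioctls whose magic does not match:

#include <stdint.h>
#include <sys/ioctl.h>

#define KNI_ABI_VERSION 22U                    /* baked in at build time */
#define KNI_IOCTL_ABI   _IOR('k', 3, uint32_t) /* hypothetical ioctl number */

int kni_check_abi(int kni_fd)
{
    uint32_t kmod_abi = 0;

    if (ioctl(kni_fd, KNI_IOCTL_ABI, &kmod_abi) < 0)
        return -1; /* old kmod without the ioctl: refuse to go on */
    if (kmod_abi != KNI_ABI_VERSION)
        return -1; /* mismatched builds: refuse to go on */
    return 0;
}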
[PATCH v2] gro: bug fix in identifying 0 length tcp packets
From: Kumara Parameshwaran As the minimum Ethernet frame size is 64 bytes, a packet with a 0 length tcp payload and no tcp options is only 54 bytes and hence would be padded. So it would be incorrect to use the packet length to determine the tcp data length. Fixes: 1e4cf4d6d4fb ("gro: cleanup") Cc: sta...@dpdk.org Signed-off-by: Kumara Parameshwaran --- v1: Do not use the packet length to determine the tcp data length, as the packet length could include padded bytes. This would lead to 0 length tcp packets being added into the GRO layer when the Ethernet frame is padded. v2: Since the ip packet length is now used to determine the tcp data length, validate the ip packet length lib/gro/gro_tcp4.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/lib/gro/gro_tcp4.c b/lib/gro/gro_tcp4.c index 7498c66..30f5922 100644 --- a/lib/gro/gro_tcp4.c +++ b/lib/gro/gro_tcp4.c @@ -198,7 +198,8 @@ gro_tcp4_reassemble(struct rte_mbuf *pkt, struct rte_tcp_hdr *tcp_hdr; uint32_t sent_seq; int32_t tcp_dl; - uint16_t ip_id, hdr_len, frag_off; + uint16_t ip_id, frag_off; + uint16_t ip_len; uint8_t is_atomic; struct tcp4_flow_key key; @@ -217,7 +218,6 @@ gro_tcp4_reassemble(struct rte_mbuf *pkt, eth_hdr = rte_pktmbuf_mtod(pkt, struct rte_ether_hdr *); ipv4_hdr = (struct rte_ipv4_hdr *)((char *)eth_hdr + pkt->l2_len); tcp_hdr = (struct rte_tcp_hdr *)((char *)ipv4_hdr + pkt->l3_len); - hdr_len = pkt->l2_len + pkt->l3_len + pkt->l4_len; /* * Don't process the packet which has FIN, SYN, RST, PSH, URG, ECE @@ -229,8 +229,9 @@ gro_tcp4_reassemble(struct rte_mbuf *pkt, * Don't process the packet whose payload length is less than or * equal to 0. */ - tcp_dl = pkt->pkt_len - hdr_len; - if (tcp_dl <= 0) + ip_len = rte_be_to_cpu_16(ipv4_hdr->total_length); + tcp_dl = ip_len - (pkt->l3_len + pkt->l4_len); + if (tcp_dl <= 0 || ip_len > pkt->pkt_len) return -1; /* -- 2.7.4
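A worked example of the arithmetic, for a pure ACK with no TCP options, which is exactly the padded case the fix targets. The extra ip_len > pkt_len check guards against a total_length field claiming more data than the mbuf actually holds:

#include <assert.h>

int main(void)
{
    /* eth(14) + ipv4(20) + tcp(20) = 54 bytes of headers; the sender
     * pads the frame to the 60-byte minimum (64 on the wire with FCS). */
    int pkt_len = 60;                   /* mbuf pkt_len as received */
    int hdr_len = 14 + 20 + 20;
    int tcp_dl_old = pkt_len - hdr_len; /* = 6: padding counted as payload */

    int ip_total_length = 40;           /* from ipv4_hdr->total_length */
    int tcp_dl_new = ip_total_length - (20 + 20); /* = 0: correctly rejected */

    assert(tcp_dl_old == 6 && tcp_dl_new == 0);
    return 0;
}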
[PATCH v2 01/28] common/cnxk: add multi channel support for SDP send queues
From: Subrahmanyam Nilla Currently only the base channel number is configured as the default channel for all the SDP send queues. Due to this, packets sent on different SQs land on the same output queue on the host. The channel number in each send queue should be configured according to the number of queues assigned to the SDP PF or VF device. Signed-off-by: Subrahmanyam Nilla --- v2: - Fixed compilation issue with some compilers in patch 24/24 - Added a few more fixes to net/cnxk and related code in common/cnxk drivers/common/cnxk/roc_nix_queue.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/common/cnxk/roc_nix_queue.c b/drivers/common/cnxk/roc_nix_queue.c index 07dab4b..76c049c 100644 --- a/drivers/common/cnxk/roc_nix_queue.c +++ b/drivers/common/cnxk/roc_nix_queue.c @@ -706,6 +706,7 @@ static int sq_cn9k_init(struct nix *nix, struct roc_nix_sq *sq, uint32_t rr_quantum, uint16_t smq) { + struct roc_nix *roc_nix = nix_priv_to_roc_nix(nix); struct mbox *mbox = (&nix->dev)->mbox; struct nix_aq_enq_req *aq; @@ -721,7 +722,11 @@ sq_cn9k_init(struct nix *nix, struct roc_nix_sq *sq, uint32_t rr_quantum, aq->sq.max_sqe_size = sq->max_sqe_sz; aq->sq.smq = smq; aq->sq.smq_rr_quantum = rr_quantum; - aq->sq.default_chan = nix->tx_chan_base; + if (roc_nix_is_sdp(roc_nix)) + aq->sq.default_chan = + nix->tx_chan_base + (sq->qid % nix->tx_chan_cnt); + else + aq->sq.default_chan = nix->tx_chan_base; aq->sq.sqe_stype = NIX_STYPE_STF; aq->sq.ena = 1; aq->sq.sso_ena = !!sq->sso_ena; -- 2.8.4
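A worked example of the new mapping, with assumed values (tx_chan_base = 0x700, tx_chan_cnt = 4); previously every SQ used the base channel:

#include <stdio.h>

int main(void)
{
    unsigned int tx_chan_base = 0x700, tx_chan_cnt = 4;

    /* SQ 0..3 -> channels 0x700..0x703; SQ 4..7 wrap around. */
    for (unsigned int qid = 0; qid < 8; qid++)
        printf("SQ %u -> channel 0x%x\n", qid,
               tx_chan_base + (qid % tx_chan_cnt));
    return 0;
}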
[PATCH v2 02/28] net/cnxk: add receive channel backpressure for SDP
From: Radha Mohan Chintakuntla The SDP interfaces also need to be configured for NIX receive channel backpressure for packet receive. Signed-off-by: Radha Mohan Chintakuntla --- drivers/common/cnxk/roc_nix_fc.c | 11 +-- drivers/net/cnxk/cnxk_ethdev.c | 3 +++ 2 files changed, 8 insertions(+), 6 deletions(-) diff --git a/drivers/common/cnxk/roc_nix_fc.c b/drivers/common/cnxk/roc_nix_fc.c index 8e31443..a0505bd 100644 --- a/drivers/common/cnxk/roc_nix_fc.c +++ b/drivers/common/cnxk/roc_nix_fc.c @@ -38,16 +38,13 @@ nix_fc_rxchan_bpid_set(struct roc_nix *roc_nix, bool enable) struct nix_bp_cfg_rsp *rsp; int rc = -ENOSPC, i; - if (roc_nix_is_sdp(roc_nix)) - return 0; - if (enable) { req = mbox_alloc_msg_nix_bp_enable(mbox); if (req == NULL) return rc; req->chan_base = 0; - if (roc_nix_is_lbk(roc_nix)) + if (roc_nix_is_lbk(roc_nix) || roc_nix_is_sdp(roc_nix)) req->chan_cnt = NIX_LBK_MAX_CHAN; else req->chan_cnt = NIX_CGX_MAX_CHAN; @@ -203,7 +200,8 @@ nix_fc_cq_config_set(struct roc_nix *roc_nix, struct roc_nix_fc_cfg *fc_cfg) int roc_nix_fc_config_get(struct roc_nix *roc_nix, struct roc_nix_fc_cfg *fc_cfg) { - if (roc_nix_is_vf_or_sdp(roc_nix) && !roc_nix_is_lbk(roc_nix)) + if (!roc_nix_is_pf(roc_nix) && !roc_nix_is_lbk(roc_nix) && + !roc_nix_is_sdp(roc_nix)) return 0; if (fc_cfg->type == ROC_NIX_FC_CQ_CFG) @@ -219,7 +217,8 @@ roc_nix_fc_config_get(struct roc_nix *roc_nix, struct roc_nix_fc_cfg *fc_cfg) int roc_nix_fc_config_set(struct roc_nix *roc_nix, struct roc_nix_fc_cfg *fc_cfg) { - if (roc_nix_is_vf_or_sdp(roc_nix) && !roc_nix_is_lbk(roc_nix)) + if (!roc_nix_is_pf(roc_nix) && !roc_nix_is_lbk(roc_nix) && + !roc_nix_is_sdp(roc_nix)) return 0; if (fc_cfg->type == ROC_NIX_FC_CQ_CFG) diff --git a/drivers/net/cnxk/cnxk_ethdev.c b/drivers/net/cnxk/cnxk_ethdev.c index 1fa4131..bd31a9a 100644 --- a/drivers/net/cnxk/cnxk_ethdev.c +++ b/drivers/net/cnxk/cnxk_ethdev.c @@ -310,6 +310,9 @@ nix_init_flow_ctrl_config(struct rte_eth_dev *eth_dev) struct cnxk_fc_cfg *fc = &dev->fc_cfg; int rc; + if (roc_nix_is_sdp(&dev->nix)) + return 0; + /* To avoid Link credit deadlock on Ax, disable Tx FC if it's enabled */ if (roc_model_is_cn96_ax() && dev->npc.switch_header_type != ROC_PRIV_FLAGS_HIGIG) -- 2.8.4
[PATCH v2 03/28] common/cnxk: add new pkind for CPT when ts is enabled
From: Vidya Sagar Velumuri With timestamping enabled, a timestamp is added to second-pass packets from CPT. NPC needs a different configuration to parse second-pass packets with and without a timestamp. A new pkind is defined for CPT when timestamping is enabled on NIX. CPT should use this pkind for second-pass packets when TS is enabled for the corresponding pktio. Signed-off-by: Vidya Sagar Velumuri --- drivers/common/cnxk/roc_ie_ot.h | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/common/cnxk/roc_ie_ot.h b/drivers/common/cnxk/roc_ie_ot.h index 173cc2c..56a1e9f 100644 --- a/drivers/common/cnxk/roc_ie_ot.h +++ b/drivers/common/cnxk/roc_ie_ot.h @@ -15,6 +15,7 @@ #define ROC_IE_OT_CTX_ILEN 2 /* PKIND to be used for CPT Meta parsing */ #define ROC_IE_OT_CPT_PKIND 58 +#define ROC_IE_OT_CPT_TS_PKIND 54 #define ROC_IE_OT_SA_CTX_HDR_SIZE 1 enum roc_ie_ot_ucc_ipsec { -- 2.8.4
[PATCH v2 05/28] common/cnxk: fix SQ flush sequence
From: Satha Rao Fix the SQ flush sequence to issue an NIX RX SW sync after the SMQ flush. This sync ensures that all the packets that were in flight are flushed out of memory. This patch also fixes NULL-return issues reported by a static analysis tool in the Traffic Manager code, and syncs the mbox definitions to those of the kernel version. Fixes: 05d727e8b14a ("common/cnxk: support NIX traffic management") Fixes: 0b7e667ee303 ("common/cnxk: enable packet marking") Signed-off-by: Satha Rao --- drivers/common/cnxk/roc_mbox.h| 35 +-- drivers/common/cnxk/roc_nix_tm.c | 7 +++ drivers/common/cnxk/roc_nix_tm_mark.c | 9 + 3 files changed, 49 insertions(+), 2 deletions(-) diff --git a/drivers/common/cnxk/roc_mbox.h b/drivers/common/cnxk/roc_mbox.h index b608f58..2c30f19 100644 --- a/drivers/common/cnxk/roc_mbox.h +++ b/drivers/common/cnxk/roc_mbox.h @@ -116,7 +116,7 @@ struct mbox_msghdr { msg_rsp) \ M(SSO_GRP_GET_PRIORITY, 0x606, sso_grp_get_priority, sso_info_req, \ sso_grp_priority)\ - M(SSO_WS_CACHE_INV, 0x607, sso_ws_cache_inv, msg_req, msg_rsp) \ + M(SSO_WS_CACHE_INV, 0x607, sso_ws_cache_inv, ssow_lf_inv_req, msg_rsp) \ M(SSO_GRP_QOS_CONFIG, 0x608, sso_grp_qos_config, sso_grp_qos_cfg, \ msg_rsp) \ M(SSO_GRP_GET_STATS, 0x609, sso_grp_get_stats, sso_info_req, \ @@ -125,6 +125,9 @@ struct mbox_msghdr { sso_hws_stats) \ M(SSO_HW_RELEASE_XAQ, 0x611, sso_hw_release_xaq_aura, \ sso_hw_xaq_release, msg_rsp) \ + M(SSO_CONFIG_LSW, 0x612, ssow_config_lsw, ssow_config_lsw, msg_rsp)\ + M(SSO_HWS_CHNG_MSHIP, 0x613, ssow_chng_mship, ssow_chng_mship, \ + msg_rsp) \ /* TIM mbox IDs (range 0x800 - 0x9FF) */ \ M(TIM_LF_ALLOC, 0x800, tim_lf_alloc, tim_lf_alloc_req, \ tim_lf_alloc_rsp)\ @@ -259,7 +262,8 @@ struct mbox_msghdr { M(NIX_CPT_BP_ENABLE, 0x8020, nix_cpt_bp_enable, nix_bp_cfg_req,\ nix_bp_cfg_rsp) \ M(NIX_CPT_BP_DISABLE, 0x8021, nix_cpt_bp_disable, nix_bp_cfg_req, \ - msg_rsp) + msg_rsp) \ + M(NIX_RX_SW_SYNC, 0x8022, nix_rx_sw_sync, msg_req, msg_rsp) /* Messages initiated by AF (range 0xC00 - 0xDFF) */ #define MBOX_UP_CGX_MESSAGES \ M(CGX_LINK_EVENT, 0xC00, cgx_link_event, cgx_link_info_msg, \ @@ -1268,6 +1272,33 @@ struct ssow_lf_free_req { uint16_t __io hws; }; +#define SSOW_INVAL_SELECTIVE_VER 0x1000 +struct ssow_lf_inv_req { + struct mbox_msghdr hdr; + uint16_t nb_hws; /* Number of HWS to invalidate*/ + uint16_t hws[MAX_RVU_BLKLF_CNT]; /* Array of HWS */ +}; + +struct ssow_config_lsw { + struct mbox_msghdr hdr; +#define SSOW_LSW_DIS0 +#define SSOW_LSW_GW_WAIT 1 +#define SSOW_LSW_GW_IMM 2 + uint8_t __io lsw_mode; +#define SSOW_WQE_REL_LSW_WAIT 0 +#define SSOW_WQE_REL_IMM 1 + uint8_t __io wqe_release; +}; + +struct ssow_chng_mship { + struct mbox_msghdr hdr; + uint8_t __io set;/* Membership set to modify. */ + uint8_t __io enable; /* Enable/Disable the hwgrps. */ + uint8_t __io hws;/* HWS to modify. */ + uint16_t __io nb_hwgrps; /* Number of hwgrps in the array */ + uint16_t __io hwgrps[MAX_RVU_BLKLF_CNT]; /* Array of hwgrps.
*/ +}; + struct sso_hw_setconfig { struct mbox_msghdr hdr; uint32_t __io npa_aura_id; diff --git a/drivers/common/cnxk/roc_nix_tm.c b/drivers/common/cnxk/roc_nix_tm.c index 5b70c7b..42d3abd 100644 --- a/drivers/common/cnxk/roc_nix_tm.c +++ b/drivers/common/cnxk/roc_nix_tm.c @@ -590,6 +590,7 @@ nix_tm_sq_flush_pre(struct roc_nix_sq *sq) struct nix_tm_node *node, *sibling; struct nix_tm_node_list *list; enum roc_nix_tm_tree tree; + struct msg_req *req; struct mbox *mbox; struct nix *nix; uint16_t qid; @@ -679,6 +680,12 @@ nix_tm_sq_flush_pre(struct roc_nix_sq *sq) rc); goto cleanup; } + + req = mbox_alloc_msg_nix_rx_sw_sync(mbox); + if (!req) + return -ENOSPC; + + rc = mbox_process(mbox); cleanup: /* Restore cgx state */ if (!roc_nix->io_enabled) { diff --git a/drivers/common/cnxk/roc_nix_tm_mark.c b/drivers/common/cnxk/roc_nix_tm_mark.c index 64cf679..d37292e 100644 --- a/drivers/common/cnxk/roc_nix_
[PATCH v2 04/28] common/cnxk: support to configure the ts pkind in CPT
From: Vidya Sagar Velumuri Add new API to configure the SA table entries with new CPT PKIND when timestamp is enabled. Signed-off-by: Vidya Sagar Velumuri --- drivers/common/cnxk/roc_nix_inl.c | 59 ++ drivers/common/cnxk/roc_nix_inl.h | 2 ++ drivers/common/cnxk/roc_nix_inl_priv.h | 1 + drivers/common/cnxk/version.map| 1 + 4 files changed, 63 insertions(+) diff --git a/drivers/common/cnxk/roc_nix_inl.c b/drivers/common/cnxk/roc_nix_inl.c index 826c6e9..bfb33b1 100644 --- a/drivers/common/cnxk/roc_nix_inl.c +++ b/drivers/common/cnxk/roc_nix_inl.c @@ -1011,6 +1011,65 @@ roc_nix_inl_ctx_write(struct roc_nix *roc_nix, void *sa_dptr, void *sa_cptr, return -ENOTSUP; } +int +roc_nix_inl_ts_pkind_set(struct roc_nix *roc_nix, bool ts_ena, bool inb_inl_dev) +{ + struct idev_cfg *idev = idev_get_cfg(); + struct nix_inl_dev *inl_dev = NULL; + void *sa, *sa_base = NULL; + struct nix *nix = NULL; + uint16_t max_spi = 0; + uint8_t pkind = 0; + int i; + + if (roc_model_is_cn9k()) + return 0; + + if (!inb_inl_dev && (roc_nix == NULL)) + return -EINVAL; + + if (inb_inl_dev) { + if ((idev == NULL) || (idev->nix_inl_dev == NULL)) + return 0; + inl_dev = idev->nix_inl_dev; + } else { + nix = roc_nix_to_nix_priv(roc_nix); + if (!nix->inl_inb_ena) + return 0; + sa_base = nix->inb_sa_base; + max_spi = roc_nix->ipsec_in_max_spi; + } + + if (inl_dev) { + if (inl_dev->rq_refs == 0) { + inl_dev->ts_ena = ts_ena; + max_spi = inl_dev->ipsec_in_max_spi; + sa_base = inl_dev->inb_sa_base; + } else if (inl_dev->ts_ena != ts_ena) { + if (inl_dev->ts_ena) + plt_err("Inline device is already configured with TS enable"); + else + plt_err("Inline device is already configured with TS disable"); + return -ENOTSUP; + } else { + return 0; + } + } + + pkind = ts_ena ? ROC_IE_OT_CPT_TS_PKIND : ROC_IE_OT_CPT_PKIND; + + sa = (uint8_t *)sa_base; + if (pkind == ((struct roc_ot_ipsec_inb_sa *)sa)->w0.s.pkind) + return 0; + + for (i = 0; i < max_spi; i++) { + sa = ((uint8_t *)sa_base) + +(i * ROC_NIX_INL_OT_IPSEC_INB_SA_SZ); + ((struct roc_ot_ipsec_inb_sa *)sa)->w0.s.pkind = pkind; + } + return 0; +} + void roc_nix_inl_dev_lock(void) { diff --git a/drivers/common/cnxk/roc_nix_inl.h b/drivers/common/cnxk/roc_nix_inl.h index 2c2a4d7..633f090 100644 --- a/drivers/common/cnxk/roc_nix_inl.h +++ b/drivers/common/cnxk/roc_nix_inl.h @@ -174,6 +174,8 @@ int __roc_api roc_nix_inl_inb_tag_update(struct roc_nix *roc_nix, uint64_t __roc_api roc_nix_inl_dev_rq_limit_get(void); int __roc_api roc_nix_reassembly_configure(uint32_t max_wait_time, uint16_t max_frags); +int __roc_api roc_nix_inl_ts_pkind_set(struct roc_nix *roc_nix, bool ts_ena, + bool inb_inl_dev); /* NIX Inline Outbound API */ int __roc_api roc_nix_inl_outb_init(struct roc_nix *roc_nix); diff --git a/drivers/common/cnxk/roc_nix_inl_priv.h b/drivers/common/cnxk/roc_nix_inl_priv.h index 0fa5e09..f9646a3 100644 --- a/drivers/common/cnxk/roc_nix_inl_priv.h +++ b/drivers/common/cnxk/roc_nix_inl_priv.h @@ -76,6 +76,7 @@ struct nix_inl_dev { uint32_t inb_spi_mask; bool attach_cptlf; bool wqe_skip; + bool ts_ena; }; int nix_inl_sso_register_irqs(struct nix_inl_dev *inl_dev); diff --git a/drivers/common/cnxk/version.map b/drivers/common/cnxk/version.map index 2a122e5..53586da 100644 --- a/drivers/common/cnxk/version.map +++ b/drivers/common/cnxk/version.map @@ -159,6 +159,7 @@ INTERNAL { roc_nix_inl_outb_is_enabled; roc_nix_inl_outb_soft_exp_poll_switch; roc_nix_inl_sa_sync; + roc_nix_inl_ts_pkind_set; roc_nix_inl_ctx_write; roc_nix_inl_dev_pffunc_get; roc_nix_cpt_ctx_cache_sync; -- 2.8.4
[PATCH v2 06/28] common/cnxk: skip probing SoC environment for CN9k
From: Rakesh Kudurumalla The SoC run platform file is not present on CN9k, so probing is done only for CN10k devices. Signed-off-by: Rakesh Kudurumalla --- drivers/common/cnxk/roc_model.c | 9 + 1 file changed, 9 insertions(+) diff --git a/drivers/common/cnxk/roc_model.c b/drivers/common/cnxk/roc_model.c index 1dd374e..a68baa6 100644 --- a/drivers/common/cnxk/roc_model.c +++ b/drivers/common/cnxk/roc_model.c @@ -2,6 +2,9 @@ * Copyright(C) 2021 Marvell. */ +#include <fcntl.h> +#include <unistd.h> + #include "roc_api.h" #include "roc_priv.h" @@ -211,6 +214,12 @@ of_env_get(struct roc_model *model) uint64_t flag; FILE *fp; + if (access(path, F_OK) != 0) { + strncpy(model->env, "HW_PLATFORM", ROC_MODEL_STR_LEN_MAX - 1); + model->flag |= ROC_ENV_HW; + return; + } + fp = fopen(path, "r"); if (!fp) { plt_err("Failed to open %s", path); -- 2.8.4
[PATCH v2 07/28] common/cnxk: fix issues in soft expiry disable path
Fix issues in mode where soft expiry is disabled in RoC. When soft expiry support is not enabled in inline device, memory is not allocated for the ring base array and should not be accessed. Fixes: bea5d990a93b ("net/cnxk: support outbound soft expiry notification") Signed-off-by: Nithin Dabilpuram --- drivers/common/cnxk/roc_nix_inl.c | 9 + drivers/common/cnxk/roc_nix_inl_dev.c | 5 +++-- drivers/common/cnxk/roc_nix_inl_priv.h | 1 + 3 files changed, 9 insertions(+), 6 deletions(-) diff --git a/drivers/common/cnxk/roc_nix_inl.c b/drivers/common/cnxk/roc_nix_inl.c index bfb33b1..6c72248 100644 --- a/drivers/common/cnxk/roc_nix_inl.c +++ b/drivers/common/cnxk/roc_nix_inl.c @@ -208,7 +208,7 @@ roc_nix_inl_inb_sa_sz(struct roc_nix *roc_nix, bool inl_dev_sa) uintptr_t roc_nix_inl_inb_sa_get(struct roc_nix *roc_nix, bool inb_inl_dev, uint32_t spi) { - uint32_t max_spi, min_spi, mask; + uint32_t max_spi = 0, min_spi = 0, mask; uintptr_t sa_base; uint64_t sz; @@ -461,7 +461,7 @@ roc_nix_inl_outb_init(struct roc_nix *roc_nix) nix->outb_se_ring_base = roc_nix->port_id * ROC_NIX_SOFT_EXP_PER_PORT_MAX_RINGS; - if (inl_dev == NULL) { + if (inl_dev == NULL || !inl_dev->set_soft_exp_poll) { nix->outb_se_ring_cnt = 0; return 0; } @@ -537,11 +537,12 @@ roc_nix_inl_outb_fini(struct roc_nix *roc_nix) plt_free(nix->outb_sa_base); nix->outb_sa_base = NULL; - if (idev && idev->nix_inl_dev) { + if (idev && idev->nix_inl_dev && nix->outb_se_ring_cnt) { inl_dev = idev->nix_inl_dev; ring_base = inl_dev->sa_soft_exp_ring; + ring_base += nix->outb_se_ring_base; - for (i = 0; i < ROC_NIX_INL_MAX_SOFT_EXP_RNGS; i++) { + for (i = 0; i < nix->outb_se_ring_cnt; i++) { if (ring_base[i]) plt_free(PLT_PTR_CAST(ring_base[i])); } diff --git a/drivers/common/cnxk/roc_nix_inl_dev.c b/drivers/common/cnxk/roc_nix_inl_dev.c index 51f1f68..5e61a42 100644 --- a/drivers/common/cnxk/roc_nix_inl_dev.c +++ b/drivers/common/cnxk/roc_nix_inl_dev.c @@ -814,6 +814,7 @@ roc_nix_inl_dev_init(struct roc_nix_inl_dev *roc_inl_dev) inl_dev->wqe_skip = roc_inl_dev->wqe_skip; inl_dev->spb_drop_pc = NIX_AURA_DROP_PC_DFLT; inl_dev->lpb_drop_pc = NIX_AURA_DROP_PC_DFLT; + inl_dev->set_soft_exp_poll = roc_inl_dev->set_soft_exp_poll; if (roc_inl_dev->spb_drop_pc) inl_dev->spb_drop_pc = roc_inl_dev->spb_drop_pc; @@ -849,7 +850,7 @@ roc_nix_inl_dev_init(struct roc_nix_inl_dev *roc_inl_dev) if (rc) goto sso_release; - if (roc_inl_dev->set_soft_exp_poll) { + if (inl_dev->set_soft_exp_poll) { rc = nix_inl_outb_poll_thread_setup(inl_dev); if (rc) goto cpt_release; @@ -898,7 +899,7 @@ roc_nix_inl_dev_fini(struct roc_nix_inl_dev *roc_inl_dev) inl_dev = idev->nix_inl_dev; pci_dev = inl_dev->pci_dev; - if (roc_inl_dev->set_soft_exp_poll) { + if (inl_dev->set_soft_exp_poll) { soft_exp_poll_thread_exit = true; pthread_join(inl_dev->soft_exp_poll_thread, NULL); plt_bitmap_free(inl_dev->soft_exp_ring_bmap); diff --git a/drivers/common/cnxk/roc_nix_inl_priv.h b/drivers/common/cnxk/roc_nix_inl_priv.h index f9646a3..1ab8470 100644 --- a/drivers/common/cnxk/roc_nix_inl_priv.h +++ b/drivers/common/cnxk/roc_nix_inl_priv.h @@ -59,6 +59,7 @@ struct nix_inl_dev { pthread_t soft_exp_poll_thread; uint32_t soft_exp_poll_freq; uint64_t *sa_soft_exp_ring; + bool set_soft_exp_poll; /* Soft expiry ring bitmap */ struct plt_bitmap *soft_exp_ring_bmap; -- 2.8.4
[PATCH v2 08/28] common/cnxk: convert warning to debug print
From: Akhil Goyal If an inbound SA SPI is not in the min-max range specified in devargs, a warning was printed. This is now converted to a debug print: if the entry turns out to be a duplicate in the mask, a separate error print is emitted anyway, so the warning print is not needed. Signed-off-by: Akhil Goyal --- drivers/common/cnxk/roc_nix_inl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/common/cnxk/roc_nix_inl.c b/drivers/common/cnxk/roc_nix_inl.c index 6c72248..2c013cb 100644 --- a/drivers/common/cnxk/roc_nix_inl.c +++ b/drivers/common/cnxk/roc_nix_inl.c @@ -221,7 +221,7 @@ roc_nix_inl_inb_sa_get(struct roc_nix *roc_nix, bool inb_inl_dev, uint32_t spi) mask = roc_nix_inl_inb_spi_range(roc_nix, inb_inl_dev, &min_spi, &max_spi); if (spi > max_spi || spi < min_spi) - plt_warn("Inbound SA SPI %u not in range (%u..%u)", spi, + plt_nix_dbg("Inbound SA SPI %u not in range (%u..%u)", spi, min_spi, max_spi); /* Get SA size */ -- 2.8.4
[PATCH v2 09/28] common/cnxk: use aggregate level rr prio from mbox
Use aggregate level Round Robin Priority from mbox response instead of fixing it to single macro. This is useful when kernel AF driver changes the constant. Signed-off-by: Nithin Dabilpuram --- drivers/common/cnxk/roc_nix_priv.h | 5 +++-- drivers/common/cnxk/roc_nix_tm.c | 3 ++- drivers/common/cnxk/roc_nix_tm_utils.c | 8 3 files changed, 9 insertions(+), 7 deletions(-) diff --git a/drivers/common/cnxk/roc_nix_priv.h b/drivers/common/cnxk/roc_nix_priv.h index 9b9ffae..cc69d71 100644 --- a/drivers/common/cnxk/roc_nix_priv.h +++ b/drivers/common/cnxk/roc_nix_priv.h @@ -181,6 +181,7 @@ struct nix { uint16_t tm_root_lvl; uint16_t tm_flags; uint16_t tm_link_cfg_lvl; + uint8_t tm_aggr_lvl_rr_prio; uint16_t contig_rsvd[NIX_TXSCH_LVL_CNT]; uint16_t discontig_rsvd[NIX_TXSCH_LVL_CNT]; uint64_t tm_markfmt_en; @@ -284,7 +285,6 @@ void nix_unregister_irqs(struct nix *nix); /* Default TL1 priority and Quantum from AF */ #define NIX_TM_TL1_DFLT_RR_QTM ((1 << 24) - 1) -#define NIX_TM_TL1_DFLT_RR_PRIO 1 struct nix_tm_shaper_data { uint64_t burst_exponent; @@ -432,7 +432,8 @@ bool nix_tm_child_res_valid(struct nix_tm_node_list *list, struct nix_tm_node *parent); uint16_t nix_tm_resource_estimate(struct nix *nix, uint16_t *schq_contig, uint16_t *schq, enum roc_nix_tm_tree tree); -uint8_t nix_tm_tl1_default_prep(uint32_t schq, volatile uint64_t *reg, +uint8_t nix_tm_tl1_default_prep(struct nix *nix, uint32_t schq, + volatile uint64_t *reg, volatile uint64_t *regval); uint8_t nix_tm_topology_reg_prep(struct nix *nix, struct nix_tm_node *node, volatile uint64_t *reg, diff --git a/drivers/common/cnxk/roc_nix_tm.c b/drivers/common/cnxk/roc_nix_tm.c index 42d3abd..7fd54ef 100644 --- a/drivers/common/cnxk/roc_nix_tm.c +++ b/drivers/common/cnxk/roc_nix_tm.c @@ -55,7 +55,7 @@ nix_tm_node_reg_conf(struct nix *nix, struct nix_tm_node *node) req = mbox_alloc_msg_nix_txschq_cfg(mbox); req->lvl = NIX_TXSCH_LVL_TL1; - k = nix_tm_tl1_default_prep(node->parent_hw_id, req->reg, + k = nix_tm_tl1_default_prep(nix, node->parent_hw_id, req->reg, req->regval); req->num_regs = k; rc = mbox_process(mbox); @@ -1288,6 +1288,7 @@ nix_tm_alloc_txschq(struct nix *nix, enum roc_nix_tm_tree tree) } while (pend); nix->tm_link_cfg_lvl = rsp->link_cfg_lvl; + nix->tm_aggr_lvl_rr_prio = rsp->aggr_lvl_rr_prio; return 0; alloc_err: for (i = 0; i < NIX_TXSCH_LVL_CNT; i++) { diff --git a/drivers/common/cnxk/roc_nix_tm_utils.c b/drivers/common/cnxk/roc_nix_tm_utils.c index bcdf990..b9b605f 100644 --- a/drivers/common/cnxk/roc_nix_tm_utils.c +++ b/drivers/common/cnxk/roc_nix_tm_utils.c @@ -478,7 +478,7 @@ nix_tm_child_res_valid(struct nix_tm_node_list *list, } uint8_t -nix_tm_tl1_default_prep(uint32_t schq, volatile uint64_t *reg, +nix_tm_tl1_default_prep(struct nix *nix, uint32_t schq, volatile uint64_t *reg, volatile uint64_t *regval) { uint8_t k = 0; @@ -496,7 +496,7 @@ nix_tm_tl1_default_prep(uint32_t schq, volatile uint64_t *reg, k++; reg[k] = NIX_AF_TL1X_TOPOLOGY(schq); - regval[k] = (NIX_TM_TL1_DFLT_RR_PRIO << 1); + regval[k] = (nix->tm_aggr_lvl_rr_prio << 1); k++; reg[k] = NIX_AF_TL1X_CIR(schq); @@ -540,7 +540,7 @@ nix_tm_topology_reg_prep(struct nix *nix, struct nix_tm_node *node, * Static Priority is disabled */ if (hw_lvl == NIX_TXSCH_LVL_TL1 && nix->tm_flags & NIX_TM_TL1_NO_SP) { - rr_prio = NIX_TM_TL1_DFLT_RR_PRIO; + rr_prio = nix->tm_aggr_lvl_rr_prio; child = 0; } @@ -662,7 +662,7 @@ nix_tm_sched_reg_prep(struct nix *nix, struct nix_tm_node *node, */ if (hw_lvl == NIX_TXSCH_LVL_TL2 && (!nix_tm_have_tl1_access(nix) || nix->tm_flags & 
NIX_TM_TL1_NO_SP)) - strict_prio = NIX_TM_TL1_DFLT_RR_PRIO; + strict_prio = nix->tm_aggr_lvl_rr_prio; plt_tm_dbg("Schedule config node %s(%u) lvl %u id %u, " "prio 0x%" PRIx64 ", rr_quantum/rr_wt 0x%" PRIx64 " (%p)", -- 2.8.4
[PATCH v2 10/28] net/cnxk: support loopback mode on AF VF's
Support internal loopback mode on AF VFs using RoC by setting the Tx channel to be the same as the Rx channel. Signed-off-by: Nithin Dabilpuram --- drivers/net/cnxk/cnxk_ethdev.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/net/cnxk/cnxk_ethdev.c b/drivers/net/cnxk/cnxk_ethdev.c index bd31a9a..e1b1e16 100644 --- a/drivers/net/cnxk/cnxk_ethdev.c +++ b/drivers/net/cnxk/cnxk_ethdev.c @@ -1119,6 +1119,9 @@ cnxk_nix_configure(struct rte_eth_dev *eth_dev) nb_rxq = RTE_MAX(data->nb_rx_queues, 1); nb_txq = RTE_MAX(data->nb_tx_queues, 1); + if (roc_nix_is_lbk(nix)) + nix->enable_loop = eth_dev->data->dev_conf.lpbk_mode; + /* Alloc a nix lf */ rc = roc_nix_lf_alloc(nix, nb_rxq, nb_txq, rx_cfg); if (rc) { @@ -1242,6 +1245,9 @@ cnxk_nix_configure(struct rte_eth_dev *eth_dev) } } + if (roc_nix_is_lbk(nix)) + goto skip_lbk_setup; + /* Configure loop back mode */ rc = roc_nix_mac_loopback_enable(nix, eth_dev->data->dev_conf.lpbk_mode); @@ -1250,6 +1256,7 @@ cnxk_nix_configure(struct rte_eth_dev *eth_dev) goto cq_fini; } +skip_lbk_setup: /* Setup Inline security support */ rc = nix_security_setup(dev); if (rc) -- 2.8.4
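For completeness, a sketch of how an application requests this mode through the generic ethdev API (port_id, nb_rxq and nb_txq are illustrative):

struct rte_eth_conf port_conf;
int ret;

memset(&port_conf, 0, sizeof(port_conf));
port_conf.lpbk_mode = 1; /* request internal loopback */
ret = rte_eth_dev_configure(port_id, nb_rxq, nb_txq, &port_conf);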
[PATCH v2 11/28] net/cnxk: update LBK ethdev link info
Update the link info of LBK ethdevs, i.e. AF VFs, as always up and 100G. This is because there is no PHY for the LBK interfaces and we won't get a link update notification for them. Signed-off-by: Nithin Dabilpuram --- drivers/net/cnxk/cnxk_link.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/drivers/net/cnxk/cnxk_link.c b/drivers/net/cnxk/cnxk_link.c index f10a502..b1d59e3 100644 --- a/drivers/net/cnxk/cnxk_link.c +++ b/drivers/net/cnxk/cnxk_link.c @@ -12,6 +12,17 @@ cnxk_nix_toggle_flag_link_cfg(struct cnxk_eth_dev *dev, bool set) else dev->flags &= ~CNXK_LINK_CFG_IN_PROGRESS_F; + /* Update link info for LBK */ + if (!set && roc_nix_is_lbk(&dev->nix)) { + struct rte_eth_link link; + + link.link_status = RTE_ETH_LINK_UP; + link.link_speed = RTE_ETH_SPEED_NUM_100G; + link.link_autoneg = RTE_ETH_LINK_FIXED; + link.link_duplex = RTE_ETH_LINK_FULL_DUPLEX; + rte_eth_linkstatus_set(dev->eth_dev, &link); + } + rte_wmb(); } -- 2.8.4
[PATCH v2 12/28] net/cnxk: add barrier after meta batch free in scalar
Add a barrier after the meta batch free in the scalar routine when LMT lines are exactly full, to make sure that the next LMT line user in Tx starts writing the lines only when the previous steorl operations are complete. Fixes: 4382a7ccf781 ("net/cnxk: support Rx security offload on cn10k") Cc: sta...@dpdk.org Signed-off-by: Nithin Dabilpuram --- drivers/net/cnxk/cn10k_rx.h | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/net/cnxk/cn10k_rx.h b/drivers/net/cnxk/cn10k_rx.h index e4f5a55..94c1f1e 100644 --- a/drivers/net/cnxk/cn10k_rx.h +++ b/drivers/net/cnxk/cn10k_rx.h @@ -1007,10 +1007,11 @@ cn10k_nix_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts, plt_write64((wdata | nb_pkts), rxq->cq_door); /* Free remaining meta buffers if any */ - if (flags & NIX_RX_OFFLOAD_SECURITY_F && loff) { + if (flags & NIX_RX_OFFLOAD_SECURITY_F && loff) nix_sec_flush_meta(laddr, lmt_id + lnum, loff, aura_handle); - plt_io_wmb(); - } + + if (flags & NIX_RX_OFFLOAD_SECURITY_F) + rte_io_wmb(); return nb_pkts; } -- 2.8.4
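The ordering contract being enforced can be sketched as follows (annotated restatement of the patch, not new code):

/* Rx tail, when the LMT lines used for the meta free are exactly full: */
nix_sec_flush_meta(laddr, lmt_id + lnum, loff, aura_handle); /* LMT writes + steorl */
rte_io_wmb(); /* order the steorl before any later LMT line writes */
/* ...the Tx path may now rewrite the same LMT lines without racing... */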
[PATCH v2 13/28] net/cnxk: disable default inner chksum for outb inline
Disable default inner L3/L4 checksum generation for outbound inline path and enable based on SA options or RTE_MBUF flags as per the spec. Though the checksum generation is not impacting much performance, it is overwriting zero checksum for UDP packets which is not always good. Signed-off-by: Nithin Dabilpuram --- drivers/net/cnxk/cn10k_ethdev.h | 4 +++- drivers/net/cnxk/cn10k_ethdev_sec.c | 3 +++ drivers/net/cnxk/cn10k_tx.h | 44 ++--- 3 files changed, 42 insertions(+), 9 deletions(-) diff --git a/drivers/net/cnxk/cn10k_ethdev.h b/drivers/net/cnxk/cn10k_ethdev.h index 1e49d65..9642d6a 100644 --- a/drivers/net/cnxk/cn10k_ethdev.h +++ b/drivers/net/cnxk/cn10k_ethdev.h @@ -71,7 +71,9 @@ struct cn10k_sec_sess_priv { uint8_t mode : 1; uint8_t roundup_byte : 5; uint8_t roundup_len; - uint16_t partial_len; + uint16_t partial_len : 10; + uint16_t chksum : 2; + uint16_t rsvd : 4; }; uint64_t u64; diff --git a/drivers/net/cnxk/cn10k_ethdev_sec.c b/drivers/net/cnxk/cn10k_ethdev_sec.c index 87bb691..b307215 100644 --- a/drivers/net/cnxk/cn10k_ethdev_sec.c +++ b/drivers/net/cnxk/cn10k_ethdev_sec.c @@ -552,6 +552,9 @@ cn10k_eth_sec_session_create(void *device, sess_priv.partial_len = rlens->partial_len; sess_priv.mode = outb_sa_dptr->w2.s.ipsec_mode; sess_priv.outer_ip_ver = outb_sa_dptr->w2.s.outer_ip_ver; + /* Propagate inner checksum enable from SA to fast path */ + sess_priv.chksum = (!ipsec->options.ip_csum_enable << 1 | + !ipsec->options.l4_csum_enable); /* Pointer from eth_sec -> outb_sa */ eth_sec->sa = outb_sa; diff --git a/drivers/net/cnxk/cn10k_tx.h b/drivers/net/cnxk/cn10k_tx.h index de88a21..981bc9b 100644 --- a/drivers/net/cnxk/cn10k_tx.h +++ b/drivers/net/cnxk/cn10k_tx.h @@ -246,6 +246,7 @@ cn10k_nix_prep_sec_vec(struct rte_mbuf *m, uint64x2_t *cmd0, uint64x2_t *cmd1, { struct cn10k_sec_sess_priv sess_priv; uint32_t pkt_len, dlen_adj, rlen; + uint8_t l3l4type, chksum; uint64x2_t cmd01, cmd23; uintptr_t dptr, nixtx; uint64_t ucode_cmd[4]; @@ -256,10 +257,23 @@ cn10k_nix_prep_sec_vec(struct rte_mbuf *m, uint64x2_t *cmd0, uint64x2_t *cmd1, sess_priv.u64 = *rte_security_dynfield(m); - if (flags & NIX_TX_NEED_SEND_HDR_W1) + if (flags & NIX_TX_NEED_SEND_HDR_W1) { l2_len = vgetq_lane_u8(*cmd0, 8); - else + /* Extract l3l4type either from il3il4type or ol3ol4type */ + if (flags & NIX_TX_OFFLOAD_L3_L4_CSUM_F && + flags & NIX_TX_OFFLOAD_OL3_OL4_CSUM_F) + l3l4type = vgetq_lane_u8(*cmd0, 13); + else + l3l4type = vgetq_lane_u8(*cmd0, 12); + + chksum = (l3l4type & 0x1) << 1 | !!(l3l4type & 0x30); + chksum = ~chksum; + sess_priv.chksum = sess_priv.chksum & chksum; + /* Clear SEND header flags */ + *cmd0 = vsetq_lane_u16(0, *cmd0, 6); + } else { l2_len = m->l2_len; + } /* Retrieve DPTR */ dptr = vgetq_lane_u64(*cmd1, 1); @@ -291,8 +305,8 @@ cn10k_nix_prep_sec_vec(struct rte_mbuf *m, uint64x2_t *cmd0, uint64x2_t *cmd1, sa_base &= ~0xUL; sa = (uintptr_t)roc_nix_inl_ot_ipsec_outb_sa(sa_base, sess_priv.sa_idx); ucode_cmd[3] = (ROC_CPT_DFLT_ENG_GRP_SE_IE << 61 | 1UL << 60 | sa); - ucode_cmd[0] = - (ROC_IE_OT_MAJOR_OP_PROCESS_OUTBOUND_IPSEC << 48 | pkt_len); + ucode_cmd[0] = (ROC_IE_OT_MAJOR_OP_PROCESS_OUTBOUND_IPSEC << 48 | + ((uint64_t)sess_priv.chksum) << 32 | pkt_len); /* CPT Word 0 and Word 1 */ cmd01 = vdupq_n_u64((nixtx + 16) | (cn10k_nix_tx_ext_subs(flags) + 1)); @@ -343,6 +357,7 @@ cn10k_nix_prep_sec(struct rte_mbuf *m, uint64_t *cmd, uintptr_t *nixtx_addr, struct cn10k_sec_sess_priv sess_priv; uint32_t pkt_len, dlen_adj, rlen; struct nix_send_hdr_s *send_hdr; + uint8_t l3l4type, chksum; uint64x2_t 
cmd01, cmd23; union nix_send_sg_s *sg; uintptr_t dptr, nixtx; @@ -360,10 +375,23 @@ cn10k_nix_prep_sec(struct rte_mbuf *m, uint64_t *cmd, uintptr_t *nixtx_addr, else sg = (union nix_send_sg_s *)&cmd[2]; - if (flags & NIX_TX_NEED_SEND_HDR_W1) + if (flags & NIX_TX_NEED_SEND_HDR_W1) { l2_len = cmd[1] & 0xFF; - else + /* Extract l3l4type either from il3il4type or ol3ol4type */ + if (flags & NIX_TX_OFFLOAD_L3_L4_CSUM_F && + flags & NIX_TX_OFFLOAD_OL3_OL4_CSUM_F) +
[PATCH v2 14/28] net/cnxk: fix roundup size with transport mode
For transport mode, roundup needs to be based on L4 data and shouldn't include L3 length. By including l3 length, rlen that is calculated and put in send hdr would cross the final length of the packet in some scenarios where padding is necessary. Also when outer and inner checksum offload flags are enabled, get the l2_len and l3_len from il3ptr and il4ptr. Fixes: 55bfac717c72 ("net/cnxk: support Tx security offload on cn10k") Cc: sta...@dpdk.org Signed-off-by: Nithin Dabilpuram --- drivers/net/cnxk/cn10k_tx.h | 34 ++ 1 file changed, 26 insertions(+), 8 deletions(-) diff --git a/drivers/net/cnxk/cn10k_tx.h b/drivers/net/cnxk/cn10k_tx.h index 981bc9b..c25825c 100644 --- a/drivers/net/cnxk/cn10k_tx.h +++ b/drivers/net/cnxk/cn10k_tx.h @@ -248,23 +248,29 @@ cn10k_nix_prep_sec_vec(struct rte_mbuf *m, uint64x2_t *cmd0, uint64x2_t *cmd1, uint32_t pkt_len, dlen_adj, rlen; uint8_t l3l4type, chksum; uint64x2_t cmd01, cmd23; + uint8_t l2_len, l3_len; uintptr_t dptr, nixtx; uint64_t ucode_cmd[4]; uint64_t *laddr; - uint8_t l2_len; uint16_t tag; uint64_t sa; sess_priv.u64 = *rte_security_dynfield(m); if (flags & NIX_TX_NEED_SEND_HDR_W1) { - l2_len = vgetq_lane_u8(*cmd0, 8); /* Extract l3l4type either from il3il4type or ol3ol4type */ if (flags & NIX_TX_OFFLOAD_L3_L4_CSUM_F && - flags & NIX_TX_OFFLOAD_OL3_OL4_CSUM_F) + flags & NIX_TX_OFFLOAD_OL3_OL4_CSUM_F) { + l2_len = vgetq_lane_u8(*cmd0, 10); + /* L4 ptr from send hdr includes l2 and l3 len */ + l3_len = vgetq_lane_u8(*cmd0, 11) - l2_len; l3l4type = vgetq_lane_u8(*cmd0, 13); - else + } else { + l2_len = vgetq_lane_u8(*cmd0, 8); + /* L4 ptr from send hdr includes l2 and l3 len */ + l3_len = vgetq_lane_u8(*cmd0, 9) - l2_len; l3l4type = vgetq_lane_u8(*cmd0, 12); + } chksum = (l3l4type & 0x1) << 1 | !!(l3l4type & 0x30); chksum = ~chksum; @@ -273,6 +279,7 @@ cn10k_nix_prep_sec_vec(struct rte_mbuf *m, uint64x2_t *cmd0, uint64x2_t *cmd1, *cmd0 = vsetq_lane_u16(0, *cmd0, 6); } else { l2_len = m->l2_len; + l3_len = m->l3_len; } /* Retrieve DPTR */ @@ -281,6 +288,8 @@ cn10k_nix_prep_sec_vec(struct rte_mbuf *m, uint64x2_t *cmd0, uint64x2_t *cmd1, /* Calculate dlen adj */ dlen_adj = pkt_len - l2_len; + /* Exclude l3 len from roundup for transport mode */ + dlen_adj -= sess_priv.mode ? 
0 : l3_len; rlen = (dlen_adj + sess_priv.roundup_len) + (sess_priv.roundup_byte - 1); rlen &= ~(uint64_t)(sess_priv.roundup_byte - 1); @@ -360,10 +369,10 @@ cn10k_nix_prep_sec(struct rte_mbuf *m, uint64_t *cmd, uintptr_t *nixtx_addr, uint8_t l3l4type, chksum; uint64x2_t cmd01, cmd23; union nix_send_sg_s *sg; + uint8_t l2_len, l3_len; uintptr_t dptr, nixtx; uint64_t ucode_cmd[4]; uint64_t *laddr; - uint8_t l2_len; uint16_t tag; uint64_t sa; @@ -376,13 +385,19 @@ cn10k_nix_prep_sec(struct rte_mbuf *m, uint64_t *cmd, uintptr_t *nixtx_addr, sg = (union nix_send_sg_s *)&cmd[2]; if (flags & NIX_TX_NEED_SEND_HDR_W1) { - l2_len = cmd[1] & 0xFF; /* Extract l3l4type either from il3il4type or ol3ol4type */ if (flags & NIX_TX_OFFLOAD_L3_L4_CSUM_F && - flags & NIX_TX_OFFLOAD_OL3_OL4_CSUM_F) + flags & NIX_TX_OFFLOAD_OL3_OL4_CSUM_F) { + l2_len = (cmd[1] >> 16) & 0xFF; + /* L4 ptr from send hdr includes l2 and l3 len */ + l3_len = ((cmd[1] >> 24) & 0xFF) - l2_len; l3l4type = (cmd[1] >> 40) & 0xFF; - else + } else { + l2_len = cmd[1] & 0xFF; + /* L4 ptr from send hdr includes l2 and l3 len */ + l3_len = ((cmd[1] >> 8) & 0xFF) - l2_len; l3l4type = (cmd[1] >> 32) & 0xFF; + } chksum = (l3l4type & 0x1) << 1 | !!(l3l4type & 0x30); chksum = ~chksum; @@ -391,6 +406,7 @@ cn10k_nix_prep_sec(struct rte_mbuf *m, uint64_t *cmd, uintptr_t *nixtx_addr, cmd[1] &= ~(0xUL << 32); } else { l2_len = m->l2_len; + l3_len = m->l3_len; } /* Retrieve DPTR */ @@ -399,6 +415,8 @@ cn10k_nix_prep_sec(struct rte_mbuf *m, uint64_t *cmd, uintptr_t *nixtx_addr, /* Calculate dlen adj */ dlen_adj = pkt_len - l2_len; +
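To make the effect concrete, a worked example (all numbers illustrative; roundup_byte and roundup_len come from the SA cipher parameters, partial_len is ignored here):

pkt_len = 82 (14 B L2 + 20 B L3 + 48 B L4), roundup_byte = 16, roundup_len = 16
before (L3 included):   dlen_adj = 82 - 14 = 68      -> rlen = roundup(68 + 16, 16) = 96
after (transport mode): dlen_adj = 68 - 20 = 48      -> rlen = roundup(48 + 16, 16) = 64

The 20 B IP header no longer inflates the rounded length, so the rlen placed in the send header cannot overshoot the final packet length when padding is applied.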
[PATCH v2 15/28] net/cnxk: update inline device in ethdev telemetry
From: Rakesh Kudurumalla The inline device PF function is now reported in ethdev_tel_handle_info when an inline device is attached to a DPDK process. Signed-off-by: Rakesh Kudurumalla --- drivers/net/cnxk/cnxk_ethdev_telemetry.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/cnxk/cnxk_ethdev_telemetry.c b/drivers/net/cnxk/cnxk_ethdev_telemetry.c index 83bc658..b76dbdf 100644 --- a/drivers/net/cnxk/cnxk_ethdev_telemetry.c +++ b/drivers/net/cnxk/cnxk_ethdev_telemetry.c @@ -23,6 +23,7 @@ ethdev_tel_handle_info(const char *cmd __rte_unused, struct eth_info_s { /** PF/VF information */ uint16_t pf_func; + uint16_t inl_dev_pf_func; uint8_t max_mac_entries; bool dmac_filter_ena; uint8_t dmac_filter_count; @@ -62,6 +63,8 @@ ethdev_tel_handle_info(const char *cmd __rte_unused, info = &eth_info.info; dev = cnxk_eth_pmd_priv(eth_dev); if (dev) { + info->inl_dev_pf_func = + roc_nix_inl_dev_pffunc_get(); info->pf_func = roc_nix_get_pf_func(&dev->nix); info->max_mac_entries = dev->max_mac_entries; info->dmac_filter_ena = dev->dmac_filter_enable; -- 2.8.4
[PATCH v2 16/28] net/cnxk: change env for debug IV
From: Akhil Goyal Changed environment variable name for specifying debug IV for unit testing of inline IPsec offload with known test vectors. Signed-off-by: Akhil Goyal --- drivers/net/cnxk/cn10k_ethdev_sec.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/net/cnxk/cn10k_ethdev_sec.c b/drivers/net/cnxk/cn10k_ethdev_sec.c index b307215..60b7093 100644 --- a/drivers/net/cnxk/cn10k_ethdev_sec.c +++ b/drivers/net/cnxk/cn10k_ethdev_sec.c @@ -522,10 +522,11 @@ cn10k_eth_sec_session_create(void *device, goto mempool_put; } - iv_str = getenv("CN10K_ETH_SEC_IV_OVR"); - if (iv_str) - outb_dbg_iv_update(outb_sa_dptr, iv_str); - + if (conf->ipsec.options.iv_gen_disable == 1) { + iv_str = getenv("ETH_SEC_IV_OVR"); + if (iv_str) + outb_dbg_iv_update(outb_sa_dptr, iv_str); + } /* Fill outbound sa misc params */ rc = cn10k_eth_sec_outb_sa_misc_fill(&dev->nix, outb_sa_dptr, outb_sa, ipsec, sa_idx); -- 2.8.4
[PATCH v2 17/28] net/cnxk: reset offload flag if reassembly is disabled
From: Akhil Goyal The Rx offload flag needs to be reset if the IP reassembly flag is not set when calling reassembly_conf_set. Signed-off-by: Akhil Goyal --- drivers/net/cnxk/cn10k_ethdev.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/net/cnxk/cn10k_ethdev.c b/drivers/net/cnxk/cn10k_ethdev.c index b5f3c83..d04b9eb 100644 --- a/drivers/net/cnxk/cn10k_ethdev.c +++ b/drivers/net/cnxk/cn10k_ethdev.c @@ -547,6 +547,12 @@ cn10k_nix_reassembly_conf_set(struct rte_eth_dev *eth_dev, struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev); int rc = 0; + if (!conf->flags) { + /* Clear offload flags on disable */ + dev->rx_offload_flags &= ~NIX_RX_REAS_F; + return 0; + } + rc = roc_nix_reassembly_configure(conf->timeout_ms, conf->max_frags); if (!rc && dev->rx_offloads & RTE_ETH_RX_OFFLOAD_SECURITY) -- 2.8.4
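A usage sketch of the disable path from the application side, via the experimental ethdev API this callback backs (port_id is illustrative):

struct rte_eth_ip_reassembly_params conf;
int ret;

memset(&conf, 0, sizeof(conf));
conf.flags = 0; /* no RTE_ETH_DEV_REASSEMBLY_F_* bits set: disable */
ret = rte_eth_ip_reassembly_conf_set(port_id, &conf);
/* with this patch, the PMD also clears NIX_RX_REAS_F internally */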
[PATCH v2 18/28] net/cnxk: support decrement TTL for inline IPsec
From: Akhil Goyal Added support for decrementing TTL(IPv4)/hoplimit(IPv6) while doing inline IPsec processing if the security session sa options is enabled with dec_ttl. Signed-off-by: Akhil Goyal --- drivers/net/cnxk/cn10k_ethdev.h | 3 ++- drivers/net/cnxk/cn10k_ethdev_sec.c | 1 + drivers/net/cnxk/cn10k_tx.h | 6 -- 3 files changed, 7 insertions(+), 3 deletions(-) diff --git a/drivers/net/cnxk/cn10k_ethdev.h b/drivers/net/cnxk/cn10k_ethdev.h index 9642d6a..c8666ce 100644 --- a/drivers/net/cnxk/cn10k_ethdev.h +++ b/drivers/net/cnxk/cn10k_ethdev.h @@ -73,7 +73,8 @@ struct cn10k_sec_sess_priv { uint8_t roundup_len; uint16_t partial_len : 10; uint16_t chksum : 2; - uint16_t rsvd : 4; + uint16_t dec_ttl : 1; + uint16_t rsvd : 3; }; uint64_t u64; diff --git a/drivers/net/cnxk/cn10k_ethdev_sec.c b/drivers/net/cnxk/cn10k_ethdev_sec.c index 60b7093..f32e169 100644 --- a/drivers/net/cnxk/cn10k_ethdev_sec.c +++ b/drivers/net/cnxk/cn10k_ethdev_sec.c @@ -556,6 +556,7 @@ cn10k_eth_sec_session_create(void *device, /* Propagate inner checksum enable from SA to fast path */ sess_priv.chksum = (!ipsec->options.ip_csum_enable << 1 | !ipsec->options.l4_csum_enable); + sess_priv.dec_ttl = ipsec->options.dec_ttl; /* Pointer from eth_sec -> outb_sa */ eth_sec->sa = outb_sa; diff --git a/drivers/net/cnxk/cn10k_tx.h b/drivers/net/cnxk/cn10k_tx.h index c25825c..c482352 100644 --- a/drivers/net/cnxk/cn10k_tx.h +++ b/drivers/net/cnxk/cn10k_tx.h @@ -315,7 +315,8 @@ cn10k_nix_prep_sec_vec(struct rte_mbuf *m, uint64x2_t *cmd0, uint64x2_t *cmd1, sa = (uintptr_t)roc_nix_inl_ot_ipsec_outb_sa(sa_base, sess_priv.sa_idx); ucode_cmd[3] = (ROC_CPT_DFLT_ENG_GRP_SE_IE << 61 | 1UL << 60 | sa); ucode_cmd[0] = (ROC_IE_OT_MAJOR_OP_PROCESS_OUTBOUND_IPSEC << 48 | - ((uint64_t)sess_priv.chksum) << 32 | pkt_len); + ((uint64_t)sess_priv.chksum) << 32 | + ((uint64_t)sess_priv.dec_ttl) << 34 | pkt_len); /* CPT Word 0 and Word 1 */ cmd01 = vdupq_n_u64((nixtx + 16) | (cn10k_nix_tx_ext_subs(flags) + 1)); @@ -442,7 +443,8 @@ cn10k_nix_prep_sec(struct rte_mbuf *m, uint64_t *cmd, uintptr_t *nixtx_addr, sa = (uintptr_t)roc_nix_inl_ot_ipsec_outb_sa(sa_base, sess_priv.sa_idx); ucode_cmd[3] = (ROC_CPT_DFLT_ENG_GRP_SE_IE << 61 | 1UL << 60 | sa); ucode_cmd[0] = (ROC_IE_OT_MAJOR_OP_PROCESS_OUTBOUND_IPSEC << 48 | - ((uint64_t)sess_priv.chksum) << 32 | pkt_len); + ((uint64_t)sess_priv.chksum) << 32 | + ((uint64_t)sess_priv.dec_ttl) << 34 | pkt_len); /* CPT Word 0 and Word 1. Assume no multi-seg support */ cmd01 = vdupq_n_u64((nixtx + 16) | (cn10k_nix_tx_ext_subs(flags) + 1)); -- 2.8.4
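Summarizing the bit layout of ucode_cmd[0] as composed in this and the preceding patches (derived from the shifts in the diffs; the field names are descriptive, not from a header):

/*
 * ucode_cmd[0] bit layout (outbound inline IPsec):
 *   [63:48] ROC_IE_OT_MAJOR_OP_PROCESS_OUTBOUND_IPSEC  (<< 48)
 *   [34]    dec_ttl  - decrement TTL/hop limit         (<< 34)
 *   [33:32] chksum   - inner L3/L4 checksum disable    (<< 32)
 *   [31:0]  pkt_len
 */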
[PATCH v2 19/28] net/cnxk: optimize Rx fast path for security pkts
Optimize Rx fast path for security pkts by preprocessing most of the operations such as sa pointer compute, inner wqe pointer fetch and ucode completion translation before the pkt is characterized as inbound inline pkt. Preprocessed info will be discarded if pkt is not found to be security pkt. Also fix fetching of CQ word5 for vector mode. Get ucode completion code from CPT parse header and RLEN from IP4v/IPv6 decrypted packet as it is in same 64B cacheline as CPT parse header in most of the cases. By this method, we avoid accessing an extra cacheline Fixes: c062f5726f61 ("net/cnxk: support IP reassembly") Signed-off-by: Nithin Dabilpuram --- drivers/net/cnxk/cn10k_rx.h | 488 +++- 1 file changed, 306 insertions(+), 182 deletions(-) diff --git a/drivers/net/cnxk/cn10k_rx.h b/drivers/net/cnxk/cn10k_rx.h index 94c1f1e..14b634e 100644 --- a/drivers/net/cnxk/cn10k_rx.h +++ b/drivers/net/cnxk/cn10k_rx.h @@ -341,6 +341,9 @@ nix_sec_reassemble_frags(const struct cpt_parse_hdr_s *hdr, uint64_t cq_w1, mbuf->data_len = frag_size; fragx_sum += frag_size; + /* Mark frag as get */ + RTE_MEMPOOL_CHECK_COOKIES(mbuf->pool, (void **)&mbuf, 1, 1); + /* Frag-2: */ if (hdr->w0.num_frags > 2) { frag_ptr = (uint64_t *)(finfo + 1); @@ -354,6 +357,9 @@ nix_sec_reassemble_frags(const struct cpt_parse_hdr_s *hdr, uint64_t cq_w1, *(uint64_t *)(&mbuf->rearm_data) = mbuf_init | data_off; mbuf->data_len = frag_size; fragx_sum += frag_size; + + /* Mark frag as get */ + RTE_MEMPOOL_CHECK_COOKIES(mbuf->pool, (void **)&mbuf, 1, 1); } /* Frag-3: */ @@ -368,6 +374,9 @@ nix_sec_reassemble_frags(const struct cpt_parse_hdr_s *hdr, uint64_t cq_w1, *(uint64_t *)(&mbuf->rearm_data) = mbuf_init | data_off; mbuf->data_len = frag_size; fragx_sum += frag_size; + + /* Mark frag as get */ + RTE_MEMPOOL_CHECK_COOKIES(mbuf->pool, (void **)&mbuf, 1, 1); } if (inner_rx->lctype == NPC_LT_LC_IP) { @@ -413,10 +422,10 @@ nix_sec_meta_to_mbuf_sc(uint64_t cq_w1, uint64_t cq_w5, const uint64_t sa_base, const struct cpt_parse_hdr_s *hdr = (const struct cpt_parse_hdr_s *)__p; struct cn10k_inb_priv_data *inb_priv; struct rte_mbuf *inner = NULL; - uint64_t res_w1; uint32_t sa_idx; - uint16_t uc_cc; + uint16_t ucc; uint32_t len; + uintptr_t ip; void *inb_sa; uint64_t w0; @@ -438,20 +447,23 @@ nix_sec_meta_to_mbuf_sc(uint64_t cq_w1, uint64_t cq_w5, const uint64_t sa_base, *rte_security_dynfield(inner) = (uint64_t)inb_priv->userdata; - /* CPT result(struct cpt_cn10k_res_s) is at -* after first IOVA in meta + /* Get ucc from cpt parse header */ + ucc = hdr->w3.hw_ccode; + + /* Calculate inner packet length as +* IP total len + l2 len */ - res_w1 = *((uint64_t *)(&inner[1]) + 10); - uc_cc = res_w1 & 0xFF; + ip = (uintptr_t)hdr + ((cq_w5 >> 16) & 0xFF); + ip += ((cq_w1 >> 40) & 0x6); + len = rte_be_to_cpu_16(*(uint16_t *)ip); + len += ((cq_w5 >> 16) & 0xFF) - (cq_w5 & 0xFF); + len += (cq_w1 & BIT(42)) ? 40 : 0; - /* Calculate inner packet length */ - len = ((res_w1 >> 16) & 0x) + hdr->w2.il3_off - - sizeof(struct cpt_parse_hdr_s) - (w0 & 0x7); inner->pkt_len = len; inner->data_len = len; *(uint64_t *)(&inner->rearm_data) = mbuf_init; - inner->ol_flags = ((uc_cc == CPT_COMP_WARN) ? + inner->ol_flags = ((ucc == CPT_COMP_WARN) ? 
RTE_MBUF_F_RX_SEC_OFFLOAD : (RTE_MBUF_F_RX_SEC_OFFLOAD | RTE_MBUF_F_RX_SEC_OFFLOAD_FAILED)); @@ -477,6 +489,12 @@ nix_sec_meta_to_mbuf_sc(uint64_t cq_w1, uint64_t cq_w5, const uint64_t sa_base, *(uint64_t *)(laddr + (*loff << 3)) = (uint64_t)mbuf; *loff = *loff + 1; + /* Mark meta mbuf as put */ + RTE_MEMPOOL_CHECK_COOKIES(mbuf->pool, (void **)&mbuf, 1, 0); + + /* Mark inner mbuf as get */ + RTE_MEMPOOL_CHECK_COOKIES(inner->pool, (void **)&inner, 1, 1); + return inner; } else if (cq_w1 & BIT(11)) { inner = (struct rte_mbuf *)(rte_be_to_cpu_64(hdr->wqe_ptr) - @@
[PATCH v2 20/28] net/cnxk: update olflags with L3/L4 csum offload
From: Akhil Goyal When the packet is processed with inline IPsec offload, the ol_flags were updated only with RTE_MBUF_F_RX_SEC_OFFLOAD. But the hardware can also update the L3/L4 csum offload flags. Hence, ol_flags are updated with RTE_MBUF_F_RX_IP_CKSUM_GOOD, RTE_MBUF_F_RX_L4_CKSUM_GOOD, etc based on the microcode completion codes. Signed-off-by: Akhil Goyal Signed-off-by: Nithin Dabilpuram --- drivers/net/cnxk/cn10k_rx.h | 51 - 1 file changed, 50 insertions(+), 1 deletion(-) diff --git a/drivers/net/cnxk/cn10k_rx.h b/drivers/net/cnxk/cn10k_rx.h index 14b634e..00bec01 100644 --- a/drivers/net/cnxk/cn10k_rx.h +++ b/drivers/net/cnxk/cn10k_rx.h @@ -42,6 +42,18 @@ (uint64_t *)(((uintptr_t)((uint64_t *)(b))[i]) - (o)) : \ (uint64_t *)(((uintptr_t)(b)) + CQE_SZ(i) - (o))) +#define NIX_RX_SEC_UCC_CONST \ + ((RTE_MBUF_F_RX_IP_CKSUM_BAD >> 1) << 8 | \ +((RTE_MBUF_F_RX_IP_CKSUM_GOOD | RTE_MBUF_F_RX_L4_CKSUM_GOOD) >> 1)\ +<< 24 | \ +((RTE_MBUF_F_RX_IP_CKSUM_GOOD | RTE_MBUF_F_RX_L4_CKSUM_BAD) >> 1) \ +<< 32 | \ +((RTE_MBUF_F_RX_IP_CKSUM_GOOD | RTE_MBUF_F_RX_L4_CKSUM_GOOD) >> 1)\ +<< 40 | \ +((RTE_MBUF_F_RX_IP_CKSUM_GOOD | RTE_MBUF_F_RX_L4_CKSUM_GOOD) >> 1)\ +<< 48 | \ +(RTE_MBUF_F_RX_IP_CKSUM_GOOD >> 1) << 56) + #ifdef RTE_LIBRTE_MEMPOOL_DEBUG static inline void nix_mbuf_validate_next(struct rte_mbuf *m) @@ -467,6 +479,11 @@ nix_sec_meta_to_mbuf_sc(uint64_t cq_w1, uint64_t cq_w5, const uint64_t sa_base, RTE_MBUF_F_RX_SEC_OFFLOAD : (RTE_MBUF_F_RX_SEC_OFFLOAD | RTE_MBUF_F_RX_SEC_OFFLOAD_FAILED)); + + ucc = hdr->w3.uc_ccode; + inner->ol_flags |= ((ucc & 0xF0) == 0xF0) ? + ((NIX_RX_SEC_UCC_CONST >> ((ucc & 0xF) << 3)) +& 0xFF) << 1 : 0; } else if (!(hdr->w0.err_sum) && !(hdr->w0.reas_sts)) { /* Reassembly success */ inner = nix_sec_reassemble_frags(hdr, cq_w1, cq_w5, @@ -529,6 +546,11 @@ nix_sec_meta_to_mbuf_sc(uint64_t cq_w1, uint64_t cq_w5, const uint64_t sa_base, (RTE_MBUF_F_RX_SEC_OFFLOAD | RTE_MBUF_F_RX_SEC_OFFLOAD_FAILED)); + ucc = hdr->w3.uc_ccode; + inner->ol_flags |= ((ucc & 0xF0) == 0xF0) ? + ((NIX_RX_SEC_UCC_CONST >> ((ucc & 0xF) << 3)) +& 0xFF) << 1 : 0; + /* Store meta in lmtline to free * Assume all meta's from same aura. */ @@ -1313,7 +1335,26 @@ cn10k_nix_recv_pkts_vector(void *args, struct rte_mbuf **mbufs, uint16_t pkts, sa23 = vaddq_u64(sa23, vdupq_n_u64(sa_base)); const uint8x16_t tbl = { - 0, 0, 0, 0, 0, 0, 0, 0, + /* ROC_IE_OT_UCC_SUCCESS_SA_SOFTEXP_FIRST */ + 0, + /* ROC_IE_OT_UCC_SUCCESS_PKT_IP_BADCSUM */ + RTE_MBUF_F_RX_IP_CKSUM_BAD >> 1, + /* ROC_IE_OT_UCC_SUCCESS_SA_SOFTEXP_AGAIN */ + 0, + /* ROC_IE_OT_UCC_SUCCESS_PKT_L4_GOODCSUM */ + (RTE_MBUF_F_RX_IP_CKSUM_GOOD | +RTE_MBUF_F_RX_L4_CKSUM_GOOD) >> 1, + /* ROC_IE_OT_UCC_SUCCESS_PKT_L4_BADCSUM */ + (RTE_MBUF_F_RX_IP_CKSUM_GOOD | +RTE_MBUF_F_RX_L4_CKSUM_BAD) >> 1, + /* ROC_IE_OT_UCC_SUCCESS_PKT_UDPESP_NZCSUM */ + (RTE_MBUF_F_RX_IP_CKSUM_GOOD | +RTE_MBUF_F_RX_L4_CKSUM_GOOD) >> 1, + /* ROC_IE_OT_UCC_SUCCESS_PKT_UDP_ZEROCSUM */ + (RTE_MBUF_F_RX_IP_CKSUM_GOOD | +RTE_MBUF_F_RX_L4_CKSUM_GOOD) >> 1, + /* ROC_IE_OT_UCC_SUCCESS_PKT_IP_GOODCSUM */ + RTE_MBUF_F_RX_IP_CKSUM_GOOD >> 1, /* HW_CCODE -> RTE_MBUF_F_RX_SEC_OFFLOAD_FAILED */ 1, 0, 1, 1, 1, 1, 0, 1, }; @@ -1419,6 +1460,8 @@ cn10k_ni
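The NIX_RX_SEC_UCC_CONST trick packs one flag byte per microcode completion code into a single 64-bit constant so the fast path needs no memory table. A standalone demo of the technique (the flag values are placeholders, not the real RTE_MBUF_F_RX_* encodings):

#include <stdint.h>
#include <stdio.h>

/* One byte per completion code (low nibble 0..7), packed into a u64. */
#define DEMO_UCC_CONST \
        ((uint64_t)0x01 << 8 | (uint64_t)0x03 << 24 | (uint64_t)0x02 << 32)

static uint64_t
ucc_to_olflags(uint8_t ucc)
{
        if ((ucc & 0xF0) != 0xF0)
                return 0; /* only 0xFx codes report checksum status */
        /* select the byte for this code, then shift into flag position */
        return ((DEMO_UCC_CONST >> ((ucc & 0xF) << 3)) & 0xFF) << 1;
}

int main(void)
{
        printf("ucc 0xF3 -> flags 0x%llx\n",
               (unsigned long long)ucc_to_olflags(0xF3));
        return 0;
}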
[PATCH v2 21/28] net/cnxk: add capabilities for IPsec crypto algos
From: Akhil Goyal Added supported crypto algorithms for inline IPsec offload. Signed-off-by: Akhil Goyal --- drivers/net/cnxk/cn10k_ethdev_sec.c | 166 1 file changed, 166 insertions(+) diff --git a/drivers/net/cnxk/cn10k_ethdev_sec.c b/drivers/net/cnxk/cn10k_ethdev_sec.c index f32e169..6a3e636 100644 --- a/drivers/net/cnxk/cn10k_ethdev_sec.c +++ b/drivers/net/cnxk/cn10k_ethdev_sec.c @@ -62,6 +62,46 @@ static struct rte_cryptodev_capabilities cn10k_eth_sec_crypto_caps[] = { }, } }, } }, + { /* AES CTR */ + .op = RTE_CRYPTO_OP_TYPE_SYMMETRIC, + {.sym = { + .xform_type = RTE_CRYPTO_SYM_XFORM_CIPHER, + {.cipher = { + .algo = RTE_CRYPTO_CIPHER_AES_CTR, + .block_size = 16, + .key_size = { + .min = 16, + .max = 32, + .increment = 8 + }, + .iv_size = { + .min = 12, + .max = 16, + .increment = 4 + } + }, } + }, } + }, + { /* AES-XCBC */ + .op = RTE_CRYPTO_OP_TYPE_SYMMETRIC, + { .sym = { + .xform_type = RTE_CRYPTO_SYM_XFORM_AUTH, + {.auth = { + .algo = RTE_CRYPTO_AUTH_AES_XCBC_MAC, + .block_size = 16, + .key_size = { + .min = 16, + .max = 16, + .increment = 0 + }, + .digest_size = { + .min = 12, + .max = 12, + .increment = 0, + }, + }, } + }, } + }, { /* SHA1 HMAC */ .op = RTE_CRYPTO_OP_TYPE_SYMMETRIC, {.sym = { @@ -82,6 +122,132 @@ static struct rte_cryptodev_capabilities cn10k_eth_sec_crypto_caps[] = { }, } }, } }, + { /* SHA256 HMAC */ + .op = RTE_CRYPTO_OP_TYPE_SYMMETRIC, + {.sym = { + .xform_type = RTE_CRYPTO_SYM_XFORM_AUTH, + {.auth = { + .algo = RTE_CRYPTO_AUTH_SHA256_HMAC, + .block_size = 64, + .key_size = { + .min = 1, + .max = 1024, + .increment = 1 + }, + .digest_size = { + .min = 16, + .max = 32, + .increment = 16 + }, + }, } + }, } + }, + { /* SHA384 HMAC */ + .op = RTE_CRYPTO_OP_TYPE_SYMMETRIC, + {.sym = { + .xform_type = RTE_CRYPTO_SYM_XFORM_AUTH, + {.auth = { + .algo = RTE_CRYPTO_AUTH_SHA384_HMAC, + .block_size = 64, + .key_size = { + .min = 1, + .max = 1024, + .increment = 1 + }, + .digest_size = { + .min = 24, + .max = 48, + .increment = 24 + }, + }, } + }, } + }, + { /* SHA512 HMAC */ + .op = RTE_CRYPTO_OP_TYPE_SYMMETRIC, + {.sym = { + .xform_type = RTE_CRYPTO_SYM_XFORM_AUTH, + {.auth = { + .algo = RTE_CRYPTO_AUTH_SHA512_HMAC, + .block_size = 128, + .key_size = { + .min = 1, + .max = 1024, + .increment = 1 + }, + .digest_size = { +
[PATCH v2 22/28] net/cnxk: add capabilities for IPsec options
From: Akhil Goyal Added supported capabilities for various IPsec SA options. Signed-off-by: Akhil Goyal Signed-off-by: Vamsi Attunuru --- drivers/net/cnxk/cn10k_ethdev_sec.c | 57 ++--- 1 file changed, 53 insertions(+), 4 deletions(-) diff --git a/drivers/net/cnxk/cn10k_ethdev_sec.c b/drivers/net/cnxk/cn10k_ethdev_sec.c index 6a3e636..7e4941d 100644 --- a/drivers/net/cnxk/cn10k_ethdev_sec.c +++ b/drivers/net/cnxk/cn10k_ethdev_sec.c @@ -259,7 +259,20 @@ static const struct rte_security_capability cn10k_eth_sec_capabilities[] = { .proto = RTE_SECURITY_IPSEC_SA_PROTO_ESP, .mode = RTE_SECURITY_IPSEC_SA_MODE_TUNNEL, .direction = RTE_SECURITY_IPSEC_SA_DIR_INGRESS, - .options = { 0 } + .replay_win_sz_max = ROC_AR_WIN_SIZE_MAX, + .options = { + .udp_encap = 1, + .udp_ports_verify = 1, + .copy_df = 1, + .copy_dscp = 1, + .copy_flabel = 1, + .tunnel_hdr_verify = RTE_SECURITY_IPSEC_TUNNEL_VERIFY_SRC_DST_ADDR, + .dec_ttl = 1, + .ip_csum_enable = 1, + .l4_csum_enable = 1, + .stats = 0, + .esn = 1, + }, }, .crypto_capabilities = cn10k_eth_sec_crypto_caps, .ol_flags = RTE_SECURITY_TX_OLOAD_NEED_MDATA @@ -271,7 +284,20 @@ static const struct rte_security_capability cn10k_eth_sec_capabilities[] = { .proto = RTE_SECURITY_IPSEC_SA_PROTO_ESP, .mode = RTE_SECURITY_IPSEC_SA_MODE_TUNNEL, .direction = RTE_SECURITY_IPSEC_SA_DIR_EGRESS, - .options = { 0 } + .replay_win_sz_max = ROC_AR_WIN_SIZE_MAX, + .options = { + .iv_gen_disable = 1, + .udp_encap = 1, + .udp_ports_verify = 1, + .copy_df = 1, + .copy_dscp = 1, + .copy_flabel = 1, + .dec_ttl = 1, + .ip_csum_enable = 1, + .l4_csum_enable = 1, + .stats = 0, + .esn = 1, + }, }, .crypto_capabilities = cn10k_eth_sec_crypto_caps, .ol_flags = RTE_SECURITY_TX_OLOAD_NEED_MDATA @@ -283,7 +309,19 @@ static const struct rte_security_capability cn10k_eth_sec_capabilities[] = { .proto = RTE_SECURITY_IPSEC_SA_PROTO_ESP, .mode = RTE_SECURITY_IPSEC_SA_MODE_TRANSPORT, .direction = RTE_SECURITY_IPSEC_SA_DIR_EGRESS, - .options = { 0 } + .replay_win_sz_max = ROC_AR_WIN_SIZE_MAX, + .options = { + .iv_gen_disable = 1, + .udp_encap = 1, + .udp_ports_verify = 1, + .copy_df = 1, + .copy_dscp = 1, + .dec_ttl = 1, + .ip_csum_enable = 1, + .l4_csum_enable = 1, + .stats = 0, + .esn = 1, + }, }, .crypto_capabilities = cn10k_eth_sec_crypto_caps, .ol_flags = RTE_SECURITY_TX_OLOAD_NEED_MDATA @@ -295,7 +333,18 @@ static const struct rte_security_capability cn10k_eth_sec_capabilities[] = { .proto = RTE_SECURITY_IPSEC_SA_PROTO_ESP, .mode = RTE_SECURITY_IPSEC_SA_MODE_TRANSPORT, .direction = RTE_SECURITY_IPSEC_SA_DIR_INGRESS, - .options = { 0 } + .replay_win_sz_max = ROC_AR_WIN_SIZE_MAX, + .options = { + .udp_encap = 1, + .udp_ports_verify = 1, + .copy_df = 1, + .copy_dscp = 1, + .dec_ttl = 1, + .ip_csum_enable = 1, + .l4_csum_enable = 1, + .stats = 0, + .esn = 1, + }, }, .crypto_capabilities = cn10k_eth_sec_crypto_caps, .ol_flags = RTE_SECURITY_TX_OLOAD
[PATCH v2 23/28] net/cnxk: support security stats
From: Akhil Goyal Enabled rte_security stats operation based on the configuration of SA options set while creating session. Signed-off-by: Vamsi Attunuru Signed-off-by: Akhil Goyal --- drivers/net/cnxk/cn10k_ethdev_sec.c | 56 ++--- 1 file changed, 52 insertions(+), 4 deletions(-) diff --git a/drivers/net/cnxk/cn10k_ethdev_sec.c b/drivers/net/cnxk/cn10k_ethdev_sec.c index 7e4941d..7c4988b 100644 --- a/drivers/net/cnxk/cn10k_ethdev_sec.c +++ b/drivers/net/cnxk/cn10k_ethdev_sec.c @@ -270,7 +270,7 @@ static const struct rte_security_capability cn10k_eth_sec_capabilities[] = { .dec_ttl = 1, .ip_csum_enable = 1, .l4_csum_enable = 1, - .stats = 0, + .stats = 1, .esn = 1, }, }, @@ -295,7 +295,7 @@ static const struct rte_security_capability cn10k_eth_sec_capabilities[] = { .dec_ttl = 1, .ip_csum_enable = 1, .l4_csum_enable = 1, - .stats = 0, + .stats = 1, .esn = 1, }, }, @@ -319,7 +319,7 @@ static const struct rte_security_capability cn10k_eth_sec_capabilities[] = { .dec_ttl = 1, .ip_csum_enable = 1, .l4_csum_enable = 1, - .stats = 0, + .stats = 1, .esn = 1, }, }, @@ -342,7 +342,7 @@ static const struct rte_security_capability cn10k_eth_sec_capabilities[] = { .dec_ttl = 1, .ip_csum_enable = 1, .l4_csum_enable = 1, - .stats = 0, + .stats = 1, .esn = 1, }, }, @@ -679,6 +679,11 @@ cn10k_eth_sec_session_create(void *device, inb_sa_dptr->w1.s.cookie = rte_cpu_to_be_32(ipsec->spi & spi_mask); + if (ipsec->options.stats == 1) { + /* Enable mib counters */ + inb_sa_dptr->w0.s.count_mib_bytes = 1; + inb_sa_dptr->w0.s.count_mib_pkts = 1; + } /* Prepare session priv */ sess_priv.inb_sa = 1; sess_priv.sa_idx = ipsec->spi & spi_mask; @@ -761,6 +766,12 @@ cn10k_eth_sec_session_create(void *device, /* Save rlen info */ cnxk_ipsec_outb_rlens_get(rlens, ipsec, crypto); + if (ipsec->options.stats == 1) { + /* Enable mib counters */ + outb_sa_dptr->w0.s.count_mib_bytes = 1; + outb_sa_dptr->w0.s.count_mib_pkts = 1; + } + /* Prepare session priv */ sess_priv.sa_idx = outb_priv->sa_idx; sess_priv.roundup_byte = rlens->roundup_byte; @@ -877,6 +888,42 @@ cn10k_eth_sec_capabilities_get(void *device __rte_unused) return cn10k_eth_sec_capabilities; } +static int +cn10k_eth_sec_session_stats_get(void *device, struct rte_security_session *sess, + struct rte_security_stats *stats) +{ + struct rte_eth_dev *eth_dev = (struct rte_eth_dev *)device; + struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev); + struct cnxk_eth_sec_sess *eth_sec; + int rc; + + eth_sec = cnxk_eth_sec_sess_get_by_sess(dev, sess); + if (eth_sec == NULL) + return -EINVAL; + + rc = roc_nix_inl_sa_sync(&dev->nix, eth_sec->sa, eth_sec->inb, + ROC_NIX_INL_SA_OP_FLUSH); + if (rc) + return -EINVAL; + rte_delay_ms(1); + + stats->protocol = RTE_SECURITY_PROTOCOL_IPSEC; + + if (eth_sec->inb) { + stats->ipsec.ipackets = + ((struct roc_ot_ipsec_inb_sa *)eth_sec->sa)->ctx.mib_pkts; + stats->ipsec.ibytes = + ((struct roc_ot_ipsec_inb_sa *)eth_sec->sa)->ctx.mib_octs; + } else { + stats->ipsec.opackets = + ((struct roc_ot_ipsec_outb_sa *)eth_sec->sa)->ctx.mib_pkts; + stats->ipsec.obytes = + ((struct roc_ot_ipsec_outb_sa *)eth_sec->sa)->ctx.mib_octs; + } + + return 0; +} + void cn10k_eth_sec_ops_override(void) { @@ -890,4 +937,5 @@ cn10k_eth_sec_ops_override(void) cnxk_eth_sec_ops.session_create = cn10k_eth_sec_session_create; cnxk_eth_sec_ops.session_destroy =
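From the application side these counters surface through the standard rte_security API. A minimal sketch (assuming a session sess was created earlier on port port_id; error handling trimmed):

struct rte_security_stats stats;
void *ctx = rte_eth_dev_get_sec_ctx(port_id);

memset(&stats, 0, sizeof(stats));
if (rte_security_session_stats_get(ctx, sess, &stats) == 0 &&
    stats.protocol == RTE_SECURITY_PROTOCOL_IPSEC)
        printf("ipackets=%" PRIu64 " ibytes=%" PRIu64 "\n",
               stats.ipsec.ipackets, stats.ipsec.ibytes);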
[PATCH v2 24/28] net/cnxk: add support for flow control for outbound inline
Add support for flow control in outbound inline path using fc updates from CPT. Signed-off-by: Nithin Dabilpuram --- drivers/net/cnxk/cn10k_ethdev.c | 3 +++ drivers/net/cnxk/cn10k_ethdev.h | 1 + drivers/net/cnxk/cn10k_tx.h | 37 - drivers/net/cnxk/cnxk_ethdev.c | 13 + drivers/net/cnxk/cnxk_ethdev.h | 3 +++ 5 files changed, 56 insertions(+), 1 deletion(-) diff --git a/drivers/net/cnxk/cn10k_ethdev.c b/drivers/net/cnxk/cn10k_ethdev.c index d04b9eb..de688f0 100644 --- a/drivers/net/cnxk/cn10k_ethdev.c +++ b/drivers/net/cnxk/cn10k_ethdev.c @@ -204,6 +204,9 @@ cn10k_nix_tx_queue_setup(struct rte_eth_dev *eth_dev, uint16_t qid, txq->cpt_io_addr = inl_lf->io_addr; txq->cpt_fc = inl_lf->fc_addr; + txq->cpt_fc_sw = (int32_t *)((uintptr_t)dev->outb.fc_sw_mem + +crypto_qid * RTE_CACHE_LINE_SIZE); + txq->cpt_desc = inl_lf->nb_desc * 0.7; txq->sa_base = (uint64_t)dev->outb.sa_base; txq->sa_base |= eth_dev->data->port_id; diff --git a/drivers/net/cnxk/cn10k_ethdev.h b/drivers/net/cnxk/cn10k_ethdev.h index c8666ce..acfdbb6 100644 --- a/drivers/net/cnxk/cn10k_ethdev.h +++ b/drivers/net/cnxk/cn10k_ethdev.h @@ -19,6 +19,7 @@ struct cn10k_eth_txq { uint64_t sa_base; uint64_t *cpt_fc; uint16_t cpt_desc; + int32_t *cpt_fc_sw; uint64_t lso_tun_fmt; uint64_t ts_mem; uint64_t mark_flag : 8; diff --git a/drivers/net/cnxk/cn10k_tx.h b/drivers/net/cnxk/cn10k_tx.h index c482352..762586f 100644 --- a/drivers/net/cnxk/cn10k_tx.h +++ b/drivers/net/cnxk/cn10k_tx.h @@ -209,6 +209,37 @@ cn10k_nix_tx_skeleton(struct cn10k_eth_txq *txq, uint64_t *cmd, } static __rte_always_inline void +cn10k_nix_sec_fc_wait(struct cn10k_eth_txq *txq, uint16_t nb_pkts) +{ + int32_t nb_desc, val, newval; + int32_t *fc_sw; + volatile uint64_t *fc; + + /* Check if there is any CPT instruction to submit */ + if (!nb_pkts) + return; + +again: + fc_sw = txq->cpt_fc_sw; + val = __atomic_sub_fetch(fc_sw, nb_pkts, __ATOMIC_RELAXED); + if (likely(val >= 0)) + return; + + nb_desc = txq->cpt_desc; + fc = txq->cpt_fc; + while (true) { + newval = nb_desc - __atomic_load_n(fc, __ATOMIC_RELAXED); + newval -= nb_pkts; + if (newval >= 0) + break; + } + + if (!__atomic_compare_exchange_n(fc_sw, &val, newval, false, +__ATOMIC_RELAXED, __ATOMIC_RELAXED)) + goto again; +} + +static __rte_always_inline void cn10k_nix_sec_steorl(uintptr_t io_addr, uint32_t lmt_id, uint8_t lnum, uint8_t loff, uint8_t shft) { @@ -995,6 +1026,7 @@ cn10k_nix_xmit_pkts(void *tx_queue, uint64_t *ws, struct rte_mbuf **tx_pkts, if (flags & NIX_TX_OFFLOAD_SECURITY_F) { /* Reduce pkts to be sent to CPT */ burst -= ((c_lnum << 1) + c_loff); + cn10k_nix_sec_fc_wait(txq, (c_lnum << 1) + c_loff); cn10k_nix_sec_steorl(c_io_addr, c_lmt_id, c_lnum, c_loff, c_shft); } @@ -1138,6 +1170,7 @@ cn10k_nix_xmit_pkts_mseg(void *tx_queue, uint64_t *ws, if (flags & NIX_TX_OFFLOAD_SECURITY_F) { /* Reduce pkts to be sent to CPT */ burst -= ((c_lnum << 1) + c_loff); + cn10k_nix_sec_fc_wait(txq, (c_lnum << 1) + c_loff); cn10k_nix_sec_steorl(c_io_addr, c_lmt_id, c_lnum, c_loff, c_shft); } @@ -2682,9 +2715,11 @@ cn10k_nix_xmit_pkts_vector(void *tx_queue, uint64_t *ws, left -= burst; /* Submit CPT instructions if any */ - if (flags & NIX_TX_OFFLOAD_SECURITY_F) + if (flags & NIX_TX_OFFLOAD_SECURITY_F) { + cn10k_nix_sec_fc_wait(txq, (c_lnum << 1) + c_loff); cn10k_nix_sec_steorl(c_io_addr, c_lmt_id, c_lnum, c_loff, c_shft); + } /* Trigger LMTST */ if (lnum > 16) { diff --git a/drivers/net/cnxk/cnxk_ethdev.c b/drivers/net/cnxk/cnxk_ethdev.c index e1b1e16..12ff30f 100644 --- a/drivers/net/cnxk/cnxk_ethdev.c +++ 
b/drivers/net/cnxk/cnxk_ethdev.c @@ -155,9 +155,19 @@ nix_security_setup(struct cnxk_eth_dev *dev) dev->outb.sa_base = roc_nix_inl_outb_sa_base_get(nix); dev->outb.sa_bmap_mem = mem; dev->outb.sa_bmap = bmap; + + dev->outb.fc_sw_mem = plt_zmalloc(dev->outb.nb_crypto_qs * + RTE_CACHE_LINE_SIZE, + RTE_CACHE_LINE_SIZE); + if (!dev->outb.fc_sw_mem) { + plt_err("Outbound fc sw mem alloc fail
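The scheme keeps a per-queue software credit counter that is reconciled with the hardware in-flight count only on underflow, so the fast path normally costs one relaxed atomic. A standalone sketch of the pattern (hw_inflight stands in for the *cpt_fc location the CPT hardware updates; all names are illustrative):

#include <stdint.h>

static int32_t fc_sw;                 /* SW credits, refilled on underflow */
static volatile uint64_t hw_inflight; /* stand-in for *txq->cpt_fc */
static int32_t nb_desc = 1024;        /* stand-in for txq->cpt_desc */

static void
sec_fc_wait(int32_t nb_pkts)
{
        int32_t val, newval;

again:
        val = __atomic_sub_fetch(&fc_sw, nb_pkts, __ATOMIC_RELAXED);
        if (val >= 0)
                return;

        /* Underflow: spin until HW has enough free descriptors. */
        do {
                newval = nb_desc - (int32_t)__atomic_load_n(&hw_inflight,
                                                            __ATOMIC_RELAXED);
                newval -= nb_pkts;
        } while (newval < 0);

        /* Publish recomputed credits; retry if another lcore raced us. */
        if (!__atomic_compare_exchange_n(&fc_sw, &val, newval, false,
                                         __ATOMIC_RELAXED, __ATOMIC_RELAXED))
                goto again;
}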
[PATCH v2 25/28] net/cnxk: perform early MTU setup for eventmode
Perform early MTU setup for event mode path in order to update the Rx/Tx offload flags before Rx adapter setup starts. Signed-off-by: Nithin Dabilpuram --- drivers/net/cnxk/cn10k_ethdev.c | 11 +++ drivers/net/cnxk/cn9k_ethdev.c | 11 +++ 2 files changed, 22 insertions(+) diff --git a/drivers/net/cnxk/cn10k_ethdev.c b/drivers/net/cnxk/cn10k_ethdev.c index de688f0..bc9e10f 100644 --- a/drivers/net/cnxk/cn10k_ethdev.c +++ b/drivers/net/cnxk/cn10k_ethdev.c @@ -248,6 +248,17 @@ cn10k_nix_rx_queue_setup(struct rte_eth_dev *eth_dev, uint16_t qid, if (rc) return rc; + /* Do initial mtu setup for RQ0 before device start */ + if (!qid) { + rc = nix_recalc_mtu(eth_dev); + if (rc) + return rc; + + /* Update offload flags */ + dev->rx_offload_flags = nix_rx_offload_flags(eth_dev); + dev->tx_offload_flags = nix_tx_offload_flags(eth_dev); + } + rq = &dev->rqs[qid]; cq = &dev->cqs[qid]; diff --git a/drivers/net/cnxk/cn9k_ethdev.c b/drivers/net/cnxk/cn9k_ethdev.c index 18cc27e..de33fa7 100644 --- a/drivers/net/cnxk/cn9k_ethdev.c +++ b/drivers/net/cnxk/cn9k_ethdev.c @@ -241,6 +241,17 @@ cn9k_nix_rx_queue_setup(struct rte_eth_dev *eth_dev, uint16_t qid, if (rc) return rc; + /* Do initial mtu setup for RQ0 before device start */ + if (!qid) { + rc = nix_recalc_mtu(eth_dev); + if (rc) + return rc; + + /* Update offload flags */ + dev->rx_offload_flags = nix_rx_offload_flags(eth_dev); + dev->tx_offload_flags = nix_tx_offload_flags(eth_dev); + } + rq = &dev->rqs[qid]; cq = &dev->cqs[qid]; -- 2.8.4
[PATCH v2 26/28] common/cnxk: allow lesser inline inbound sa sizes
Restructure SA setup to allow lesser inbound SA sizes as opposed to full Inbound SA size of 1024B with max possible Anti-Replay window. Since inbound SA size is variable, move the memset logic out of common code. Signed-off-by: Nithin Dabilpuram --- drivers/common/cnxk/roc_ie_ot.c | 4 drivers/common/cnxk/roc_nix_inl.c | 9 - drivers/common/cnxk/roc_nix_inl.h | 26 +++--- 3 files changed, 31 insertions(+), 8 deletions(-) diff --git a/drivers/common/cnxk/roc_ie_ot.c b/drivers/common/cnxk/roc_ie_ot.c index d0b7ad3..4b5823d 100644 --- a/drivers/common/cnxk/roc_ie_ot.c +++ b/drivers/common/cnxk/roc_ie_ot.c @@ -10,8 +10,6 @@ roc_ot_ipsec_inb_sa_init(struct roc_ot_ipsec_inb_sa *sa, bool is_inline) { size_t offset; - memset(sa, 0, sizeof(struct roc_ot_ipsec_inb_sa)); - if (is_inline) { sa->w0.s.pkt_output = ROC_IE_OT_SA_PKT_OUTPUT_NO_FRAG; sa->w0.s.pkt_format = ROC_IE_OT_SA_PKT_FMT_META; @@ -33,8 +31,6 @@ roc_ot_ipsec_outb_sa_init(struct roc_ot_ipsec_outb_sa *sa) { size_t offset; - memset(sa, 0, sizeof(struct roc_ot_ipsec_outb_sa)); - offset = offsetof(struct roc_ot_ipsec_outb_sa, ctx); sa->w0.s.ctx_push_size = (offset / ROC_CTX_UNIT_8B) + 1; sa->w0.s.ctx_size = ROC_IE_OT_CTX_ILEN; diff --git a/drivers/common/cnxk/roc_nix_inl.c b/drivers/common/cnxk/roc_nix_inl.c index 2c013cb..887d4ad 100644 --- a/drivers/common/cnxk/roc_nix_inl.c +++ b/drivers/common/cnxk/roc_nix_inl.c @@ -14,9 +14,16 @@ PLT_STATIC_ASSERT(ROC_NIX_INL_ONF_IPSEC_OUTB_SA_SZ == 1UL << ROC_NIX_INL_ONF_IPSEC_OUTB_SA_SZ_LOG2); PLT_STATIC_ASSERT(ROC_NIX_INL_OT_IPSEC_INB_SA_SZ == 1UL << ROC_NIX_INL_OT_IPSEC_INB_SA_SZ_LOG2); -PLT_STATIC_ASSERT(ROC_NIX_INL_OT_IPSEC_INB_SA_SZ == 1024); PLT_STATIC_ASSERT(ROC_NIX_INL_OT_IPSEC_OUTB_SA_SZ == 1UL << ROC_NIX_INL_OT_IPSEC_OUTB_SA_SZ_LOG2); +PLT_STATIC_ASSERT(ROC_NIX_INL_OT_IPSEC_INB_SA_SZ >= + ROC_NIX_INL_OT_IPSEC_INB_HW_SZ + + ROC_NIX_INL_OT_IPSEC_INB_SW_RSVD); +/* Allow lesser INB SA HW sizes */ +PLT_STATIC_ASSERT(ROC_NIX_INL_OT_IPSEC_INB_HW_SZ <= + PLT_ALIGN(sizeof(struct roc_ot_ipsec_inb_sa), ROC_ALIGN)); +PLT_STATIC_ASSERT(ROC_NIX_INL_OT_IPSEC_OUTB_HW_SZ == + PLT_ALIGN(sizeof(struct roc_ot_ipsec_outb_sa), ROC_ALIGN)); static int nix_inl_inb_sa_tbl_setup(struct roc_nix *roc_nix) diff --git a/drivers/common/cnxk/roc_nix_inl.h b/drivers/common/cnxk/roc_nix_inl.h index 633f090..e7bcffc 100644 --- a/drivers/common/cnxk/roc_nix_inl.h +++ b/drivers/common/cnxk/roc_nix_inl.h @@ -23,13 +23,33 @@ #define ROC_NIX_INL_ONF_IPSEC_OUTB_SA_SZ_LOG2 8 /* OT INB HW area */ +#ifndef ROC_NIX_INL_OT_IPSEC_AR_WIN_SZ_MAX +#define ROC_NIX_INL_OT_IPSEC_AR_WIN_SZ_MAX 4096u +#endif +#define ROC_NIX_INL_OT_IPSEC_AR_WINBITS_SZ \ + (PLT_ALIGN_CEIL(ROC_NIX_INL_OT_IPSEC_AR_WIN_SZ_MAX,\ + BITS_PER_LONG_LONG) / \ +BITS_PER_LONG_LONG) +#define __ROC_NIX_INL_OT_IPSEC_INB_HW_SZ \ + (offsetof(struct roc_ot_ipsec_inb_sa, ctx.ar_winbits) +\ +sizeof(uint64_t) * ROC_NIX_INL_OT_IPSEC_AR_WINBITS_SZ) #define ROC_NIX_INL_OT_IPSEC_INB_HW_SZ \ - PLT_ALIGN(sizeof(struct roc_ot_ipsec_inb_sa), ROC_ALIGN) + PLT_ALIGN(__ROC_NIX_INL_OT_IPSEC_INB_HW_SZ, ROC_ALIGN) /* OT INB SW reserved area */ +#ifndef ROC_NIX_INL_INB_POST_PROCESS +#define ROC_NIX_INL_INB_POST_PROCESS 1 +#endif +#if ROC_NIX_INL_INB_POST_PROCESS == 0 +#define ROC_NIX_INL_OT_IPSEC_INB_SW_RSVD 0 +#else #define ROC_NIX_INL_OT_IPSEC_INB_SW_RSVD 128 +#endif + #define ROC_NIX_INL_OT_IPSEC_INB_SA_SZ \ - (ROC_NIX_INL_OT_IPSEC_INB_HW_SZ + ROC_NIX_INL_OT_IPSEC_INB_SW_RSVD) -#define ROC_NIX_INL_OT_IPSEC_INB_SA_SZ_LOG2 10 + (1UL << (64 - __builtin_clzll(ROC_NIX_INL_OT_IPSEC_INB_HW_SZ + \ + 
ROC_NIX_INL_OT_IPSEC_INB_SW_RSVD - 1))) +#define ROC_NIX_INL_OT_IPSEC_INB_SA_SZ_LOG2 \ + __builtin_ctzll(ROC_NIX_INL_OT_IPSEC_INB_SA_SZ) /* OT OUTB HW area */ #define ROC_NIX_INL_OT_IPSEC_OUTB_HW_SZ \ -- 2.8.4
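To make the macro arithmetic concrete, a worked example (the offsetof() value is illustrative, the rest follows the definitions above):

AR_WIN_SZ_MAX = 4096  ->  AR_WINBITS_SZ = 4096 / 64 = 64 u64 words
__INB_HW_SZ = offsetof(struct roc_ot_ipsec_inb_sa, ctx.ar_winbits) + 64 * 8
INB_SA_SZ   = next power of two >= (INB_HW_SZ + 128 B SW reserved)
e.g. if INB_HW_SZ + SW_RSVD = 896 B: 1 << (64 - __builtin_clzll(895)) = 1 << 10 = 1024 B

A smaller compile-time AR window shrinks the winbits array and lets the per-SA footprint drop to a smaller power of two, which is the point of the restructuring.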
[PATCH v2 27/28] net/cnxk: setup variable inline inbound SA
Setup inline inbound SA assuming variable size defined at compile time. Signed-off-by: Nithin Dabilpuram --- drivers/net/cnxk/cn10k_ethdev_sec.c | 22 -- 1 file changed, 12 insertions(+), 10 deletions(-) diff --git a/drivers/net/cnxk/cn10k_ethdev_sec.c b/drivers/net/cnxk/cn10k_ethdev_sec.c index 7c4988b..65519ee 100644 --- a/drivers/net/cnxk/cn10k_ethdev_sec.c +++ b/drivers/net/cnxk/cn10k_ethdev_sec.c @@ -259,7 +259,7 @@ static const struct rte_security_capability cn10k_eth_sec_capabilities[] = { .proto = RTE_SECURITY_IPSEC_SA_PROTO_ESP, .mode = RTE_SECURITY_IPSEC_SA_MODE_TUNNEL, .direction = RTE_SECURITY_IPSEC_SA_DIR_INGRESS, - .replay_win_sz_max = ROC_AR_WIN_SIZE_MAX, + .replay_win_sz_max = ROC_NIX_INL_OT_IPSEC_AR_WIN_SZ_MAX, .options = { .udp_encap = 1, .udp_ports_verify = 1, @@ -284,7 +284,7 @@ static const struct rte_security_capability cn10k_eth_sec_capabilities[] = { .proto = RTE_SECURITY_IPSEC_SA_PROTO_ESP, .mode = RTE_SECURITY_IPSEC_SA_MODE_TUNNEL, .direction = RTE_SECURITY_IPSEC_SA_DIR_EGRESS, - .replay_win_sz_max = ROC_AR_WIN_SIZE_MAX, + .replay_win_sz_max = ROC_NIX_INL_OT_IPSEC_AR_WIN_SZ_MAX, .options = { .iv_gen_disable = 1, .udp_encap = 1, @@ -309,7 +309,7 @@ static const struct rte_security_capability cn10k_eth_sec_capabilities[] = { .proto = RTE_SECURITY_IPSEC_SA_PROTO_ESP, .mode = RTE_SECURITY_IPSEC_SA_MODE_TRANSPORT, .direction = RTE_SECURITY_IPSEC_SA_DIR_EGRESS, - .replay_win_sz_max = ROC_AR_WIN_SIZE_MAX, + .replay_win_sz_max = ROC_NIX_INL_OT_IPSEC_AR_WIN_SZ_MAX, .options = { .iv_gen_disable = 1, .udp_encap = 1, @@ -333,7 +333,7 @@ static const struct rte_security_capability cn10k_eth_sec_capabilities[] = { .proto = RTE_SECURITY_IPSEC_SA_PROTO_ESP, .mode = RTE_SECURITY_IPSEC_SA_MODE_TRANSPORT, .direction = RTE_SECURITY_IPSEC_SA_DIR_INGRESS, - .replay_win_sz_max = ROC_AR_WIN_SIZE_MAX, + .replay_win_sz_max = ROC_NIX_INL_OT_IPSEC_AR_WIN_SZ_MAX, .options = { .udp_encap = 1, .udp_ports_verify = 1, @@ -658,7 +658,7 @@ cn10k_eth_sec_session_create(void *device, } inb_sa_dptr = (struct roc_ot_ipsec_inb_sa *)dev->inb.sa_dptr; - memset(inb_sa_dptr, 0, sizeof(struct roc_ot_ipsec_inb_sa)); + memset(inb_sa_dptr, 0, ROC_NIX_INL_OT_IPSEC_INB_HW_SZ); /* Fill inbound sa params */ rc = cnxk_ot_ipsec_inb_sa_fill(inb_sa_dptr, ipsec, crypto, @@ -701,7 +701,7 @@ cn10k_eth_sec_session_create(void *device, /* Sync session in context cache */ rc = roc_nix_inl_ctx_write(&dev->nix, inb_sa_dptr, eth_sec->sa, eth_sec->inb, - sizeof(struct roc_ot_ipsec_inb_sa)); + ROC_NIX_INL_OT_IPSEC_INB_HW_SZ); if (rc) goto mempool_put; @@ -731,7 +731,7 @@ cn10k_eth_sec_session_create(void *device, rlens = &outb_priv->rlens; outb_sa_dptr = (struct roc_ot_ipsec_outb_sa *)dev->outb.sa_dptr; - memset(outb_sa_dptr, 0, sizeof(struct roc_ot_ipsec_outb_sa)); + memset(outb_sa_dptr, 0, ROC_NIX_INL_OT_IPSEC_OUTB_HW_SZ); /* Fill outbound sa params */ rc = cnxk_ot_ipsec_outb_sa_fill(outb_sa_dptr, ipsec, crypto); @@ -795,7 +795,7 @@ cn10k_eth_sec_session_create(void *device, /* Sync session in context cache */ rc = roc_nix_inl_ctx_write(&dev->nix, outb_sa_dptr, eth_sec->sa, eth_sec->inb, - sizeof(struct roc_ot_ipsec_outb_sa)); + ROC_NIX_INL_OT_IPSEC_OUTB_HW_SZ); if (rc) goto mempool_put; } @@ -846,21 +846,23 @@ cn10k_eth_sec_session_destroy(void *device, struct rte_security_session *sess) if (eth_sec->inb) { /* Disable SA */ sa_dptr = dev->inb.sa_dptr; + memset(sa_dptr, 0, ROC_NIX_INL_OT_IPSEC_INB_HW_SZ); roc_ot_ipsec_inb_sa_init(sa_dptr, true);
[PATCH v2 28/28] net/cnxk: fix multi-seg extraction in vwqe path
Fix multi-seg extraction in vwqe path to avoid updating mbuf[] array until it is used via cq0 path. Fixes: 7fbbc981d54f ("event/cnxk: support vectorized Rx event fast path") Cc: pbhagavat...@marvell.com Cc: sta...@dpdk.org Signed-off-by: Nithin Dabilpuram --- drivers/net/cnxk/cn10k_rx.h | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/net/cnxk/cn10k_rx.h b/drivers/net/cnxk/cn10k_rx.h index 00bec01..5ecb20f 100644 --- a/drivers/net/cnxk/cn10k_rx.h +++ b/drivers/net/cnxk/cn10k_rx.h @@ -1673,10 +1673,6 @@ cn10k_nix_recv_pkts_vector(void *args, struct rte_mbuf **mbufs, uint16_t pkts, vst1q_u64((uint64_t *)mbuf2->rearm_data, rearm2); vst1q_u64((uint64_t *)mbuf3->rearm_data, rearm3); - /* Store the mbufs to rx_pkts */ - vst1q_u64((uint64_t *)&mbufs[packets], mbuf01); - vst1q_u64((uint64_t *)&mbufs[packets + 2], mbuf23); - if (flags & NIX_RX_MULTI_SEG_F) { /* Multi segment is enable build mseg list for * individual mbufs in scalar mode. @@ -1695,6 +1691,10 @@ cn10k_nix_recv_pkts_vector(void *args, struct rte_mbuf **mbufs, uint16_t pkts, mbuf3, mbuf_initializer, flags); } + /* Store the mbufs to rx_pkts */ + vst1q_u64((uint64_t *)&mbufs[packets], mbuf01); + vst1q_u64((uint64_t *)&mbufs[packets + 2], mbuf23); + /* Mark mempool obj as "get" as it is alloc'ed by NIX */ RTE_MEMPOOL_CHECK_COOKIES(mbuf0->pool, (void **)&mbuf0, 1, 1); RTE_MEMPOOL_CHECK_COOKIES(mbuf1->pool, (void **)&mbuf1, 1, 1); -- 2.8.4
RE: [PATCH v2 28/28] net/cnxk: fix multi-seg extraction in vwqe path
> -Original Message- > From: Nithin Dabilpuram > Sent: Friday, April 22, 2022 4:17 PM > To: Jerin Jacob Kollanukkaran ; Nithin Kumar > Dabilpuram ; Kiran Kumar Kokkilagadda > ; Sunil Kumar Kori ; Satha > Koteswara Rao Kottidi > Cc: dev@dpdk.org; Pavan Nikhilesh Bhagavatula > ; sta...@dpdk.org > Subject: [PATCH v2 28/28] net/cnxk: fix multi-seg extraction in vwqe path > > Fix multi-seg extraction in vwqe path to avoid updating mbuf[] > array until it is used via cq0 path. > > Fixes: 7fbbc981d54f ("event/cnxk: support vectorized Rx event fast path") > Cc: pbhagavat...@marvell.com > Cc: sta...@dpdk.org > > Signed-off-by: Nithin Dabilpuram Acked-by: Pavan Nikhilesh > --- > drivers/net/cnxk/cn10k_rx.h | 8 > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/drivers/net/cnxk/cn10k_rx.h b/drivers/net/cnxk/cn10k_rx.h > index 00bec01..5ecb20f 100644 > --- a/drivers/net/cnxk/cn10k_rx.h > +++ b/drivers/net/cnxk/cn10k_rx.h > @@ -1673,10 +1673,6 @@ cn10k_nix_recv_pkts_vector(void *args, struct > rte_mbuf **mbufs, uint16_t pkts, > vst1q_u64((uint64_t *)mbuf2->rearm_data, rearm2); > vst1q_u64((uint64_t *)mbuf3->rearm_data, rearm3); > > - /* Store the mbufs to rx_pkts */ > - vst1q_u64((uint64_t *)&mbufs[packets], mbuf01); > - vst1q_u64((uint64_t *)&mbufs[packets + 2], mbuf23); > - > if (flags & NIX_RX_MULTI_SEG_F) { > /* Multi segment is enable build mseg list for >* individual mbufs in scalar mode. > @@ -1695,6 +1691,10 @@ cn10k_nix_recv_pkts_vector(void *args, struct > rte_mbuf **mbufs, uint16_t pkts, > mbuf3, mbuf_initializer, flags); > } > > + /* Store the mbufs to rx_pkts */ > + vst1q_u64((uint64_t *)&mbufs[packets], mbuf01); > + vst1q_u64((uint64_t *)&mbufs[packets + 2], mbuf23); > + > /* Mark mempool obj as "get" as it is alloc'ed by NIX */ > RTE_MEMPOOL_CHECK_COOKIES(mbuf0->pool, (void > **)&mbuf0, 1, 1); > RTE_MEMPOOL_CHECK_COOKIES(mbuf1->pool, (void > **)&mbuf1, 1, 1); > -- > 2.8.4
Re: [PATCH v3 3/5] vhost: merge sync and async descriptor to mbuf filling
We (at RH) have some issues with our email infrastructure, so I can't reply inline to the patch. Copy/pasting the code: +static __rte_always_inline uint16_t +async_poll_dequeue_completed_split(struct virtio_net *dev, uint16_t queue_id, + struct rte_mbuf **pkts, uint16_t count, uint16_t dma_id, + uint16_t vchan_id, bool legacy_ol_flags) +{ + uint16_t start_idx, from, i; + uint16_t nr_cpl_pkts = 0; + struct async_inflight_info *pkts_info; + struct vhost_virtqueue *vq = dev->virtqueue[queue_id]; + Please, don't pass queue_id as an input parameter to async_poll_dequeue_completed_split(). The caller of this helper has already dereferenced the vq. You can pass vq instead. -- David Marchand
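As an illustration of the suggested change (a sketch only; the helper name comes from the quote above, everything else is assumed and may differ from the eventual v4):

/* The helper receives the already-dereferenced virtqueue... */
static __rte_always_inline uint16_t
async_poll_dequeue_completed_split(struct virtio_net *dev,
		struct vhost_virtqueue *vq,
		struct rte_mbuf **pkts, uint16_t count, uint16_t dma_id,
		uint16_t vchan_id, bool legacy_ol_flags)
{
	/* ... body as in the patch, minus the dev->virtqueue[queue_id]
	 * lookup ... */
}

/* ... while the caller dereferences the vq once and hands it down: */
struct vhost_virtqueue *vq = dev->virtqueue[queue_id];

count = async_poll_dequeue_completed_split(dev, vq, pkts, count,
		dma_id, vchan_id, legacy_ol_flags);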
Re: Question about flow_type_rss_offloads in rte_eth_dev_info
Hi, On 4/22/22 06:55, lihuisong (C) wrote: Hi, all. The RTE_ETH_FLOW_XXX macros are used to display supported flow types for a PMD based on rte_eth_dev_info.flow_type_rss_offloads in the port_infos_display() of testpmd. That's true and it is wrong in testpmd. RTE_ETH_RSS_* and RTE_ETH_FLOW_* are intentionally disconnected right now. The flow_type_rss_offloads documentation is a bit misleading in saying that "the bit offset also means flow type". For example, RTE_ETH_RSS_L4_CHKSUM and RTE_ETH_RSS_L3_SRC_ONLY hardly mean a flow type. I think the documentation must be fixed - it should just refer to the RTE_ETH_RSS_* defines. So, returning to testpmd, the "Supported RSS offload flow types" code should be reworked to avoid RTE_ETH_FLOW_* usage. flowtype_to_str() should be kept intact since it is used for FDIR commands, which operate with flows, not the RSS bit-field. A new function should be implemented which maps RTE_ETH_RSS_* bits into strings to be printed. And the PMD assigns RSS offload capability bits, like RTE_ETH_RSS_XXX, to this field. The usage of the RTE_ETH_RSS_XXX macros is described as follows in rte_ethdev.h: /* * Below macros are defined for RSS offload types, they can be used to * fill rte_eth_rss_conf.rss_hf or rte_flow_action_rss.types. */ #define RTE_ETH_RSS_IPV4 RTE_BIT64(2) But RTE_ETH_FLOW_MAX is 24, and the number of RTE_ETH_FLOW_XXX macros is far less than the number of RTE_ETH_RSS_XXX. If a PMD sets an RSS offload capability bit out of the range of RTE_ETH_FLOW_XXX, like RTE_ETH_RSS_L3_SRC_ONLY, in this field, testpmd will display "user defined 63" when running 'show port info 0'. This is a problem that I have now. On the other hand, rx_adv_conf.rte_eth_rss_conf.rss_hf from the application must be within rte_eth_dev_info.flow_type_rss_offloads in dev_configure. To sum up, I'm a little confused right now. How should a PMD populate the field "flow_type_rss_offloads" in struct rte_eth_dev_info? flow_type_rss_offloads should be populated in terms of RTE_ETH_RSS_* bits. Andrew.
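A minimal sketch of the helper Andrew describes (illustrative only - the table lists just the three defines mentioned in this thread, and the function/array names are made up, not the eventual testpmd implementation):

#include <stdio.h>
#include <rte_ethdev.h>

/* Map RTE_ETH_RSS_* bits to printable names; a real implementation
 * would enumerate all RTE_ETH_RSS_* defines. */
static const struct {
	uint64_t rss_bit;
	const char *name;
} rss_offload_names[] = {
	{ RTE_ETH_RSS_IPV4, "ipv4" },
	{ RTE_ETH_RSS_L3_SRC_ONLY, "l3-src-only" },
	{ RTE_ETH_RSS_L4_CHKSUM, "l4-chksum" },
};

static void
print_rss_offloads(uint64_t rss_offloads)
{
	unsigned int i;

	for (i = 0; i < RTE_DIM(rss_offload_names); i++)
		if (rss_offloads & rss_offload_names[i].rss_bit)
			printf("%s ", rss_offload_names[i].name);
}

With such a table, a bit that has no entry can be printed as a raw bit index rather than the misleading "user defined" flow type string.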
RE: [PATCH v6 0/3] Enable queue rate limit and quanta size configuration
> -----Original Message----- > From: Wu, Wenjun1 > Sent: Friday, April 22, 2022 9:43 AM > To: dev@dpdk.org; Wu, Jingjing ; Xing, Beilei > ; Zhang, Qi Z > Subject: [PATCH v6 0/3] Enable queue rate limit and quanta size configuration > > This patch set adds queue rate limit and quanta size configuration. > Quanta size can be changed by the driver devarg quanta_size=xxx. Quanta size > should be set to a value between 256 and 4096 and be a multiple of 64. > > v2: Rework virtchnl. > v3: Add release note. > v4: Quanta size configuration will block device init > if the PF does not support it. Fix this issue. > v5: Update driver guide. > v6: Merge the release note with the previous patch. > > Wenjun Wu (3): > common/iavf: support queue rate limit and quanta size configuration > net/iavf: support queue rate limit configuration > net/iavf: support quanta size configuration > > doc/guides/nics/intel_vf.rst | 4 + > doc/guides/rel_notes/release_22_07.rst | 4 + > drivers/common/iavf/virtchnl.h | 50 +++ > drivers/net/iavf/iavf.h| 16 +++ > drivers/net/iavf/iavf_ethdev.c | 38 + > drivers/net/iavf/iavf_tm.c | 190 +++-- > drivers/net/iavf/iavf_vchnl.c | 54 +++ > 7 files changed, 348 insertions(+), 8 deletions(-) > > -- > 2.25.1 Re-applied to dpdk-next-net-intel. Thanks Qi
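As a usage note (hypothetical command line - the PCI address is illustrative, and quanta_size must respect the 256..4096, multiple-of-64 constraint above), the devarg would be passed through the EAL device allow option like any other iavf devarg:

dpdk-testpmd -a 18:01.0,quanta_size=1024 -- -i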
RE: [RFC] eal: add bus cleanup to eal cleanup
> From: Kevin Laatz [mailto:kevin.la...@intel.com] > Sent: Friday, 22 April 2022 11.18 > > On 20/04/2022 07:55, Morten Brørup wrote: > >> From: Kevin Laatz [mailto:kevin.la...@intel.com] > >> Sent: Tuesday, 19 April 2022 18.15 > >> > >> During EAL init, all buses are probed and the devices found are > >> initialized. On eal_cleanup(), the inverse does not happen, meaning > any > >> allocated memory and other configuration will not be cleaned up > >> appropriately on exit. > >> > >> Currently, in order for device cleanup to take place, applications > must > >> call the driver-relevant functions to ensure proper cleanup is done > >> before > >> the application exits. Since initialization occurs for all devices > on > >> the > >> bus, not just the devices used by an application, it requires a) > >> application awareness of all bus devices that could have been probed > on > >> the > >> system, and b) code duplication across applications to ensure > cleanup > >> is > >> performed. An example of this is rte_eth_dev_close() which is > commonly > >> used > >> across the example applications. > >> > >> This RFC proposes adding bus cleanup to the eal_cleanup() to make > EAL's > >> init/exit more symmetrical, ensuring all bus devices are cleaned up > >> appropriately without the application needing to be aware of all bus > >> types > >> that may have been probed during initialization. > >> > >> Contained in this RFC are the changes required to perform cleanup > for > >> devices on the PCI bus during eal_cleanup(). This can be expanded in > >> subsequent versions if these changes are desired. There would be an > ask > >> for > >> bus maintainers to add the relevant cleanup for their buses since > they > >> have > >> the domain expertise. > >> > >> Signed-off-by: Kevin Laatz > >> --- > > [...] > > > >> + RTE_LOG(INFO, EAL, > >> + "Clean up PCI driver: %s (%x:%x) device: > >> "PCI_PRI_FMT" (socket %i)\n", > >> + drv->driver.name, dev->id.vendor_id, dev- > >>> id.device_id, > >> + loc->domain, loc->bus, loc->devid, loc- > >>> function, > >> + dev->device.numa_node); > > I agree with Stephen, this message might as well be DEBUG level. You > could argue for symmetry: If the "alloc" message during startup is INFO > level, it makes sense using INFO level for the "free" message during > cleanup too. However, the message probably has far lower information > value during cleanup (because this driver cleanup is expected to > happen), so I would degrade it to DEBUG level. Symmetry is not always > the strongest argument. I have no strong preference, so I'll leave it > up to you, Kevin. > > Thanks for the feedback. > > +1, will change to debug for v2. > > > > > > [...] > > > >> @@ -263,6 +275,7 @@ struct rte_bus { > >>const char *name;/**< Name of the bus */ > >>rte_bus_scan_t scan; /**< Scan for devices attached to > >> bus */ > >>rte_bus_probe_t probe; /**< Probe devices on bus */ > >> + rte_bus_cleanup_t cleanup; /**< Cleanup devices on bus */ > >>rte_bus_find_device_t find_device; /**< Find a device on the bus > >> */ > >>rte_bus_plug_t plug; /**< Probe single device for drivers > >> */ > >>rte_bus_unplug_t unplug; /**< Remove single device from > >> driver */ > > Have you considered if modifying the rte_bus structure in > /lib/eal/include/rte_bus.h breaks the ABI or not? > > I've looked into this and have run test-meson-builds with ABI checks > enabled. > > The output of those checks flagged some potential breaks, however I > believe these are false positives. 
The output indicated 2 potential > breaks (in multiple places, but the root is the same): > > 1. A member has been added to the rte_bus struct. This is flagged as a > sub-type change, however since rte_bus is only ever referenced by > pointer, it is not a break. > > 2. The offset of members changes in 'rte_pci_bus' and 'rte_vmbus_bus' > structs. These structs are only used internally, so they also do not break > ABI. > Sounds good! Then there should be no more worries. :-) > > Since the ABI checks do flag the addition, I will add an entry to the > abignore for the v2. > > > > > > Overall, this patch is certainly a good idea! > > > > On the condition that modifying the rte_bus structure does not break > the ABI... > > > > Acked-by: Morten Brørup
Re: [PATCH 2/5] vhost: add per-virtqueue statistics support
Hi Chenbo, On 4/21/22 16:09, Xia, Chenbo wrote: Hi Maxime, -Original Message- From: Maxime Coquelin Sent: Thursday, January 27, 2022 10:57 PM To: dev@dpdk.org; Xia, Chenbo ; david.march...@redhat.com Cc: Maxime Coquelin Subject: [PATCH 2/5] vhost: add per-virtqueue statistics support This patch introduces new APIs for the application to query and reset per-virtqueue statistics. The patch also introduces generic counters. Signed-off-by: Maxime Coquelin --- lib/vhost/rte_vhost.h | 89 + lib/vhost/socket.c | 4 +- lib/vhost/version.map | 5 ++ lib/vhost/vhost.c | 109 - lib/vhost/vhost.h | 18 ++- lib/vhost/virtio_net.c | 53 6 files changed, 274 insertions(+), 4 deletions(-) diff --git a/lib/vhost/rte_vhost.h b/lib/vhost/rte_vhost.h index b454c05868..e739091ca0 100644 --- a/lib/vhost/rte_vhost.h +++ b/lib/vhost/rte_vhost.h @@ -37,6 +37,7 @@ extern "C" { #define RTE_VHOST_USER_LINEARBUF_SUPPORT (1ULL << 6) #define RTE_VHOST_USER_ASYNC_COPY (1ULL << 7) #define RTE_VHOST_USER_NET_COMPLIANT_OL_FLAGS (1ULL << 8) +#define RTE_VHOST_USER_NET_STATS_ENABLE(1ULL << 9) /* Features. */ #ifndef VIRTIO_NET_F_GUEST_ANNOUNCE @@ -317,6 +318,32 @@ struct rte_vhost_power_monitor_cond { uint8_t match; }; +/** Maximum name length for the statistics counters */ +#define RTE_VHOST_STATS_NAME_SIZE 64 + +/** + * Vhost virtqueue statistics structure + * + * This structure is used by rte_vhost_vring_stats_get() to provide + * virtqueue statistics to the calling application. + * It maps a name ID, corresponding to an index in the array returned + * by rte_vhost_vring_stats_get_names(), to a statistic value. + */ +struct rte_vhost_stat { + uint64_t id;/**< The index in xstats name array. */ + uint64_t value; /**< The statistic counter value. */ +}; + +/** + * Vhost virtqueue statistic name element + * + * This structure is used by rte_vhost_vring_stats_get_anmes() to Anmes -> names + * provide virtqueue statistics names to the calling application. + */ +struct rte_vhost_stat_name { + char name[RTE_VHOST_STATS_NAME_SIZE]; /**< The statistic name. */ Should we consider using ethdev one? Since vhost lib already depends on ethdev lib. I initially thought about it, but this is not a good idea IMHO, as it might confuse the user, which would think it could use the ethdev API to control it. +}; + /** * Convert guest physical address to host virtual address * @@ -1059,6 +1086,68 @@ __rte_experimental int rte_vhost_slave_config_change(int vid, bool need_reply); +/** + * Retrieve names of statistics of a Vhost virtqueue. + * + * There is an assumption that 'stat_names' and 'stats' arrays are matched + * by array index: stats_names[i].name => stats[i].value + * + * @param vid + * vhost device ID + * @param queue_id + * vhost queue index + * @param stats_names + * array of at least size elements to be filled. + * If set to NULL, the function returns the required number of elements. + * @param size + * The number of elements in stats_names array. + * @return + * A negative value on error, otherwise the number of entries filled in the + * stats name array. + */ +__rte_experimental +int +rte_vhost_vring_stats_get_names(int vid, uint16_t queue_id, + struct rte_vhost_stat_name *name, unsigned int size); '@param stats_names' and 'struct rte_vhost_stat_name *name' do not align and reports error: http://mails.dpdk.org/archives/test-report/2022-March/270275.html Ha yes, it slept through the cracks when I reworked the series, thanks for the heads-up. + +/** + * Retrieve statistics of a Vhost virtqueue. 
+ * + * There is an assumption that 'stat_names' and 'stats' arrays are matched + * by array index: stats_names[i].name => stats[i].value + * + * @param vid + * vhost device ID + * @param queue_id + * vhost queue index + * @param stats + * A pointer to a table of structure of type rte_vhost_stat to be filled with + * virtqueue statistics ids and values. + * @param n + * The number of elements in stats array. + * @return + * A negative value on error, otherwise the number of entries filled in the + * stats table. + */ +__rte_experimental +int +rte_vhost_vring_stats_get(int vid, uint16_t queue_id, + struct rte_vhost_stat *stats, unsigned int n); + +/** + * Reset statistics of a Vhost virtqueue. + * + * @param vid + * vhost device ID + * @param queue_id + * vhost queue index + * @return + * 0 on success, a negative value on error. + */ +__rte_experimental +int +rte_vhost_vring_stats_reset(int vid, uint16_t queue_id); + #ifdef __cplusplus } #endif diff --git a/lib/vhost/socket.c b/lib/vhost/socket.c index c2f8013cd5..6020565fb6 100644 --- a/lib/vhost/socket.c +++ b/lib/vhost/socket.c @@ -43,6 +43,7 @@ struct vhost_user_socket { bool linearbuf; bool
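As a usage sketch derived from the API comments above (not code from the series; error handling is omitted, and it assumes the vhost socket was registered with the new RTE_VHOST_USER_NET_STATS_ENABLE flag and that vid/queue_id are valid):

#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>
#include <rte_vhost.h>

/* Two-call pattern, similar to ethdev xstats: query the number of
 * counters first by passing NULL, then fetch names and values. */
void
dump_vring_stats(int vid, uint16_t queue_id)
{
	int n = rte_vhost_vring_stats_get_names(vid, queue_id, NULL, 0);
	struct rte_vhost_stat_name *names = calloc(n, sizeof(*names));
	struct rte_vhost_stat *stats = calloc(n, sizeof(*stats));
	int i;

	rte_vhost_vring_stats_get_names(vid, queue_id, names, n);
	rte_vhost_vring_stats_get(vid, queue_id, stats, n);

	for (i = 0; i < n; i++)
		/* stats[i].id is the index into the names array. */
		printf("%s: %" PRIu64 "\n", names[stats[i].id].name,
			stats[i].value);

	rte_vhost_vring_stats_reset(vid, queue_id);
	free(names);
	free(stats);
}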
[PATCH V2 1/3] table: improve learner table timers
Previously, on lookup hit, the hit key had its timer automatically rearmed with the same timeout in order to prevent its expiration. Now, a broader set of actions is available on lookup hit, which has to be managed explicitly: the key can have its timer rearmed with the same or with a different timeout, or the key timer can be left unmodified. The latter option allows the key to expire naturally when the timer eventually runs out, unless the key is hit again and its timer rearmed at that point. Needed by the TCP connection tracking state machine. Signed-off-by: Cristian Dumitrescu --- Depends-on: series-22386 ("[V5,1/6] port: support packet mirroring") Depends-on: patch-22480 ("[V4] pipeline: support default action arguments") lib/pipeline/rte_swx_pipeline.c | 3 +- lib/pipeline/rte_swx_pipeline_internal.h | 3 +- lib/table/rte_swx_table_learner.c| 110 --- lib/table/rte_swx_table_learner.h| 90 +-- lib/table/version.map| 4 + 5 files changed, 190 insertions(+), 20 deletions(-) diff --git a/lib/pipeline/rte_swx_pipeline.c b/lib/pipeline/rte_swx_pipeline.c index dfbac929c7..17be31d5a4 100644 --- a/lib/pipeline/rte_swx_pipeline.c +++ b/lib/pipeline/rte_swx_pipeline.c @@ -8788,7 +8788,8 @@ learner_params_get(struct learner *l) params->n_keys_max = l->size; /* Timeout. */ - params->key_timeout = l->timeout; + params->key_timeout[0] = l->timeout; + params->n_key_timeouts = 1; return params; diff --git a/lib/pipeline/rte_swx_pipeline_internal.h b/lib/pipeline/rte_swx_pipeline_internal.h index 381a35c6e0..51bb464f5f 100644 --- a/lib/pipeline/rte_swx_pipeline_internal.h +++ b/lib/pipeline/rte_swx_pipeline_internal.h @@ -2215,7 +2215,8 @@ __instr_learn_exec(struct rte_swx_pipeline *p, l->mailbox, t->time, action_id, - &t->metadata[mf_offset]); + &t->metadata[mf_offset], + 0); TRACE("[Thread %2u] learner %u learn %s\n", p->thread_id, diff --git a/lib/table/rte_swx_table_learner.c b/lib/table/rte_swx_table_learner.c index 15576c2aa3..3c98b8ce81 100644 --- a/lib/table/rte_swx_table_learner.c +++ b/lib/table/rte_swx_table_learner.c @@ -230,12 +230,16 @@ table_keycmp(void *a, void *b, void *b_mask, uint32_t n_bytes) #define TABLE_KEYS_PER_BUCKET 4 +#define TABLE_BUCKET_USEFUL_SIZE \ + (TABLE_KEYS_PER_BUCKET * (sizeof(uint32_t) + sizeof(uint32_t) + sizeof(uint8_t))) + #define TABLE_BUCKET_PAD_SIZE \ - (RTE_CACHE_LINE_SIZE - TABLE_KEYS_PER_BUCKET * (sizeof(uint32_t) + sizeof(uint32_t))) + (RTE_CACHE_LINE_SIZE - TABLE_BUCKET_USEFUL_SIZE) struct table_bucket { uint32_t time[TABLE_KEYS_PER_BUCKET]; uint32_t sig[TABLE_KEYS_PER_BUCKET]; + uint8_t key_timeout_id[TABLE_KEYS_PER_BUCKET]; uint8_t pad[TABLE_BUCKET_PAD_SIZE]; uint8_t key[0]; }; @@ -284,8 +288,11 @@ struct table_params { /* log2(bucket_size). Purpose: avoid multiplication with non-power of 2 numbers. */ size_t bucket_size_log2; - /* Timeout in CPU clock cycles. */ - uint64_t key_timeout; + /* Set of all possible key timeout values measured in CPU clock cycles. */ + uint64_t key_timeout[RTE_SWX_TABLE_LEARNER_N_KEY_TIMEOUTS_MAX]; + + /* Number of key timeout values. */ + uint32_t n_key_timeouts; /* Total memory size. */ size_t total_size; @@ -305,15 +312,23 @@ struct table { static int table_params_get(struct table_params *p, struct rte_swx_table_learner_params *params) { + uint32_t i; + /* Check input parameters. 
*/ if (!params || !params->key_size || (params->key_size > 64) || !params->n_keys_max || (params->n_keys_max > 1U << 31) || - !params->key_timeout) + !params->key_timeout || + !params->n_key_timeouts || + (params->n_key_timeouts > RTE_SWX_TABLE_LEARNER_N_KEY_TIMEOUTS_MAX)) return -EINVAL; + for (i = 0; i < params->n_key_timeouts; i++) + if (!params->key_timeout[i]) + return -EINVAL; + /* Key. */ p->key_size = params->key_size; @@ -346,7 +361,17 @@ table_params_get(struct table_params *p, struct rte_swx_table_learner_params *pa p->bucket_size_log2 = __builtin_ctzll(p->bucket_size); /* Timeout. */ - p->key_timeout = params->key_timeout * rte_get_tsc_hz(); + for (i = 0; i < params->n_key_timeouts; i++) { + p->key_timeout[i] = params->key_timeout[i] * rte_get_tsc_hz(); + + if (!(p->key_timeout[i] >> 32)) + p->key_timeout[i] = 1LLU << 32; +
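To illustrate the new parameter layout (a hedged sketch: only key_size, n_keys_max, key_timeout and n_key_timeouts appear in this diff; the NULL check on params->key_timeout suggests the public field is a pointer to an array of timeout values in seconds, and any other mandatory fields of the params structure are omitted here):

#include <rte_swx_table_learner.h>

/* Three key timeout values, in seconds. Each key selects one of them
 * by timeout ID (0, 1 or 2) at learn/rearm time. */
static uint32_t timeouts[] = {60, 120, 180};

static struct rte_swx_table_learner_params params = {
	.key_size = 16,
	.n_keys_max = 1U << 20,
	.key_timeout = timeouts,
	.n_key_timeouts = 3,
};

Internally, table_params_get() converts each value to CPU clock cycles and, as the tail of the diff above shows, enforces a lower bound of 1LLU << 32 cycles.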
[PATCH V2 3/3] examples/pipeline: improve learner table timers
Added the rearm counter to the statistics. Updated the learner table example to the new learner table timer operation. Signed-off-by: Cristian Dumitrescu --- examples/pipeline/cli.c | 2 ++ examples/pipeline/examples/learner.spec | 15 +-- 2 files changed, 15 insertions(+), 2 deletions(-) diff --git a/examples/pipeline/cli.c b/examples/pipeline/cli.c index d52ad6b61e..0334616bd9 100644 --- a/examples/pipeline/cli.c +++ b/examples/pipeline/cli.c @@ -2677,12 +2677,14 @@ cmd_pipeline_stats(char **tokens, "\t\tMiss (packets): %" PRIu64 "\n" "\t\tLearn OK (packets): %" PRIu64 "\n" "\t\tLearn error (packets): %" PRIu64 "\n" + "\t\tRearm (packets): %" PRIu64 "\n" "\t\tForget (packets): %" PRIu64 "\n", learner_info.name, stats.n_pkts_hit, stats.n_pkts_miss, stats.n_pkts_learn_ok, stats.n_pkts_learn_err, + stats.n_pkts_rearm, stats.n_pkts_forget); out_size -= strlen(out); out += strlen(out); diff --git a/examples/pipeline/examples/learner.spec b/examples/pipeline/examples/learner.spec index 4ee52da7ac..095325c293 100644 --- a/examples/pipeline/examples/learner.spec +++ b/examples/pipeline/examples/learner.spec @@ -48,6 +48,9 @@ struct metadata_t { bit<32> port_in bit<32> port_out + // Key timeout. + bit<32> timeout_id + // Arguments for the "fwd_action" action. bit<32> fwd_action_arg_port_out } @@ -68,10 +71,14 @@ struct fwd_action_args_t { action fwd_action args instanceof fwd_action_args_t { mov m.port_out t.port_out + rearm return } action learn_action args none { + // Pick the key timeout. Timeout ID #1 (i.e. 120 seconds) is selected. + mov m.timeout_id 1 + // Read current counter value into m.fwd_action_arg_port_out. regrd m.fwd_action_arg_port_out counter 0 @@ -84,7 +91,7 @@ action learn_action args none { // Add the current lookup key to the table with fwd_action as the key action. The action // arguments are read from the packet meta-data (the m.fwd_action_arg_port_out field). These // packet meta-data fields have to be written before the "learn" instruction is invoked. - learn fwd_action m.fwd_action_arg_port_out + learn fwd_action m.fwd_action_arg_port_out m.timeout_id // Send the current packet to the same output port. mov m.port_out m.fwd_action_arg_port_out @@ -110,7 +117,11 @@ learner fwd_table { size 1048576 - timeout 120 + timeout { + 60 + 120 + 180 + } } // -- 2.17.1
[PATCH V2 2/3] pipeline: improve learner table timers
Enable the pipeline to use the improved learner table timer operation through the new "rearm" instruction. Signed-off-by: Cristian Dumitrescu --- lib/pipeline/rte_swx_ctl.h | 3 + lib/pipeline/rte_swx_pipeline.c | 166 --- lib/pipeline/rte_swx_pipeline.h | 7 +- lib/pipeline/rte_swx_pipeline_internal.h | 70 +- lib/pipeline/rte_swx_pipeline_spec.c | 146 5 files changed, 337 insertions(+), 55 deletions(-) diff --git a/lib/pipeline/rte_swx_ctl.h b/lib/pipeline/rte_swx_ctl.h index 204026dc0e..e4cdc840fc 100644 --- a/lib/pipeline/rte_swx_ctl.h +++ b/lib/pipeline/rte_swx_ctl.h @@ -629,6 +629,9 @@ struct rte_swx_learner_stats { /** Number of packets with learning error. */ uint64_t n_pkts_learn_err; + /** Number of packets with rearm event. */ + uint64_t n_pkts_rearm; + /** Number of packets with forget event. */ uint64_t n_pkts_forget; diff --git a/lib/pipeline/rte_swx_pipeline.c b/lib/pipeline/rte_swx_pipeline.c index 17be31d5a4..84d2c24311 100644 --- a/lib/pipeline/rte_swx_pipeline.c +++ b/lib/pipeline/rte_swx_pipeline.c @@ -2556,7 +2556,7 @@ instr_learner_af_exec(struct rte_swx_pipeline *p) stats->n_pkts_action[action_id] = n_pkts_action + 1; /* Thread. */ - thread_ip_action_call(p, t, action_id); + thread_ip_inc(p); /* Action */ action_func(p); @@ -2583,31 +2583,38 @@ instr_learn_translate(struct rte_swx_pipeline *p, struct instruction_data *data __rte_unused) { struct action *a; - const char *mf_name; - uint32_t mf_offset = 0; + struct field *mf_first_arg = NULL, *mf_timeout_id = NULL; + const char *mf_first_arg_name, *mf_timeout_id_name; CHECK(action, EINVAL); - CHECK((n_tokens == 2) || (n_tokens == 3), EINVAL); + CHECK((n_tokens == 3) || (n_tokens == 4), EINVAL); + /* Action. */ a = action_find(p, tokens[1]); CHECK(a, EINVAL); CHECK(!action_has_nbo_args(a), EINVAL); - mf_name = (n_tokens > 2) ? tokens[2] : NULL; - CHECK(!learner_action_args_check(p, a, mf_name), EINVAL); - - if (mf_name) { - struct field *mf; - - mf = metadata_field_parse(p, mf_name); - CHECK(mf, EINVAL); + /* Action first argument. */ + mf_first_arg_name = (n_tokens == 4) ? tokens[2] : NULL; + CHECK(!learner_action_args_check(p, a, mf_first_arg_name), EINVAL); - mf_offset = mf->offset / 8; + if (mf_first_arg_name) { + mf_first_arg = metadata_field_parse(p, mf_first_arg_name); + CHECK(mf_first_arg, EINVAL); } + /* Timeout ID. */ + mf_timeout_id_name = (n_tokens == 4) ? tokens[3] : tokens[2]; + CHECK_NAME(mf_timeout_id_name, EINVAL); + mf_timeout_id = metadata_field_parse(p, mf_timeout_id_name); + CHECK(mf_timeout_id, EINVAL); + + /* Instruction. */ instr->type = INSTR_LEARNER_LEARN; instr->learn.action_id = a->id; - instr->learn.mf_offset = mf_offset; + instr->learn.mf_first_arg_offset = mf_first_arg ? (mf_first_arg->offset / 8) : 0; + instr->learn.mf_timeout_id_offset = mf_timeout_id->offset / 8; + instr->learn.mf_timeout_id_n_bits = mf_timeout_id->n_bits; return 0; } @@ -2624,6 +2631,66 @@ instr_learn_exec(struct rte_swx_pipeline *p) thread_ip_inc(p); } +/* + * rearm. + */ +static int +instr_rearm_translate(struct rte_swx_pipeline *p, + struct action *action, + char **tokens, + int n_tokens, + struct instruction *instr, + struct instruction_data *data __rte_unused) +{ + struct field *mf_timeout_id; + const char *mf_timeout_id_name; + + CHECK(action, EINVAL); + CHECK((n_tokens == 1) || (n_tokens == 2), EINVAL); + + /* INSTR_LEARNER_REARM. */ + if (n_tokens == 1) { + instr->type = INSTR_LEARNER_REARM; + return 0; + } + + /* INSTR_LEARNER_REARM_NEW. 
*/ + mf_timeout_id_name = tokens[1]; + CHECK_NAME(mf_timeout_id_name, EINVAL); + mf_timeout_id = metadata_field_parse(p, mf_timeout_id_name); + CHECK(mf_timeout_id, EINVAL); + + instr->type = INSTR_LEARNER_REARM_NEW; + instr->learn.mf_timeout_id_offset = mf_timeout_id->offset / 8; + instr->learn.mf_timeout_id_n_bits = mf_timeout_id->n_bits; + + return 0; +} + +static inline void +instr_rearm_exec(struct rte_swx_pipeline *p) +{ + struct thread *t = &p->threads[p->thread_id]; + struct instruction *ip = t->ip; + + __instr_rearm_exec(p, t, ip); + + /* Thread. */ + thread_ip_inc(p); +} + +static inline void +instr_rearm_new_exec(struct rte_swx_pipeline *p) +{ + struct thread *t = &p->threads[p->thread_id]; +
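Taken together with the example spec changes in patch 3/3, the translation above accepts two forms of the instruction in action code (illustrative snippet in the .spec syntax; m.timeout_id is assumed to be declared in the metadata struct):

// Form 1: rearm the hit key with its currently assigned timeout.
rearm

// Form 2: rearm the hit key with a new timeout, selected by the
// timeout ID held in a metadata field.
mov m.timeout_id 2
rearm m.timeout_id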
Re: [PATCH v3 1/5] vhost: prepare sync for descriptor to mbuf refactoring
Hi Xuan, On 4/19/22 05:43, xuan.d...@intel.com wrote: From: Xuan Ding This patch extracts the descriptors-to-buffers filling from copy_desc_to_mbuf() into a dedicated function. Besides, the enqueue and dequeue paths are refactored to use the same function sync_fill_seg() for preparing batch elements, which simplifies the code without performance degradation. Signed-off-by: Xuan Ding --- lib/vhost/virtio_net.c | 76 -- 1 file changed, 37 insertions(+), 39 deletions(-) Nice refactoring, thanks for doing it: Reviewed-by: Maxime Coquelin Thanks, Maxime
Re: [PATCH v3 2/5] vhost: prepare async for descriptor to mbuf refactoring
On 4/19/22 05:43, xuan.d...@intel.com wrote: From: Xuan Ding This patch refactors vhost async enqueue path and dequeue path to use the same function async_fill_seg() for preparing batch elements, which simplifies the code without performance degradation. Signed-off-by: Xuan Ding --- lib/vhost/virtio_net.c | 23 +++ 1 file changed, 15 insertions(+), 8 deletions(-) diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c index 6d53016c75..391fb82f0e 100644 --- a/lib/vhost/virtio_net.c +++ b/lib/vhost/virtio_net.c @@ -997,13 +997,14 @@ async_iter_reset(struct vhost_async *async) } static __rte_always_inline int -async_mbuf_to_desc_seg(struct virtio_net *dev, struct vhost_virtqueue *vq, +async_fill_seg(struct virtio_net *dev, struct vhost_virtqueue *vq, struct rte_mbuf *m, uint32_t mbuf_offset, - uint64_t buf_iova, uint32_t cpy_len) + uint64_t buf_iova, uint32_t cpy_len, bool to_desc) { struct vhost_async *async = vq->async; uint64_t mapped_len; uint32_t buf_offset = 0; + void *src, *dst; void *host_iova; while (cpy_len) { @@ -1015,10 +1016,16 @@ async_mbuf_to_desc_seg(struct virtio_net *dev, struct vhost_virtqueue *vq, return -1; } - if (unlikely(async_iter_add_iovec(dev, async, - (void *)(uintptr_t)rte_pktmbuf_iova_offset(m, - mbuf_offset), - host_iova, (size_t)mapped_len))) + if (to_desc) { + src = (void *)(uintptr_t)rte_pktmbuf_iova_offset(m, mbuf_offset); + dst = host_iova; + } else { + src = host_iova; + dst = (void *)(uintptr_t)rte_pktmbuf_iova_offset(m, mbuf_offset); + } + + if (unlikely(async_iter_add_iovec(dev, async, src, dst, +(size_t)mapped_len))) Minor, but it may fit in a single line. return -1; cpy_len -= (uint32_t)mapped_len; @@ -1167,8 +1174,8 @@ mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq, cpy_len = RTE_MIN(buf_avail, mbuf_avail); if (is_async) { - if (async_mbuf_to_desc_seg(dev, vq, m, mbuf_offset, - buf_iova + buf_offset, cpy_len) < 0) + if (async_fill_seg(dev, vq, m, mbuf_offset, + buf_iova + buf_offset, cpy_len, true) < 0) goto error; } else { sync_fill_seg(dev, vq, m, mbuf_offset, Reviewed-by: Maxime Coquelin Thanks, Maxime
Re: [PATCH v3 3/5] vhost: merge sync and async descriptor to mbuf filling
On 4/19/22 05:43, xuan.d...@intel.com wrote: From: Xuan Ding This patch refactors copy_desc_to_mbuf(), used by the sync path, to support both sync and async descriptor-to-mbuf filling. Signed-off-by: Xuan Ding --- lib/vhost/vhost.h | 1 + lib/vhost/virtio_net.c | 48 -- 2 files changed, 38 insertions(+), 11 deletions(-) Reviewed-by: Maxime Coquelin Thanks, Maxime
Re: [PATCH v3 3/5] vhost: merge sync and async descriptor to mbuf filling
On 4/22/22 13:06, David Marchand wrote: We (at RH) have some issues with our email infrastructure, so I can't reply inline to the patch. Copy/pasting the code: +static __rte_always_inline uint16_t +async_poll_dequeue_completed_split(struct virtio_net *dev, uint16_t queue_id, + struct rte_mbuf **pkts, uint16_t count, uint16_t dma_id, + uint16_t vchan_id, bool legacy_ol_flags) +{ + uint16_t start_idx, from, i; + uint16_t nr_cpl_pkts = 0; + struct async_inflight_info *pkts_info; + struct vhost_virtqueue *vq = dev->virtqueue[queue_id]; + Please, don't pass queue_id as an input parameter to async_poll_dequeue_completed_split(). The caller of this helper has already dereferenced the vq. You can pass vq instead. I think David's comment was intended to be a reply to patch 4, but I agree with him. Could you please fix this and also fix the build issues reported by the CI? I'll continue the review on V4. Thanks, Maxime
[RFC v2] eal: add bus cleanup to eal cleanup
During EAL init, all buses are probed and the devices found are initialized. On eal_cleanup(), the inverse does not happen, meaning any allocated memory and other configuration will not be cleaned up appropriately on exit. Currently, in order for device cleanup to take place, applications must call the driver-relevant functions to ensure proper cleanup is done before the application exits. Since initialization occurs for all devices on the bus, not just the devices used by an application, it requires a) application awareness of all bus devices that could have been probed on the system, and b) code duplication across applications to ensure cleanup is performed. An example of this is rte_eth_dev_close() which is commonly used across the example applications. This RFC proposes adding bus cleanup to the eal_cleanup() to make EAL's init/exit more symmetrical, ensuring all bus devices are cleaned up appropriately without the application needing to be aware of all bus types that may have been probed during initialization. Contained in this RFC are the changes required to perform cleanup for devices on the PCI bus during eal_cleanup(). This can be expanded in subsequent versions if these changes are desired. There would be an ask for bus maintainers to add the relevant cleanup for their buses since they have the domain expertise. Signed-off-by: Kevin Laatz Acked-by: Morten Brørup --- v2: * change log level from INFO to DEBUG for PCI cleanup * add abignore entries for rte_bus related false positives --- devtools/libabigail.abignore| 9 + drivers/bus/pci/pci_common.c| 29 + lib/eal/common/eal_common_bus.c | 18 ++ lib/eal/include/rte_bus.h | 23 +++ lib/eal/linux/eal.c | 1 + 5 files changed, 80 insertions(+) diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore index c618f20032..3ff5d4db7c 100644 --- a/devtools/libabigail.abignore +++ b/devtools/libabigail.abignore @@ -40,3 +40,12 @@ ; Ignore visibility fix of local functions in experimental gpudev library [suppress_file] soname_regexp = ^librte_gpudev\. + +; Ignore field inserted to rte_bus, adding cleanup function +[suppress_type] +name = rte_bus +has_data_member_inserted_at = end + +; Ignore changes to internally used structs containing rte_bus +[suppress_type] +name = rte_pci_bus, rte_vmbus_bus diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c index 37ab879779..1bee8e8201 100644 --- a/drivers/bus/pci/pci_common.c +++ b/drivers/bus/pci/pci_common.c @@ -394,6 +394,34 @@ pci_probe(void) return (probed && probed == failed) ? 
-1 : 0; } +static int +pci_cleanup(void) +{ + struct rte_pci_device *dev = NULL; + int ret = 0; + + FOREACH_DEVICE_ON_PCIBUS(dev) { + struct rte_pci_addr *loc = &dev->addr; + struct rte_pci_driver *drv = dev->driver; + + RTE_LOG(DEBUG, EAL, + "Clean up PCI driver: %s (%x:%x) device: "PCI_PRI_FMT" (socket %i)\n", + drv->driver.name, dev->id.vendor_id, dev->id.device_id, + loc->domain, loc->bus, loc->devid, loc->function, + dev->device.numa_node); + + ret = drv->remove(dev); + if (ret < 0) { + RTE_LOG(ERR, EAL, "Cleanup for device "PCI_PRI_FMT" failed\n", + dev->addr.domain, dev->addr.bus, dev->addr.devid, + dev->addr.function); + rte_errno = errno; + } + } + + return ret; +} + /* dump one device */ static int pci_dump_one_device(FILE *f, struct rte_pci_device *dev) @@ -813,6 +841,7 @@ struct rte_pci_bus rte_pci_bus = { .bus = { .scan = rte_pci_scan, .probe = pci_probe, + .cleanup = pci_cleanup, .find_device = pci_find_device, .plug = pci_plug, .unplug = pci_unplug, diff --git a/lib/eal/common/eal_common_bus.c b/lib/eal/common/eal_common_bus.c index baa5b532af..046a06a2bf 100644 --- a/lib/eal/common/eal_common_bus.c +++ b/lib/eal/common/eal_common_bus.c @@ -85,6 +85,24 @@ rte_bus_probe(void) return 0; } +/* Clean up all devices of all buses */ +int +rte_bus_cleanup(void) +{ + int ret; + struct rte_bus *bus; + + TAILQ_FOREACH(bus, &rte_bus_list, next) { + if (bus->cleanup == NULL) + continue; + ret = bus->cleanup(); + if (ret) + RTE_LOG(ERR, EAL, "Bus (%s) cleanup failed.\n", bus->name); + } + + return 0; +} + /* Dump information of a single bus */ static int bus_dump_one(FILE *f, struct rte_bus *bus) diff --git a/lib/eal/include/rte_bus.h b/lib/eal/include/rte_bus.h ind
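From the application's point of view (an editorial sketch, not part of the patch), the intended effect is that the existing rte_eal_cleanup() call becomes sufficient on exit:

#include <rte_eal.h>

int
main(int argc, char **argv)
{
	int ret = rte_eal_init(argc, argv);

	if (ret < 0)
		return -1;

	/* ... application work ... */

	/* With bus->cleanup() invoked from eal_cleanup(), this single
	 * call would also release the resources of devices probed at
	 * init time, without per-driver close calls here. */
	return rte_eal_cleanup();
}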