[dpdk-dev] [PATCH] lib: fix strcat with equivalent logic
Replace strcat with concatenation logic to avoid buffer overflow. Fixes: a6a47ac9c2 ("cfgfile: rework load function") Cc: sta...@dpdk.org Signed-off-by: Chaitanya Babu Talluri --- lib/librte_cfgfile/rte_cfgfile.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/lib/librte_cfgfile/rte_cfgfile.c b/lib/librte_cfgfile/rte_cfgfile.c index 7d8c941ea..7616742f7 100644 --- a/lib/librte_cfgfile/rte_cfgfile.c +++ b/lib/librte_cfgfile/rte_cfgfile.c @@ -227,7 +227,15 @@ rte_cfgfile_load_with_params(const char *filename, int flags, while (end != NULL) { if (*(end+1) == params->comment_character) { *end = '\0'; - strcat(split[1], end+1); + int index = strlen(split[1]); + char *temp = end+1; + while (*temp != '\0') { + split[1][index] = *temp; + index++; + temp++; + } + split[1][index] = '\0'; + } else end++; end = memchr(end, '\\', strlen(end)); -- 2.17.2
[dpdk-dev] [PATCH] lib: fix strcat with equivalent logic
Replace strcat with concatenation logic to avoid buffer overflow. Fixes: a6a47ac9c2 ("cfgfile: rework load function") Cc: sta...@dpdk.org Signed-off-by: Chaitanya Babu Talluri --- lib/librte_cfgfile/rte_cfgfile.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/lib/librte_cfgfile/rte_cfgfile.c b/lib/librte_cfgfile/rte_cfgfile.c index 7d8c941ea..7616742f7 100644 --- a/lib/librte_cfgfile/rte_cfgfile.c +++ b/lib/librte_cfgfile/rte_cfgfile.c @@ -227,7 +227,15 @@ rte_cfgfile_load_with_params(const char *filename, int flags, while (end != NULL) { if (*(end+1) == params->comment_character) { *end = '\0'; - strcat(split[1], end+1); + int index = strlen(split[1]); + char *temp = end+1; + while (*temp != '\0') { + split[1][index] = *temp; + index++; + temp++; + } + split[1][index] = '\0'; + } else end++; end = memchr(end, '\\', strlen(end)); -- 2.17.2
Re: [dpdk-dev] [PATCH] eal: restrict ctrl threads to startup cpu affinity
On 13-Feb-19 4:13 PM, David Marchand wrote: Spawning the ctrl threads on anything that is not part of the eal coremask is not that polite to the rest of the system. Rather than introduce yet another eal options for this, let's take the startup cpu affinity as a reference and remove the eal coremask from it. If no cpu is left, then we default to the master core. The cpuset is computed once at init before the original cpu affinity. Fixes: d651ee4919cd ("eal: set affinity for control threads") Signed-off-by: David Marchand --- Hi David, Maybe i didn't have enough coffee today and i'm missing something here, but how is this different? Removing the coremask cores from the cpuset will effectively "spawn the ctrl threads on anything that is not part of the EAL coremask" (which is "not that polite to the rest of the system"), unless the application was run with taskset. Is "taskset" the key point here? I.e. by default, we're still "not polite", unless the user asks nicely? :) -- Thanks, Anatoly
[dpdk-dev] [PATCH v2] test/distributor: fix sprintf with strlcpy
sprintf function is not secure as it doesn't check the length of string. replaced sprintf with strlcpy. Fixes: f74df2c57e ("test/distributor: test single and burst API") Cc: sta...@dpdk.org Signed-off-by: Pallantla Poornima --- v2: Addressed review comment to replace snprintf to strlcpy. --- test/test/test_distributor.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/test/test/test_distributor.c b/test/test/test_distributor.c index 98919ec0c..da3348fd1 100644 --- a/test/test/test_distributor.c +++ b/test/test/test_distributor.c @@ -11,6 +11,7 @@ #include #include #include +#include #define ITER_POWER 20 /* log 2 of how many iterations we do when timing. */ #define BURST 32 @@ -642,9 +643,11 @@ test_distributor(void) worker_params.dist = dist[i]; if (i) - sprintf(worker_params.name, "burst"); + strlcpy(worker_params.name, "burst", + sizeof(worker_params.name)); else - sprintf(worker_params.name, "single"); + strlcpy(worker_params.name, "single", + sizeof(worker_params.name)); rte_eal_mp_remote_launch(handle_work, &worker_params, SKIP_MASTER); -- 2.17.2
Re: [dpdk-dev] [PATCH] eal: restrict ctrl threads to startup cpu affinity
On Thu, Feb 14, 2019 at 10:39 AM Burakov, Anatoly wrote: > On 13-Feb-19 4:13 PM, David Marchand wrote: > > Spawning the ctrl threads on anything that is not part of the eal > > coremask is not that polite to the rest of the system. > > > > Rather than introduce yet another eal options for this, let's take > > the startup cpu affinity as a reference and remove the eal coremask > > from it. > > If no cpu is left, then we default to the master core. > > > > The cpuset is computed once at init before the original cpu affinity. > > > > Fixes: d651ee4919cd ("eal: set affinity for control threads") > > Signed-off-by: David Marchand > > --- > > Hi David, > > Maybe i didn't have enough coffee today and i'm missing something here, > but how is this different? Removing the coremask cores from the cpuset > will effectively "spawn the ctrl threads on anything that is not part of > the EAL coremask" (which is "not that polite to the rest of the > system"), unless the application was run with taskset. > > Is "taskset" the key point here? I.e. by default, we're still "not > polite", unless the user asks nicely? :) > Eheh, sorry, yes. A bit more context then, if you want to clearly pin cpu resources for the processes on your system (let's say having virtual machines and a popular vswitch), I can only think of two solutions. Either you have something to configure your processes to have them call sched_setaffinity/pthread_set_affinity_np, or you use taskset to get them "jailed" without them caring. Before the incriminated commit, we were keeping all threads on the coremask that had been passed, but as Olivier said, we would end up with ctrl threads spanwed on core running dataplane threads as well. Now, the ctrl threads can be spawned anywhere on all & ~coremask, with no way to configure this. I considered adding a new eal option, but I think relying on the current cpu affinity is a better default behavior and I can't see drawbacks at the moment. -- David Marchand
[dpdk-dev] [PATCH v2] service: fix parameter type
The type of value parameter to rte_service_attr_get should be uint64_t *, since the attributes are of type uint64_t. Fixes: 4d55194d76a4 ("service: add attribute get function") Reviewed-by: Gage Eads Signed-off-by: Nikhil Rao Acked-by: Harry van Haaren --- lib/librte_eal/common/include/rte_service.h | 2 +- lib/librte_eal/common/rte_service.c | 2 +- test/test/test_service_cores.c | 2 +- doc/guides/rel_notes/release_19_05.rst | 5 - lib/librte_eal/bsdapp/eal/Makefile | 2 +- lib/librte_eal/linuxapp/eal/Makefile| 2 +- lib/librte_eal/meson.build | 2 +- 7 files changed, 10 insertions(+), 7 deletions(-) diff --git a/lib/librte_eal/common/include/rte_service.h b/lib/librte_eal/common/include/rte_service.h index 34b41af..670c89a 100644 --- a/lib/librte_eal/common/include/rte_service.h +++ b/lib/librte_eal/common/include/rte_service.h @@ -372,7 +372,7 @@ int32_t rte_service_run_iter_on_app_lcore(uint32_t id, * -EINVAL Invalid id, attr_id or attr_value was NULL. */ int32_t rte_service_attr_get(uint32_t id, uint32_t attr_id, - uint32_t *attr_value); + uint64_t *attr_value); /** * Reset all attribute values of a service. diff --git a/lib/librte_eal/common/rte_service.c b/lib/librte_eal/common/rte_service.c index 03fde97..5f75e5a 100644 --- a/lib/librte_eal/common/rte_service.c +++ b/lib/librte_eal/common/rte_service.c @@ -734,7 +734,7 @@ int32_t rte_service_run_iter_on_app_lcore(uint32_t id, } int32_t -rte_service_attr_get(uint32_t id, uint32_t attr_id, uint32_t *attr_value) +rte_service_attr_get(uint32_t id, uint32_t attr_id, uint64_t *attr_value) { struct rte_service_spec_impl *s; SERVICE_VALID_GET_OR_ERR_RET(id, s, -EINVAL); diff --git a/test/test/test_service_cores.c b/test/test/test_service_cores.c index ec31882..82bb2ce 100644 --- a/test/test/test_service_cores.c +++ b/test/test/test_service_cores.c @@ -259,7 +259,7 @@ static int32_t dummy_mt_safe_cb(void *args) rte_service_set_stats_enable(id, 1); uint32_t attr_id = UINT32_MAX; - uint32_t attr_value = 0xdead; + uint64_t attr_value = 0xdead; /* check error return values */ TEST_ASSERT_EQUAL(-EINVAL, rte_service_attr_get(id, attr_id, &attr_value), diff --git a/doc/guides/rel_notes/release_19_05.rst b/doc/guides/rel_notes/release_19_05.rst index 2b0f60d..b8ed3d3 100644 --- a/doc/guides/rel_notes/release_19_05.rst +++ b/doc/guides/rel_notes/release_19_05.rst @@ -94,6 +94,9 @@ API Changes Also, make sure to start the actual text at the margin. = +eal: as shown in the 19.02 deprecation notice, the type of the ``attr_value`` + parameter of the function ``rte_service_attr_get()`` has been changed + from ``uint32_t *`` to ``uint64_t *``. ABI Changes --- @@ -143,7 +146,7 @@ The libraries prepended with a plus sign were incremented in this version. librte_compressdev.so.1 librte_cryptodev.so.6 librte_distributor.so.1 - librte_eal.so.9 ++librte_eal.so.10 librte_efd.so.1 librte_ethdev.so.11 librte_eventdev.so.6 diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile index bfeddaa..4bc8555 100644 --- a/lib/librte_eal/bsdapp/eal/Makefile +++ b/lib/librte_eal/bsdapp/eal/Makefile @@ -22,7 +22,7 @@ LDLIBS += -lrte_kvargs EXPORT_MAP := ../../rte_eal_version.map -LIBABIVER := 9 +LIBABIVER := 10 # specific to bsdapp exec-env SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) := eal.c diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile index 51deb57..db6aca3 100644 --- a/lib/librte_eal/linuxapp/eal/Makefile +++ b/lib/librte_eal/linuxapp/eal/Makefile @@ -10,7 +10,7 @@ ARCH_DIR ?= $(RTE_ARCH) EXPORT_MAP := ../../rte_eal_version.map VPATH += $(RTE_SDK)/lib/librte_eal/common/arch/$(ARCH_DIR) -LIBABIVER := 9 +LIBABIVER := 10 VPATH += $(RTE_SDK)/lib/librte_eal/common diff --git a/lib/librte_eal/meson.build b/lib/librte_eal/meson.build index a18f3a8..030171e 100644 --- a/lib/librte_eal/meson.build +++ b/lib/librte_eal/meson.build @@ -21,7 +21,7 @@ else error('unsupported system type "@0@"'.format(host_machine.system())) endif -version = 9 # the version of the EAL API +version = 10 # the version of the EAL API allow_experimental_apis = true deps += 'compat' deps += 'kvargs' -- 1.8.3.1
Re: [dpdk-dev] [PATCH] eal: restrict ctrl threads to startup cpu affinity
On 14-Feb-19 9:53 AM, David Marchand wrote: On Thu, Feb 14, 2019 at 10:39 AM Burakov, Anatoly mailto:anatoly.bura...@intel.com>> wrote: On 13-Feb-19 4:13 PM, David Marchand wrote: > Spawning the ctrl threads on anything that is not part of the eal > coremask is not that polite to the rest of the system. > > Rather than introduce yet another eal options for this, let's take > the startup cpu affinity as a reference and remove the eal coremask > from it. > If no cpu is left, then we default to the master core. > > The cpuset is computed once at init before the original cpu affinity. > > Fixes: d651ee4919cd ("eal: set affinity for control threads") > Signed-off-by: David Marchand mailto:david.march...@redhat.com>> > --- Hi David, Maybe i didn't have enough coffee today and i'm missing something here, but how is this different? Removing the coremask cores from the cpuset will effectively "spawn the ctrl threads on anything that is not part of the EAL coremask" (which is "not that polite to the rest of the system"), unless the application was run with taskset. Is "taskset" the key point here? I.e. by default, we're still "not polite", unless the user asks nicely? :) Eheh, sorry, yes. A bit more context then, if you want to clearly pin cpu resources for the processes on your system (let's say having virtual machines and a popular vswitch), I can only think of two solutions. Either you have something to configure your processes to have them call sched_setaffinity/pthread_set_affinity_np, or you use taskset to get them "jailed" without them caring. Before the incriminated commit, we were keeping all threads on the coremask that had been passed, but as Olivier said, we would end up with ctrl threads spanwed on core running dataplane threads as well. Now, the ctrl threads can be spawned anywhere on all & ~coremask, with no way to configure this. I considered adding a new eal option, but I think relying on the current cpu affinity is a better default behavior and I can't see drawbacks at the moment. -- David Marchand OK, that makes sense. However, i feel this behavior (both old and new, for that matter) should be better documented somewhere in the EAL docs. -- Thanks, Anatoly
Re: [dpdk-dev] [PATCH] eal: restrict ctrl threads to startup cpu affinity
On Thu, Feb 14, 2019 at 11:05 AM Burakov, Anatoly wrote: > On 14-Feb-19 9:53 AM, David Marchand wrote: > > > A bit more context then, if you want to clearly pin cpu resources for > > the processes on your system (let's say having virtual machines and a > > popular vswitch), I can only think of two solutions. > > Either you have something to configure your processes to have them call > > sched_setaffinity/pthread_set_affinity_np, or you use taskset to get > > them "jailed" without them caring. > > > > Before the incriminated commit, we were keeping all threads on the > > coremask that had been passed, but as Olivier said, we would end up with > > ctrl threads spanwed on core running dataplane threads as well. > > > > Now, the ctrl threads can be spawned anywhere on all & ~coremask, with > > no way to configure this. > > I considered adding a new eal option, but I think relying on the current > > cpu affinity is a better default behavior and I can't see drawbacks at > > the moment. > > OK, that makes sense. However, i feel this behavior (both old and new, > for that matter) should be better documented somewhere in the EAL docs. > > I'll see what I can add in the doc for v2. -- David Marchand
Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for external memory
On 13-Feb-19 7:24 PM, Shahaf Shuler wrote: Wednesday, February 13, 2019 1:43 PM, Alejandro Lucero: Subject: Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for external memory On Wed, Feb 13, 2019 at 9:11 AM Shahaf Shuler wrote: This series is in continue to RFC[1]. The DPDK APIs expose 3 different modes to work with memory used for DMA: 1. Use the DPDK owned memory (backed by the DPDK provided hugepages). This memory is allocated by the DPDK libraries, included in the DPDK memory system (memseg lists) and automatically DMA mapped by the DPDK layers. 2. Use memory allocated by the user and register to the DPDK memory systems. This is also referred as external memory. Upon registration of the external memory, the DPDK layers will DMA map it to all needed devices. 3. Use memory allocated by the user and not registered to the DPDK memory system. This is for users who wants to have tight control on this memory. The user will need to explicitly call DMA map function in order to register such memory to the different devices. The scope of the patch focus on #3 above. Why can not we have case 2 covering case 3? Because it is not our choice rather the DPDK application. We could not allow it, and force the application to register their external memory to the DPDK memory management system. However IMO it will be wrong. The use case exists - some application wants to manage their memory by themselves. w/o the extra overhead of rte_malloc, without creating a special socket to populate the memory and without redundant API calls to rte_extmem_*. Simply allocate chunk of memory, DMA map it to device and that’s it. Just a small note: while this sounds good on paper, i should point out that at least *registering* the memory with DPDK is a necessity. You may see rte_extmem_* calls as redundant (and i agree, to an extent), but we don't advertise our PMD's capabilities in a way that makes it easy to determine whether a particular PMD will or will not work without registering external memory within DPDK (i.e. does it use rte_virt2memseg() internally, for example). So, extmem register calls are a necessary evil in such case, and IMO should be called out as required for such external memory usage scenario. -- Thanks, Anatoly
Re: [dpdk-dev] [PATCH 5/6] net/mlx5: support PCI device DMA map and unmap
On Wed, Feb 13, 2019 at 07:11:35PM +, Shahaf Shuler wrote: > Wednesday, February 13, 2019 1:44 PM, Gaëtan Rivet: > > Subject: Re: [PATCH 5/6] net/mlx5: support PCI device DMA map and unmap > > > > On Wed, Feb 13, 2019 at 12:35:04PM +0100, Gaëtan Rivet wrote: > > > On Wed, Feb 13, 2019 at 11:10:25AM +0200, Shahaf Shuler wrote: > > > > The implementation reuses the external memory registration work done > > [..] > > > > > + > > > > + /** > > > > +* We really need to iterate all eth devices regardless of > > > > +* their owner. > > > > +*/ > > > > + while (port_id < RTE_MAX_ETHPORTS) { > > > > + port_id = rte_eth_find_next(port_id); > > > > + if (port_id >= RTE_MAX_ETHPORTS) > > > > + break; > > > > + dev = &rte_eth_devices[port_id]; > > > > + drv_name = dev->device->driver->name; > > > > + if (!strncmp(drv_name, MLX5_DRIVER_NAME, > > > > +sizeof(MLX5_DRIVER_NAME) + 1) && > > > > + pdev == RTE_DEV_TO_PCI(dev->device)) { > > > > + /* found the PCI device. */ > > > > + return dev; > > > > + } > > > > + } > > > > + return NULL; > > > > +} > > > > > > Might I interest you in the new API? > > Good suggestion, will have a look on it in depth. > > > > > > >{ > > >struct rte_dev_iterator it; > > >struct rte_device *dev; > > > > > >RTE_DEV_FOREACH(dev, "class=eth", &it) > > >if (dev == &pdev->device) > > >return it.class_device; > > >return NULL; > > >} > > > > > > > On that note, this could be in the PCI bus instead? > > We can put it on the PCI bus, but it would mean the PCI bus will not be > device agnostic. > Currently, I couldn't find any reference to eth_dev on the PCI bus, besides a > single macro which convert to pci device that doesn't really do type checks. > > Having it in, would mean the PCI will need start to distinguish between > ethdev, crypto dev and what ever devices exists on its bus. > I think it's worth thinking about it. It can stay class-agnostic: void * rte_pci_device_class(struct rte_pci_device *pdev, const char *class) { char devstr[15+strlen(class)]; struct rte_dev_iterator it; struct rte_device *dev; snprintf(devstr, sizeof(devstr), "bus=pci/class=%s", class); RTE_DEV_FOREACH(dev, devstr, &it) if (dev == &pdev->device) return it.class_device; return NULL; } (not a fan of the stack VLA but whatever.) then: eth_dev = rte_pci_device_class(pdev, "eth"); Whichever type of device could be returned. Only limit is that you have to know beforehand what is the device type of the PCI device you are querying about, but that's necessary anyway. And if it was instead a crypto dev, it would return NULL. -- Gaëtan Rivet 6WIND
Re: [dpdk-dev] [PATCH] eal: restrict ctrl threads to startup cpu affinity
On Wed, Feb 13, 2019 at 5:14 PM David Marchand wrote: > Spawning the ctrl threads on anything that is not part of the eal > coremask is not that polite to the rest of the system. > > Rather than introduce yet another eal options for this, let's take > the startup cpu affinity as a reference and remove the eal coremask > from it. > If no cpu is left, then we default to the master core. > > The cpuset is computed once at init before the original cpu affinity. > > Fixes: d651ee4919cd ("eal: set affinity for control threads") > Signed-off-by: David Marchand > --- > lib/librte_eal/common/eal_common_options.c | 28 > > lib/librte_eal/common/eal_common_thread.c | 21 - > lib/librte_eal/common/eal_internal_cfg.h | 3 +++ > 3 files changed, 35 insertions(+), 17 deletions(-) > > diff --git a/lib/librte_eal/common/eal_common_options.c > b/lib/librte_eal/common/eal_common_options.c > index 6c96f45..b766252 100644 > --- a/lib/librte_eal/common/eal_common_options.c > +++ b/lib/librte_eal/common/eal_common_options.c > @@ -1360,6 +1361,31 @@ static int xdigit2val(unsigned char c) > cfg->lcore_count -= removed; > } > > +static void > +compute_ctrl_threads_cpuset(struct internal_config *internal_cfg) > +{ > + rte_cpuset_t *cpuset = &internal_cfg->ctrl_cpuset; > + rte_cpuset_t default_set; > + unsigned int lcore_id; > + > + for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) { > + if (eal_cpu_detected(lcore_id) && > + rte_lcore_has_role(lcore_id, ROLE_OFF)) { > + CPU_SET(lcore_id, cpuset); > + } > + } > + > + if (pthread_getaffinity_np(pthread_self(), sizeof(rte_cpuset_t), > + &default_set) < 0) > + CPU_ZERO(&default_set); > Now that I am testing on Freebsd, I can see this block I took from the existing "auto detect" function is just wrong. pthread_(g|s)etaffinity_np return a > 0 error value when failing. -- David Marchand
[dpdk-dev] rte_flow update support?
Hi all, Are there plans to add support for modifying rules using the rte_flow API ? The first problem with destroy+create is atomicity. During the process some packets will get lost. Then the second problem is performance. We measured Mellanox CX5 (mlx5 driver) to be able to "update" at best 2K rules/sec, but that drops to 200 rules/sec when updating TC rules ("transfer" rules, to switch packets between VFs). Real support for update should boost those numbers. I saw the ibverbs API backing the mlx5 supports updating the action of a rule. This would already solve a lot of use cases. Eg, changing destination queue(s) of some rules. Given that mlx5 does not support changing global RSS queues without restarting the device, this would also solve re-balancing issue by using rte_flow. Then, beyond updating only the action of a rule, some researchers have shown[1] that updating rules patterns data instead of creating and deleting rules with similar patterns improve drastically the performance. Eg that could be very interesting to accelerate the offloading of OVS's flow cache (5-tuples), or similar setups. Thanks, Tom [1] Turboflow: information rich flow record generation on commodity switches, J Sonchack et al.
Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for external memory
On Wed, Feb 13, 2019 at 7:24 PM Shahaf Shuler wrote: > Wednesday, February 13, 2019 1:43 PM, Alejandro Lucero: > > Subject: Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for > > external memory > > > > On Wed, Feb 13, 2019 at 9:11 AM Shahaf Shuler > > wrote: > > > > > This series is in continue to RFC[1]. > > > > > > The DPDK APIs expose 3 different modes to work with memory used for > > DMA: > > > > > > 1. Use the DPDK owned memory (backed by the DPDK provided > > hugepages). > > > This memory is allocated by the DPDK libraries, included in the DPDK > > > memory system (memseg lists) and automatically DMA mapped by the > > DPDK > > > layers. > > > > > > 2. Use memory allocated by the user and register to the DPDK memory > > > systems. This is also referred as external memory. Upon registration > > > of the external memory, the DPDK layers will DMA map it to all needed > > > devices. > > > > > > 3. Use memory allocated by the user and not registered to the DPDK > > > memory system. This is for users who wants to have tight control on > > > this memory. The user will need to explicitly call DMA map function in > > > order to register such memory to the different devices. > > > > > > The scope of the patch focus on #3 above. > > > > > > > > Why can not we have case 2 covering case 3? > > Because it is not our choice rather the DPDK application. > We could not allow it, and force the application to register their > external memory to the DPDK memory management system. However IMO it will > be wrong. > The use case exists - some application wants to manage their memory by > themselves. w/o the extra overhead of rte_malloc, without creating a > special socket to populate the memory and without redundant API calls to > rte_extmem_*. > > Simply allocate chunk of memory, DMA map it to device and that’s it. > > Usability is a strong point, but up to some extent. DPDK is all about performance, and adding options the user can choose from will add pressure and complexity for keeping the performance. Your proposal makes sense from an user point of view, but will it avoid to modify things in the DPDK core for supporting this case broadly in the future? Multiprocess will be hard to get, if not impossible, without adding more complexity, and although you likely do not expect that use case requiring multiprocess support, once we have DPDK apps using this model, sooner or later those companies with products based on such option will demand broadly support. I can foresee not just multiprocess support will require changes in the future. This reminds me the case of obtaining real time: the more complexity the less determinism can be obtained. It is not impossible, simply it is far more complex. Pure real time operating systems can add new functionalities, but it is hard to do it properly without jeopardising the main goal. Generic purpose operating systems can try to improve determinism, but up to some extent and with important complexity costs. DPDK is the real time operating system in this comparison. > > > > > > > Currently the only way to map external memory is through VFIO > > > (rte_vfio_dma_map). While VFIO is common, there are other vendors > > > which use different ways to map memory (e.g. Mellanox and NXP). > > > > > > > > As you say, VFIO is common, and when allowing DMAs programmed in user > > space, the right thing to do. > > It is common indeed. Why it the right thing to do? > > Compared with UIO, for sure. VFIO does have the right view of the system in terms of which devices can properly be isolated. Can you confirm a specific implementation by a vendor can ensure same behaviour? If so, do you have duplicated code then? if the answer is your are using VFIO data, why not to use VFIO as the interface and add the required connection between VFIO and drivers? What about mapping validation? is the driver doing that part or relying on kernel code? or is it just assuming the mapping is safe? > I'm assuming there is an IOMMU hardware and > > this is what Mellanox and NXP rely on in some way or another. > > For Mellanox, the device works with virtual memory, not physical. If you > think of it, it is more secure for user space application. Mellanox device > has internal memory translation unit between virtual memory and physical > memory. > IOMMU can be added on top of it, in case the host doesn't trust the device > or the device is given to untrusted entity like VM. > > Any current NIC or device will work with virtual addresses if IOMMU is in place, not matter if the device is IOMMU-aware or not. Any vendor, with that capability in their devices, should follow generic paths and a common interface with the vendor drivers being the executors. The drivers know how to tell the device, but they should be told what to tell and not by the user but by the kernel. I think reading your comment "in case the host doesn't trust the device" makes easier to understand what you try to obtain, and at the same time
Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for external memory
On Thu, Feb 14, 2019 at 12:22 PM Alejandro Lucero < alejandro.luc...@netronome.com> wrote: > > > On Wed, Feb 13, 2019 at 7:24 PM Shahaf Shuler > wrote: > >> Wednesday, February 13, 2019 1:43 PM, Alejandro Lucero: >> > Subject: Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for >> > external memory >> > >> > On Wed, Feb 13, 2019 at 9:11 AM Shahaf Shuler >> > wrote: >> > >> > > This series is in continue to RFC[1]. >> > > >> > > The DPDK APIs expose 3 different modes to work with memory used for >> > DMA: >> > > >> > > 1. Use the DPDK owned memory (backed by the DPDK provided >> > hugepages). >> > > This memory is allocated by the DPDK libraries, included in the DPDK >> > > memory system (memseg lists) and automatically DMA mapped by the >> > DPDK >> > > layers. >> > > >> > > 2. Use memory allocated by the user and register to the DPDK memory >> > > systems. This is also referred as external memory. Upon registration >> > > of the external memory, the DPDK layers will DMA map it to all needed >> > > devices. >> > > >> > > 3. Use memory allocated by the user and not registered to the DPDK >> > > memory system. This is for users who wants to have tight control on >> > > this memory. The user will need to explicitly call DMA map function in >> > > order to register such memory to the different devices. >> > > >> > > The scope of the patch focus on #3 above. >> > > >> > > >> > Why can not we have case 2 covering case 3? >> >> Because it is not our choice rather the DPDK application. >> We could not allow it, and force the application to register their >> external memory to the DPDK memory management system. However IMO it will >> be wrong. >> The use case exists - some application wants to manage their memory by >> themselves. w/o the extra overhead of rte_malloc, without creating a >> special socket to populate the memory and without redundant API calls to >> rte_extmem_*. >> >> Simply allocate chunk of memory, DMA map it to device and that’s it. >> >> > Usability is a strong point, but up to some extent. DPDK is all about > performance, and adding options the user can choose from will add pressure > and complexity for keeping the performance. Your proposal makes sense from > an user point of view, but will it avoid to modify things in the DPDK core > for supporting this case broadly in the future? Multiprocess will be hard > to get, if not impossible, without adding more complexity, and although you > likely do not expect that use case requiring multiprocess support, once we > have DPDK apps using this model, sooner or later those companies with > products based on such option will demand broadly support. I can foresee > not just multiprocess support will require changes in the future. > > This reminds me the case of obtaining real time: the more complexity the > less determinism can be obtained. It is not impossible, simply it is far > more complex. Pure real time operating systems can add new functionalities, > but it is hard to do it properly without jeopardising the main goal. > Generic purpose operating systems can try to improve determinism, but up to > some extent and with important complexity costs. DPDK is the real time > operating system in this comparison. > > >> > >> > >> > > Currently the only way to map external memory is through VFIO >> > > (rte_vfio_dma_map). While VFIO is common, there are other vendors >> > > which use different ways to map memory (e.g. Mellanox and NXP). >> > > >> > > >> > As you say, VFIO is common, and when allowing DMAs programmed in user >> > space, the right thing to do. >> >> It is common indeed. Why it the right thing to do? >> >> > Compared with UIO, for sure. VFIO does have the right view of the system > in terms of which devices can properly be isolated. Can you confirm a > specific implementation by a vendor can ensure same behaviour? If so, do > you have duplicated code then? if the answer is your are using VFIO data, > why not to use VFIO as the interface and add the required connection > between VFIO and drivers? > > What about mapping validation? is the driver doing that part or relying on > kernel code? or is it just assuming the mapping is safe? > > >> I'm assuming there is an IOMMU hardware and >> > this is what Mellanox and NXP rely on in some way or another. >> >> For Mellanox, the device works with virtual memory, not physical. If you >> think of it, it is more secure for user space application. Mellanox device >> has internal memory translation unit between virtual memory and physical >> memory. >> IOMMU can be added on top of it, in case the host doesn't trust the >> device or the device is given to untrusted entity like VM. >> >> > Any current NIC or device will work with virtual addresses if IOMMU is in > place, not matter if the device is IOMMU-aware or not. Any vendor, with > that capability in their devices, should follow generic paths and a common > interface with the vendor drivers being the executors. The drivers know how > to te
[dpdk-dev] 19.05 Intel Roadmap
These are the features that we plan to submit for the 19.05 release: Intel® QuickAssist Technology (Intel® QAT) Asymmetric Crypto: The initial implementation will provide support for modexp and modinv. Additional algorithms will be supported in future releases. Intel® QAT AES-XTS Support Intel® QAT Compression PMD Support for Large Scatter-Gather Lists ICE & AVF PMD Enhancements: Enhancements to the ICE and AVF PMDs to support Intel® Ethernet 800 Series Network Adapters. AF_XDP PMD: Add a DPDK PMD to support AF_XDP. This will allow better inter-working with the Linux kernel. Virtio Optimisations: This includes IN_ORDER performance optimisation in Virtio1.1 packed ring driver, and EVENT_IDX optimisation for better interrupt suppression in Vhost. IPsec Library Enhancements: Add support for additional crypto/auth algorithms, header construction (RFC4301, 5.1.2), and performance optimisations. DPDK for Windows: Initial support for DPDK on Windows, including changes to support the clang-win64 compiler and updates to the helloworld sample app. Non-Blocking Stack Mempool Handler: Add a non-blocking stack handler, which means: "failure or suspension of any thread cannot cause failure or suspension of another thread". I40E VXLAN-GPE Support I40E Support for MARK+RSS Rte_flow Actions Baseband Device Turbo PMD: Add a new PMD to support hardware acceleration for turbo encoding/decoding.
[dpdk-dev] [PATCH v2] hash: fix sprintf with snprintf
sprintf function is not secure as it doesn't check the length of string. More secure function snprintf is used. Fixes: 473d1bebce ("hash: allow to store data in hash table") Cc: sta...@dpdk.org Signed-off-by: Pallantla Poornima --- v2: Addressed review comment to correct the format specifier of hastest_key_lens. --- test/test/test_hash_perf.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/test/test/test_hash_perf.c b/test/test/test_hash_perf.c index 525211180..5648fce02 100644 --- a/test/test/test_hash_perf.c +++ b/test/test/test_hash_perf.c @@ -85,9 +85,11 @@ create_table(unsigned int with_data, unsigned int table_index, if (with_data) /* Table will store 8-byte data */ - sprintf(name, "test_hash%d_data", hashtest_key_lens[table_index]); + snprintf(name, sizeof(name), "test_hash%u_data", + hashtest_key_lens[table_index]); else - sprintf(name, "test_hash%d", hashtest_key_lens[table_index]); + snprintf(name, sizeof(name), "test_hash%u", + hashtest_key_lens[table_index]); if (with_locks) -- 2.17.2
Re: [dpdk-dev] rte_flow update support?
Hi Tom, Thursday, February 14, 2019 1:31 PM, Tom Barbette: > Subject: rte_flow update support? > > Hi all, > > Are there plans to add support for modifying rules using the rte_flow API ? > > The first problem with destroy+create is atomicity. During the process some > packets will get lost. > > Then the second problem is performance. We measured Mellanox CX5 (mlx5 > driver) to be able to "update" at best 2K rules/sec, but that drops to > 200 rules/sec when updating TC rules ("transfer" rules, to switch packets > between VFs). Real support for update should boost those numbers. Yes you are right, the current update rate of verbs and TC is not so good. > > I saw the ibverbs API backing the mlx5 supports updating the action of a rule. > This would already solve a lot of use cases. Eg, changing destination queue(s) > of some rules. Given that mlx5 does not support changing global RSS queues > without restarting the device, this would also solve re-balancing issue by > using rte_flow. Updating the action list will solve only part of issue, what you really want (for OVS case) is to update the flow pattern as well (since TCP connection got terminated and new one was created). > > Then, beyond updating only the action of a rule, some researchers have > shown[1] that updating rules patterns data instead of creating and deleting > rules with similar patterns improve drastically the performance. Eg that could > be very interesting to accelerate the offloading of OVS's flow cache (5- > tuples), or similar setups. Stay tuned, we are working on it. There is a new engine for flow rules which will be very fast. Expected performance will be ~300K updates per second and will include both transfer and regular flow rules. It is in plans for 19.XX releases. Would recommend you to have a try on it once ready. > > Thanks, > Tom > > > [1] Turboflow: information rich flow record generation on commodity > switches, J Sonchack et al.
[dpdk-dev] [PATCH] eal: fix check when retrieving current cpu affinity
pthread_getaffinity_np returns a >0 value when failing. This is mainly for the sake of correctness. The only case where it could fail is when passing an incorrect cpuset size wrt to the kernel. Fixes: 2eba8d21f3c9 ("eal: restrict cores auto detection") Signed-off-by: David Marchand --- lib/librte_eal/common/eal_common_options.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c index 6c96f45..1f45f82 100644 --- a/lib/librte_eal/common/eal_common_options.c +++ b/lib/librte_eal/common/eal_common_options.c @@ -1343,10 +1343,9 @@ static int xdigit2val(unsigned char c) unsigned int lcore_id; unsigned int removed = 0; rte_cpuset_t affinity_set; - pthread_t tid = pthread_self(); - if (pthread_getaffinity_np(tid, sizeof(rte_cpuset_t), - &affinity_set) < 0) + if (pthread_getaffinity_np(pthread_self(), sizeof(rte_cpuset_t), + &affinity_set)) CPU_ZERO(&affinity_set); for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) { -- 1.8.3.1
Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for external memory
Thursday, February 14, 2019 12:19 PM, Burakov, Anatoly: > Subject: Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for > external memory > > On 13-Feb-19 7:24 PM, Shahaf Shuler wrote: > > Wednesday, February 13, 2019 1:43 PM, Alejandro Lucero: > >> Subject: Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for > >> external memory > >> > >> On Wed, Feb 13, 2019 at 9:11 AM Shahaf Shuler > > >> wrote: > >> > >>> This series is in continue to RFC[1]. > >>> > >>> The DPDK APIs expose 3 different modes to work with memory used for > >> DMA: > >>> > >>> 1. Use the DPDK owned memory (backed by the DPDK provided > >> hugepages). > >>> This memory is allocated by the DPDK libraries, included in the DPDK > >>> memory system (memseg lists) and automatically DMA mapped by the > >> DPDK > >>> layers. > >>> > >>> 2. Use memory allocated by the user and register to the DPDK memory > >>> systems. This is also referred as external memory. Upon registration > >>> of the external memory, the DPDK layers will DMA map it to all > >>> needed devices. > >>> > >>> 3. Use memory allocated by the user and not registered to the DPDK > >>> memory system. This is for users who wants to have tight control on > >>> this memory. The user will need to explicitly call DMA map function > >>> in order to register such memory to the different devices. > >>> > >>> The scope of the patch focus on #3 above. > >>> > >>> > >> Why can not we have case 2 covering case 3? > > > > Because it is not our choice rather the DPDK application. > > We could not allow it, and force the application to register their external > memory to the DPDK memory management system. However IMO it will be > wrong. > > The use case exists - some application wants to manage their memory by > themselves. w/o the extra overhead of rte_malloc, without creating a special > socket to populate the memory and without redundant API calls to > rte_extmem_*. > > > > Simply allocate chunk of memory, DMA map it to device and that’s it. > > Just a small note: while this sounds good on paper, i should point out that at > least *registering* the memory with DPDK is a necessity. You may see > rte_extmem_* calls as redundant (and i agree, to an extent), but we don't > advertise our PMD's capabilities in a way that makes it easy to determine > whether a particular PMD will or will not work without registering external > memory within DPDK (i.e. does it use > rte_virt2memseg() internally, for example). > > So, extmem register calls are a necessary evil in such case, and IMO should > be called out as required for such external memory usage scenario. If we are going to force all to use the extmem, then there is no need w/ this API. we can have the PMDs to register when the memory is registered. We can just drop the vfio_dma_map APIs and that's it. > > -- > Thanks, > Anatoly
[dpdk-dev] [PATCH v2 1/2] eal: fix potential incorrect pinning for ctrl threads
pthread_setaffinity_np returns a >0 value on error. We could end up letting the ctrl threads on the current process cpu affinity. Fixes: d651ee4919cd ("eal: set affinity for control threads") Signed-off-by: David Marchand --- lib/librte_eal/common/eal_common_thread.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/librte_eal/common/eal_common_thread.c b/lib/librte_eal/common/eal_common_thread.c index 48ef4d6..a3985ce 100644 --- a/lib/librte_eal/common/eal_common_thread.c +++ b/lib/librte_eal/common/eal_common_thread.c @@ -209,7 +209,7 @@ static void *rte_thread_init(void *arg) CPU_SET(rte_get_master_lcore(), &cpuset); ret = pthread_setaffinity_np(*thread, sizeof(cpuset), &cpuset); - if (ret < 0) + if (ret) goto fail; ret = pthread_barrier_wait(¶ms->configured); -- 1.8.3.1
[dpdk-dev] [PATCH v2 2/2] eal: restrict ctrl threads to startup cpu affinity
Spawning the ctrl threads on anything that is not part of the eal coremask is not that polite to the rest of the system, especially when you took good care to pin your processes on cpu resources with tools like taskset (linux) / cpuset (freebsd). Rather than introduce yet another eal options to control on which cpu those ctrl threads are created, let's take the startup cpu affinity as a reference and remove the eal coremask from it. If no cpu is left, then we default to the master core. The cpuset is computed once at init before the original cpu affinity is lost. Introduced a RTE_CPU_AND macro to abstract the differences between linux and freebsd respective macros. Examples in a 4 cores FreeBSD vm: $ ./build/app/testpmd -l 2,3 --no-huge --no-pci -m 512 \ -- -i --total-num-mbufs=2048 $ procstat -S 1057 PIDTID COMMTDNAME CPU CSID CPU MASK 1057 100131 testpmd - 21 2 1057 100140 testpmd eal-intr-thread 11 0-1 1057 100141 testpmd rte_mp_handle 11 0-1 1057 100142 testpmd lcore-slave-3 31 3 $ cpuset -l 1,2,3 ./build/app/testpmd -l 2,3 --no-huge --no-pci -m 512 \ -- -i --total-num-mbufs=2048 $ procstat -S 1061 PIDTID COMMTDNAME CPU CSID CPU MASK 1061 100131 testpmd - 22 2 1061 100144 testpmd eal-intr-thread 12 1 1061 100145 testpmd rte_mp_handle 12 1 1061 100147 testpmd lcore-slave-3 32 3 $ cpuset -l 2,3 ./build/app/testpmd -l 2,3 --no-huge --no-pci -m 512 \ -- -i --total-num-mbufs=2048 $ procstat -S 1065 PIDTID COMMTDNAME CPU CSID CPU MASK 1065 100131 testpmd - 22 2 1065 100148 testpmd eal-intr-thread 22 2 1065 100149 testpmd rte_mp_handle 22 2 1065 100150 testpmd lcore-slave-3 32 3 Fixes: d651ee4919cd ("eal: set affinity for control threads") Signed-off-by: David Marchand --- Changes since v1: - added some description in the prog guide - fixed FreeBSD build --- doc/guides/prog_guide/env_abstraction_layer.rst | 14 + lib/librte_eal/common/eal_common_options.c | 28 + lib/librte_eal/common/eal_common_thread.c | 21 --- lib/librte_eal/common/eal_internal_cfg.h| 3 +++ lib/librte_eal/common/include/rte_lcore.h | 17 +++ 5 files changed, 62 insertions(+), 21 deletions(-) diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst index 929d76d..bfc66d9 100644 --- a/doc/guides/prog_guide/env_abstraction_layer.rst +++ b/doc/guides/prog_guide/env_abstraction_layer.rst @@ -498,6 +498,20 @@ Those TLS include *_cpuset* and *_socket_id*: * *_socket_id* stores the NUMA node of the CPU set. If the CPUs in CPU set belong to different NUMA node, the *_socket_id* will be set to SOCKET_ID_ANY. +Control Thread API +~~ + +It is possible to create Control Threads using the public API ``rte_ctrl_thread_create()``. +Those threads can be used for management/infrastructure tasks and are used internally by DPDK for multi process support and interrupt handling. + +Those threads will be scheduled on cpus part of the original process cpu affinity from which the dataplane and service lcores are excluded. + +For example, on a 8 cpus system, starting a dpdk application with -l 2,3 (dataplane cores), then depending on the affinity configuration which can be controlled with tools like taskset (Linux) or cpuset (FreeBSD), + +- with no affinity configuration, the Control Threads will end up on 0-1,4-7 cpus. +- with affinity restricted to 2-4, the Control Threads will end up on cpu 4. +- with affinity restricted to 2-3, the Control Threads will end up on cpu 2 (master lcore, which is the default when no cpu is available). + .. _known_issue_label: Known Issues diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c index 1f45f82..fca3f83 100644 --- a/lib/librte_eal/common/eal_common_options.c +++ b/lib/librte_eal/common/eal_common_options.c @@ -217,6 +217,7 @@ struct device_option { internal_cfg->create_uio_dev = 0; internal_cfg->iova_mode = RTE_IOVA_DC; internal_cfg->user_mbuf_pool_ops_name = NULL; + CPU_ZERO(&internal_cfg->ctrl_cpuset); internal_cfg->init_complete = 0; } @@ -1359,6 +1360,31 @@ static int xdigit2val(unsigned char c) cfg->lcore_count -= removed; } +static void +compute_ctrl_threads_cpuset(struct internal_config *internal_cfg) +{ + rte_cpuset_t *cpuset = &internal_cfg->ctrl_cpuset; + rte_cpuset_t default_set; + unsigned int lcore_id; + + for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id+
[dpdk-dev] [PATCH v2] power: fix to remove unused variable
Variable pfi_str is removed since it is unused. Fixes: 450f079131 ("power: add traffic pattern aware power control") Cc: sta...@dpdk.org Signed-off-by: Pallantla Poornima --- v2: Removed unused variable as suggested. --- lib/librte_power/rte_power_empty_poll.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/lib/librte_power/rte_power_empty_poll.c b/lib/librte_power/rte_power_empty_poll.c index e6145462f..15d4f0509 100644 --- a/lib/librte_power/rte_power_empty_poll.c +++ b/lib/librte_power/rte_power_empty_poll.c @@ -156,11 +156,8 @@ update_training_stats(struct priority_worker *poll_stats, { RTE_SET_USED(specific_freq); - char pfi_str[32]; uint64_t p0_empty_deq; - sprintf(pfi_str, "%02d", freq); - if (poll_stats->cur_freq == freq && poll_stats->thresh[freq].trained == false) { if (poll_stats->thresh[freq].cur_train_iter == 0) { -- 2.17.2
Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for external memory
Thursday, February 14, 2019 2:22 PM, Alejandro Lucero: >On Wed, Feb 13, 2019 at 7:24 PM Shahaf Shuler wrote: >Wednesday, February 13, 2019 1:43 PM, Alejandro Lucero: >> Subject: Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for >> external memory >> >> On Wed, Feb 13, 2019 at 9:11 AM Shahaf Shuler >> wrote: >> >> > This series is in continue to RFC[1]. >> > >> > The DPDK APIs expose 3 different modes to work with memory used for >> DMA: >> > >> > 1. Use the DPDK owned memory (backed by the DPDK provided >> hugepages). >> > This memory is allocated by the DPDK libraries, included in the DPDK >> > memory system (memseg lists) and automatically DMA mapped by the >> DPDK >> > layers. >> > >> > 2. Use memory allocated by the user and register to the DPDK memory >> > systems. This is also referred as external memory. Upon registration >> > of the external memory, the DPDK layers will DMA map it to all needed >> > devices. >> > >> > 3. Use memory allocated by the user and not registered to the DPDK >> > memory system. This is for users who wants to have tight control on >> > this memory. The user will need to explicitly call DMA map function in >> > order to register such memory to the different devices. >> > >> > The scope of the patch focus on #3 above. >> > >> > >> Why can not we have case 2 covering case 3? > >Because it is not our choice rather the DPDK application. >We could not allow it, and force the application to register their external >memory to the DPDK memory management system. However IMO it will be wrong. >The use case exists - some application wants to manage their memory by >themselves. w/o the extra overhead of rte_malloc, without creating a special >socket to populate the memory and without redundant API calls to rte_extmem_*. > >Simply allocate chunk of memory, DMA map it to device and that’s it. > >Usability is a strong point, but up to some extent. DPDK is all about >performance, and adding options the user can choose from will add pressure and >complexity for keeping the performance. Your proposal makes sense from an user >point of view, but will it avoid to modify things in the DPDK core for >supporting this case broadly in the future? Multiprocess will be hard to get, >if not impossible, without adding more complexity, and although you likely do >not expect that use case requiring multiprocess support, once we have DPDK >apps using this model, sooner or later those companies with products based on >such option will demand broadly support. I can foresee not just multiprocess >support will require changes in the future. > >This reminds me the case of obtaining real time: the more complexity the less >determinism can be obtained. It is not impossible, simply it is far more >complex. Pure real time operating systems can add new functionalities, but it >is hard to do it properly without jeopardising the main goal. Generic purpose >operating systems can try to improve determinism, but up to some extent and >with important complexity costs. DPDK is the real time operating system in >this comparison. it makes some sense. as I wrote to Anatoly, I am not against forcing the user to work only w/ DPDK registered memory. we may cause some overhead to application, but will make things less complex. we just need to agree on it, and remove backdoors like vfio_dma_map (which BTW, currently being used by application, checkout VPP). > >> >> >> > Currently the only way to map external memory is through VFIO >> > (rte_vfio_dma_map). While VFIO is common, there are other vendors >> > which use different ways to map memory (e.g. Mellanox and NXP). >> > >> > >> As you say, VFIO is common, and when allowing DMAs programmed in user >> space, the right thing to do. > >It is common indeed. Why it the right thing to do? > >Compared with UIO, for sure. VFIO does have the right view of the system in >terms of which devices can properly be isolated. Can you confirm a specific >implementation by a vendor can ensure same behaviour? If so, do you have >duplicated code then? if the answer is your are using VFIO data, why not to >use VFIO as the interface and add the required connection between VFIO and >drivers? > >What about mapping validation? is the driver doing that part or relying on >kernel code? or is it just assuming the mapping is safe? mapping validation is done by Mellanox kernel module, the kernel is trusted. > > I'm assuming there is an IOMMU hardware and >> this is what Mellanox and NXP rely on in some way or another. > >For Mellanox, the device works with virtual memory, not physical. If you think >of it, it is more secure for user space application. Mellanox device has >internal memory translation unit between virtual memory and physical memory. >IOMMU can be added on top of it, in case the host doesn't trust the device or >the device is given to untrusted entity like VM. > >Any current NIC or device will work with virtual addresses if IOMMU is in >place, not
Re: [dpdk-dev] [PATCH] lib: fix strcat with equivalent logic
On Thu, Feb 14, 2019 at 09:30:31AM +, Chaitanya Babu Talluri wrote: > Replace strcat with concatenation logic to avoid buffer overflow. > > Fixes: a6a47ac9c2 ("cfgfile: rework load function") > Cc: sta...@dpdk.org > > Signed-off-by: Chaitanya Babu Talluri > --- > lib/librte_cfgfile/rte_cfgfile.c | 10 +- > 1 file changed, 9 insertions(+), 1 deletion(-) > > diff --git a/lib/librte_cfgfile/rte_cfgfile.c > b/lib/librte_cfgfile/rte_cfgfile.c > index 7d8c941ea..7616742f7 100644 > --- a/lib/librte_cfgfile/rte_cfgfile.c > +++ b/lib/librte_cfgfile/rte_cfgfile.c > @@ -227,7 +227,15 @@ rte_cfgfile_load_with_params(const char *filename, int > flags, > while (end != NULL) { > if (*(end+1) == params->comment_character) { > *end = '\0'; > - strcat(split[1], end+1); > + int index = strlen(split[1]); > + char *temp = end+1; > + while (*temp != '\0') { > + split[1][index] = *temp; > + index++; > + temp++; > + } > + split[1][index] = '\0'; > + I don't see how this change fixes the problem. From what I see, you have just inlined the strcat function into the code above, without changing any of the logic. To prevent buffer overflows using strcat, you need to identify the total buffer length and use that when copying. I suggest reworking this code to use strlcat. /Bruce
Re: [dpdk-dev] [PATCH 3/6] bus: introduce DMA memory mapping for external memory
On Wed, Feb 13, 2019 at 07:07:11PM +, Shahaf Shuler wrote: > Wednesday, February 13, 2019 1:17 PM, Gaëtan Rivet: > > Subject: Re: [PATCH 3/6] bus: introduce DMA memory mapping for external > > memory > > > > On Wed, Feb 13, 2019 at 11:10:23AM +0200, Shahaf Shuler wrote: > > > The DPDK APIs expose 3 different modes to work with memory used for > > DMA: > > > > > > 1. Use the DPDK owned memory (backed by the DPDK provided > > hugepages). > > > This memory is allocated by the DPDK libraries, included in the DPDK > > > memory system (memseg lists) and automatically DMA mapped by the > > DPDK > > > layers. > > > > > > 2. Use memory allocated by the user and register to the DPDK memory > > > systems. This is also referred as external memory. Upon registration > > > of the external memory, the DPDK layers will DMA map it to all needed > > > devices. > > > > > > 3. Use memory allocated by the user and not registered to the DPDK > > > memory system. This is for users who wants to have tight control on > > > this memory. The user will need to explicitly call DMA map function in > > > order to register such memory to the different devices. > > > > > > The scope of the patch focus on #3 above. > > > > > > Currently the only way to map external memory is through VFIO > > > (rte_vfio_dma_map). While VFIO is common, there are other vendors > > > which use different ways to map memory (e.g. Mellanox and NXP). > > > > > > > How are those other vendors' devices mapped initially right now? Are they > > using #2 scheme instead? Then the user will remap everything using #3? > > It is not a re-map, it is a completely different mode for the memory > management. > The first question to ask is "how the application wants to manage its memory" > ? > If it is either #1 or #2 above, no problem to make the mapping internal on > the "other vendor devices" as they can register to the memory event callback > which trigger every time new memory is added to the DPDK memory management > system. > For #3 the memory does not exists in the DPDK memory management system, and > no memory events. Hence the application needs to explicitly call the dma MAP. > The change on this patch is just to make it more generic than calling only > VFIO. > Right! I mostly used #1 ports and never really thought about other kind of memory management or how they might follow a different logic. Do you think this could be used with a lot of sequential mapping/unmappings happening? I'm thinking for example about a crypto app feeding crypto buffers, being able to directly map the result instead of copying it within buffers might be interesting. But then you'd have to unmap often. - Is the unmap() simple from the app PoV? - Must the mapping remain available for a long time? - Does the app need to call tx_descriptor_status() a few times or does dma_unmap() verify that the mapping is not in use before unmapping? -- Gaëtan Rivet 6WIND
Re: [dpdk-dev] [PATCH 1/2] Enable codespell from config file.
On Wed, Feb 13, 2019 at 07:16:24PM +, Van Haaren, Harry wrote: > > -Original Message- > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Michael Santana > > Sent: Wednesday, February 13, 2019 7:08 PM > > To: dev@dpdk.org > > Cc: Thomas Monjalon > > Subject: [dpdk-dev] [PATCH 1/2] Enable codespell from config file. > > > > Enable turning on codespell from any of the config files for > > checkpatches.sh. codespell is a feature by checkpatch.pl that > > checks for common spelling mistakes in comments in patches. > > > > This feature is disabled by default. To enable it one must add > > the '--codespell' flag to the $options variable in > > checkpatches.sh. With this change the user can decide to turn > > on codespell from a config file rather than directly modifying > > checkpatches.sh > > > > Signed-off-by: Michael Santana > > Oh nice, I didn't know checkpatch had a spell check available. > > Would it make sense to turn on automatically if the required spelling program > is available? > > (Perhaps provide an explicit disable if certain people hate the idea..) > > I'm a +1 for tools just doing the right thing by default :) > +1 for on by default.
Re: [dpdk-dev] [PATCH v5 0/2] Introduce travis support
On 2/7/19 5:01 PM, Michael Santana wrote: ping This series introduces the ability for any github mirrors of the DPDK project, including developer mirrors, to kick off builds under the travis CI infrastructure. For now, this just means compilation - no other kinds of automated run exists yet. In the future, this can be expanded to execute and report results for any test-suites that might exist. The series includes support for both the 'classic make' style builds (which are set to be deprecated at some undetermined point in the future), as well as the modern meson+ninja build system. Additionally, there is support for building the ARM64 target with both the meson and make systems. The files added under .ci/ exist so that in the future, other CI support platforms (such as cirrus, appveyor, etc.) could have a common place to put their requisite scripts without polluting the main tree. Some documentation is updated, making developers aware of the new travis integration. The integration can also be included on the official DPDK mirror. Build reports can be enabled by subscribing the travis build email to the test-reports mailing list (this can be done independent of this series being applied). v4->v5: - Renamed ARM64 to AARCH64. v3->v4: - Remove non-existing file form maintainers list: meson_cross_aarch64_gcc.txt - Renamed ARM64 to AARCH64 for travis environment variable v2->v3: - Removed duplicate file meson_cross_aarch64_gcc.txt. Used arm64_armv8_linuxapp_gcc file instead - Renamed ambiguous variable names and comments, including the variable KERNEL to DISABLE_KERNEL_MODULES and comment 'source for python' to 'Repo for python' - Removed an already-defined variable v1 -> v2: - Added patch 1/2, "examples/vhost_scsi: don't build..." - Included arm64 builds - Included multiple meson+ninja builds (full library, minimal library) - Included multiple 'classic make' builds Aaron Conole (1): examples/vhost_scsi: Don't build without virtio_scsi.h Michael Santana (1): ci: Introduce travis builds for github repositories .ci/linux-build.sh | 88 +++ .ci/linux-setup.sh | 31 ++ .travis.yml | 159 MAINTAINERS | 6 ++ doc/guides/contributing/patches.rst | 4 + examples/vhost_scsi/meson.build | 5 + 6 files changed, 293 insertions(+) create mode 100755 .ci/linux-build.sh create mode 100755 .ci/linux-setup.sh create mode 100644 .travis.yml
Re: [dpdk-dev] [PATCH v2] power: fix to remove unused variable
Hi, I would suggest to consider to add: Fixes: 450f0791312c ("power: add traffic pattern aware power control") Reviewed-By: Rami Rosen
[dpdk-dev] [PATCH 0/5] display testpmd forwarding engine stats on the fly
Here is a little series that makes it possible to display and clear testpmd fwd engines while they run without having to stop them. This is mostly handy when running stress tests and you look for packets drops without having to stop/start testpmd forwarding. Example: testpmd> show fwd stats all --- Forward Stats for RX Port= 0/Queue= 0 -> TX Port= 1/Queue= 0 --- RX-packets: 261977064 TX-packets: 261977064 TX-dropped: 0 --- Forward Stats for RX Port= 1/Queue= 0 -> TX Port= 0/Queue= 0 --- RX-packets: 261985142 TX-packets: 261985142 TX-dropped: 0 -- Forward statistics for port 0 -- RX-packets: 261977096 RX-dropped: 0 RX-total: 261977096 TX-packets: 261985155 TX-dropped: 0 TX-total: 261985155 -- Forward statistics for port 1 -- RX-packets: 261985188 RX-dropped: 0 RX-total: 261985188 TX-packets: 261977128 TX-dropped: 0 TX-total: 261977128 +++ Accumulated forward statistics for all ports+++ RX-packets: 523962284 RX-dropped: 0 RX-total: 523962284 TX-packets: 523962283 TX-dropped: 0 TX-total: 523962283 testpmd> show fwd stats all --- Forward Stats for RX Port= 0/Queue= 0 -> TX Port= 1/Queue= 0 --- RX-packets: 274293770 TX-packets: 274293642 TX-dropped: 128 --- Forward Stats for RX Port= 1/Queue= 0 -> TX Port= 0/Queue= 0 --- RX-packets: 274301850 TX-packets: 274301850 TX-dropped: 0 -- Forward statistics for port 0 -- RX-packets: 274293802 RX-dropped: 0 RX-total: 274293802 TX-packets: 274301862 TX-dropped: 0 TX-total: 274301862 -- Forward statistics for port 1 -- RX-packets: 274301894 RX-dropped: 0 RX-total: 274301894 TX-packets: 274293706 TX-dropped: 128 TX-total: 274293834 +++ Accumulated forward statistics for all ports+++ RX-packets: 548595696 RX-dropped: 0 RX-total: 548595696 TX-packets: 548595568 TX-dropped: 128 TX-total: 548595696 David Marchand (5): app/testpmd: remove unused fwd_ctx field app/testpmd: add missing newline when showing statistics app/testpmd: add missing transmit errors stats app/testpmd: remove useless casts on statistics app/testpmd: display/clear forwarding stats on demand app/test-pmd/cmdline.c | 44 +++ app/test-pmd/testpmd.c | 451 ++-- app/test-pmd/testpmd.h | 22 +- doc/guides/testpmd_app_ug/testpmd_funcs.rst | 36 +++ 4 files changed, 316 insertions(+), 237 deletions(-) -- 1.8.3.1
[dpdk-dev] [PATCH 1/5] app/testpmd: remove unused fwd_ctx field
Remove some leftover from a previous rework. Fixes: c4bcc342c8ee ("app/testpmd: refactor ieee1588 forwarding") Signed-off-by: David Marchand --- app/test-pmd/testpmd.h | 1 - 1 file changed, 1 deletion(-) diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h index fa48878..85b791b 100644 --- a/app/test-pmd/testpmd.h +++ b/app/test-pmd/testpmd.h @@ -173,7 +173,6 @@ struct rte_port { uint16_ttunnel_tso_segsz; /**< Segmentation offload MSS for tunneled pkts. */ uint16_ttx_vlan_id;/**< The tag ID */ uint16_ttx_vlan_id_outer;/**< The outer tag ID */ - void*fwd_ctx; /**< Forwarding mode context */ uint64_trx_bad_ip_csum; /**< rx pkts with bad ip checksum */ uint64_trx_bad_l4_csum; /**< rx pkts with bad l4 checksum */ uint64_trx_bad_outer_l4_csum; -- 1.8.3.1
[dpdk-dev] [PATCH 2/5] app/testpmd: add missing newline when showing statistics
Having the standard stats and the rx burst stats on the same line gives a really long line and is not consistent with the rest. Before: RX-packets: 3542977TX-packets: 3542971TX-dropped: 6 RX-bursts : 499440 [24% of 2 pkts + 15% of 1 pkts + 61% of others] TX-bursts : 499440 [24% of 2 pkts + 15% of 1 pkts + 61% of others] After: RX-packets: 4629969TX-packets: 4629969TX-dropped: 0 RX-bursts : 663328 [19% of 2 pkts + 17% of 3 pkts + 64% of others] TX-bursts : 663328 [19% of 2 pkts + 17% of 3 pkts + 64% of others] Signed-off-by: David Marchand --- app/test-pmd/testpmd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index 98c1baa..984155a 100644 --- a/app/test-pmd/testpmd.c +++ b/app/test-pmd/testpmd.c @@ -1459,7 +1459,7 @@ struct extmem_param { "TX Port=%2d/Queue=%2d %s\n", fwd_top_stats_border, fs->rx_port, fs->rx_queue, fs->tx_port, fs->tx_queue, fwd_top_stats_border); - printf(" RX-packets: %-14u TX-packets: %-14u TX-dropped: %-14u", + printf(" RX-packets: %-14u TX-packets: %-14u TX-dropped: %-14u\n", fs->rx_packets, fs->tx_packets, fs->fwd_dropped); /* if checksum mode */ -- 1.8.3.1
[dpdk-dev] [PATCH 4/5] app/testpmd: remove useless casts on statistics
Caught by code review while investigating the stats display code. Switching all port and fwd engine statistics to uint64_t makes it possible to drop all casts. Signed-off-by: David Marchand --- app/test-pmd/testpmd.c | 62 ++ app/test-pmd/testpmd.h | 12 +- 2 files changed, 28 insertions(+), 46 deletions(-) diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index 3acd97b..ab110b0 100644 --- a/app/test-pmd/testpmd.c +++ b/app/test-pmd/testpmd.c @@ -1376,7 +1376,7 @@ struct extmem_param { printf(" RX-packets: %-14"PRIu64" RX-dropped: %-14"PRIu64"RX-total: " "%-"PRIu64"\n", stats->ipackets, stats->imissed, - (uint64_t) (stats->ipackets + stats->imissed)); + stats->ipackets + stats->imissed); if (cur_fwd_eng == &csum_fwd_engine) printf(" Bad-ipcsum: %-14"PRIu64" Bad-l4csum: %-14"PRIu64"Bad-outer-l4csum: %-14"PRIu64"\n", @@ -1390,13 +1390,13 @@ struct extmem_param { printf(" TX-packets: %-14"PRIu64" TX-dropped: %-14"PRIu64"TX-total: " "%-"PRIu64"\n", stats->opackets, port->tx_dropped, - (uint64_t) (stats->opackets + port->tx_dropped)); + stats->opackets + port->tx_dropped); } else { printf(" RX-packets: %14"PRIu64" RX-dropped:%14"PRIu64"RX-total:" "%14"PRIu64"\n", stats->ipackets, stats->imissed, - (uint64_t) (stats->ipackets + stats->imissed)); + stats->ipackets + stats->imissed); if (cur_fwd_eng == &csum_fwd_engine) printf(" Bad-ipcsum:%14"PRIu64" Bad-l4csum:%14"PRIu64"Bad-outer-l4csum: %-14"PRIu64"\n", @@ -1411,7 +1411,7 @@ struct extmem_param { printf(" TX-packets: %14"PRIu64" TX-dropped:%14"PRIu64"TX-total:" "%14"PRIu64"\n", stats->opackets, port->tx_dropped, - (uint64_t) (stats->opackets + port->tx_dropped)); + stats->opackets + port->tx_dropped); } #ifdef RTE_TEST_PMD_RECORD_BURST_STATS @@ -1459,13 +1459,15 @@ struct extmem_param { "TX Port=%2d/Queue=%2d %s\n", fwd_top_stats_border, fs->rx_port, fs->rx_queue, fs->tx_port, fs->tx_queue, fwd_top_stats_border); - printf(" RX-packets: %-14u TX-packets: %-14u TX-dropped: %-14u\n", + printf(" RX-packets: %-14"PRIu64" TX-packets: %-14"PRIu64 + " TX-dropped: %-14"PRIu64"\n", fs->rx_packets, fs->tx_packets, fs->fwd_dropped); /* if checksum mode */ if (cur_fwd_eng == &csum_fwd_engine) { - printf(" RX- bad IP checksum: %-14u Rx- bad L4 checksum: " - "%-14u Rx- bad outer L4 checksum: %-14u\n", + printf(" RX- bad IP checksum: %-14"PRIu64 + " Rx- bad L4 checksum: %-14"PRIu64 + " Rx- bad outer L4 checksum: %-14"PRIu64"\n", fs->rx_bad_ip_csum, fs->rx_bad_l4_csum, fs->rx_bad_outer_l4_csum); } @@ -1743,9 +1745,6 @@ struct extmem_param { uint64_t total_rx_dropped; uint64_t total_tx_dropped; uint64_t total_rx_nombuf; - uint64_t tx_dropped; - uint64_t rx_bad_ip_csum; - uint64_t rx_bad_l4_csum; #ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES uint64_t fwd_cycles; #endif @@ -1772,42 +1771,25 @@ struct extmem_param { fwd_cycles = 0; #endif for (sm_id = 0; sm_id < cur_fwd_config.nb_fwd_streams; sm_id++) { + struct fwd_stream *fs = fwd_streams[sm_id]; + if (cur_fwd_config.nb_fwd_streams > cur_fwd_config.nb_fwd_ports) { fwd_stream_stats_display(sm_id); - ports[fwd_streams[sm_id]->tx_port].tx_stream = NULL; - ports[fwd_streams[sm_id]->rx_port].rx_stream = NULL; + ports[fs->tx_port].tx_stream = NULL; + ports[fs->rx_port].rx_stream = NULL; } else { - ports[fwd_streams[sm_id]->tx_port].tx_stream = - fwd_streams[sm_id]; - ports[fwd_streams[sm_id]->rx_port].rx_stream = - fwd_streams[sm_id]; - } - tx_dropped = ports[fwd_streams[sm_id]->tx_port].tx_dropped; - tx_dropped = (uint64_t) (tx_dropped + -fwd_streams[sm_id]->fwd_dropped); - ports[fwd_streams[sm_id]->tx_port].tx_dropped = tx_dropped; - - rx_bad_ip_csum = - ports[fwd_streams[
[dpdk-dev] [PATCH 5/5] app/testpmd: display/clear forwarding stats on demand
Add a new "show/clear fwd stats all" command to display fwd and port statistics on the fly. To be able to do so, rte_port can't be used to maintain any statistics. Moved the stats dump parts from stop_packet_forwarding() and merge with fwd_port_stats_display() into fwd_stats_display(). fwd engine statistics are then aggregated into a local per port array. Signed-off-by: David Marchand --- app/test-pmd/cmdline.c | 44 +++ app/test-pmd/testpmd.c | 426 +++- app/test-pmd/testpmd.h | 9 +- doc/guides/testpmd_app_ug/testpmd_funcs.rst | 36 +++ 4 files changed, 306 insertions(+), 209 deletions(-) diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index db53cc0..58b11b7 100644 --- a/app/test-pmd/cmdline.c +++ b/app/test-pmd/cmdline.c @@ -7386,6 +7386,49 @@ struct cmd_showqueue_result { }, }; +/* show/clear fwd engine statistics */ +struct fwd_result { + cmdline_fixed_string_t action; + cmdline_fixed_string_t fwd; + cmdline_fixed_string_t stats; + cmdline_fixed_string_t all; +}; + +cmdline_parse_token_string_t cmd_fwd_action = + TOKEN_STRING_INITIALIZER(struct fwd_result, action, "show#clear"); +cmdline_parse_token_string_t cmd_fwd_fwd = + TOKEN_STRING_INITIALIZER(struct fwd_result, fwd, "fwd"); +cmdline_parse_token_string_t cmd_fwd_stats = + TOKEN_STRING_INITIALIZER(struct fwd_result, stats, "stats"); +cmdline_parse_token_string_t cmd_fwd_all = + TOKEN_STRING_INITIALIZER(struct fwd_result, all, "all"); + +static void +cmd_fwd_parsed(void *parsed_result, + __rte_unused struct cmdline *cl, + __rte_unused void *data) +{ + struct fwd_result *res = parsed_result; + + if (!strcmp(res->action, "show")) + fwd_stats_display(); + else + fwd_stats_reset(); +} + +static cmdline_parse_inst_t cmd_fwdall = { + .f = cmd_fwd_parsed, + .data = NULL, + .help_str = "show|clear fwd stats all", + .tokens = { + (void *)&cmd_fwd_action, + (void *)&cmd_fwd_fwd, + (void *)&cmd_fwd_stats, + (void *)&cmd_fwd_all, + NULL, + }, +}; + /* *** READ PORT REGISTER *** */ struct cmd_read_reg_result { cmdline_fixed_string_t read; @@ -18559,6 +18602,7 @@ struct cmd_show_tx_metadata_result { (cmdline_parse_inst_t *)&cmd_showqueue, (cmdline_parse_inst_t *)&cmd_showportall, (cmdline_parse_inst_t *)&cmd_showcfg, + (cmdline_parse_inst_t *)&cmd_fwdall, (cmdline_parse_inst_t *)&cmd_start, (cmdline_parse_inst_t *)&cmd_start_tx_first, (cmdline_parse_inst_t *)&cmd_start_tx_first_n, diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index ab110b0..e70ad20 100644 --- a/app/test-pmd/testpmd.c +++ b/app/test-pmd/testpmd.c @@ -1361,91 +1361,6 @@ struct extmem_param { #endif /* RTE_TEST_PMD_RECORD_BURST_STATS */ static void -fwd_port_stats_display(portid_t port_id, struct rte_eth_stats *stats) -{ - struct rte_port *port; - uint8_t i; - - static const char *fwd_stats_border = "--"; - - port = &ports[port_id]; - printf("\n %s Forward statistics for port %-2d %s\n", - fwd_stats_border, port_id, fwd_stats_border); - - if ((!port->rx_queue_stats_mapping_enabled) && (!port->tx_queue_stats_mapping_enabled)) { - printf(" RX-packets: %-14"PRIu64" RX-dropped: %-14"PRIu64"RX-total: " - "%-"PRIu64"\n", - stats->ipackets, stats->imissed, - stats->ipackets + stats->imissed); - - if (cur_fwd_eng == &csum_fwd_engine) - printf(" Bad-ipcsum: %-14"PRIu64" Bad-l4csum: %-14"PRIu64"Bad-outer-l4csum: %-14"PRIu64"\n", - port->rx_bad_ip_csum, port->rx_bad_l4_csum, - port->rx_bad_outer_l4_csum); - if ((stats->ierrors + stats->rx_nombuf) > 0) { - printf(" RX-error: %-"PRIu64"\n", stats->ierrors); - printf(" RX-nombufs: %-14"PRIu64"\n", stats->rx_nombuf); - } - - printf(" TX-packets: %-14"PRIu64" TX-dropped: %-14"PRIu64"TX-total: " - "%-"PRIu64"\n", - stats->opackets, port->tx_dropped, - stats->opackets + port->tx_dropped); - } - else { - printf(" RX-packets: %14"PRIu64" RX-dropped:%14"PRIu64"RX-total:" - "%14"PRIu64"\n", - stats->ipackets, stats->imissed, - stats->ipackets + stats->imissed); - - if (cur_fwd_eng == &csum_fwd_engine) - printf(" Bad-ipcsum:%14"PRIu64" Bad-l4csum:%14"PRIu64"Bad-outer-l4csum: %-14"PRIu64"\n", -
[dpdk-dev] [PATCH 3/5] app/testpmd: add missing transmit errors stats
pmd can report transmit errors but those stats are not accounted here. Signed-off-by: David Marchand --- app/test-pmd/testpmd.c | 1 + 1 file changed, 1 insertion(+) diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index 984155a..3acd97b 100644 --- a/app/test-pmd/testpmd.c +++ b/app/test-pmd/testpmd.c @@ -1838,6 +1838,7 @@ struct extmem_param { total_recv += stats.ipackets; total_xmit += stats.opackets; total_rx_dropped += stats.imissed; + port->tx_dropped += stats.oerrors; total_tx_dropped += port->tx_dropped; total_rx_nombuf += stats.rx_nombuf; -- 1.8.3.1
Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for external memory
On 14-Feb-19 1:28 PM, Shahaf Shuler wrote: Thursday, February 14, 2019 12:19 PM, Burakov, Anatoly: Subject: Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for external memory On 13-Feb-19 7:24 PM, Shahaf Shuler wrote: Wednesday, February 13, 2019 1:43 PM, Alejandro Lucero: Subject: Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for external memory On Wed, Feb 13, 2019 at 9:11 AM Shahaf Shuler wrote: This series is in continue to RFC[1]. The DPDK APIs expose 3 different modes to work with memory used for DMA: 1. Use the DPDK owned memory (backed by the DPDK provided hugepages). This memory is allocated by the DPDK libraries, included in the DPDK memory system (memseg lists) and automatically DMA mapped by the DPDK layers. 2. Use memory allocated by the user and register to the DPDK memory systems. This is also referred as external memory. Upon registration of the external memory, the DPDK layers will DMA map it to all needed devices. 3. Use memory allocated by the user and not registered to the DPDK memory system. This is for users who wants to have tight control on this memory. The user will need to explicitly call DMA map function in order to register such memory to the different devices. The scope of the patch focus on #3 above. Why can not we have case 2 covering case 3? Because it is not our choice rather the DPDK application. We could not allow it, and force the application to register their external memory to the DPDK memory management system. However IMO it will be wrong. The use case exists - some application wants to manage their memory by themselves. w/o the extra overhead of rte_malloc, without creating a special socket to populate the memory and without redundant API calls to rte_extmem_*. Simply allocate chunk of memory, DMA map it to device and that’s it. Just a small note: while this sounds good on paper, i should point out that at least *registering* the memory with DPDK is a necessity. You may see rte_extmem_* calls as redundant (and i agree, to an extent), but we don't advertise our PMD's capabilities in a way that makes it easy to determine whether a particular PMD will or will not work without registering external memory within DPDK (i.e. does it use rte_virt2memseg() internally, for example). So, extmem register calls are a necessary evil in such case, and IMO should be called out as required for such external memory usage scenario. If we are going to force all to use the extmem, then there is no need w/ this API. we can have the PMDs to register when the memory is registered. We can just drop the vfio_dma_map APIs and that's it. Well, whether we needed it or not is not really my call, but what i can say is that using extmem_register is _necessary_ if you're going to use the PMD's. You're right, we could just map memory for DMA at register time - that would save one API call to get the memory working. It makes it a bit weird semantically, but i think we can live with that :) -- Thanks, Anatoly
Re: [dpdk-dev] [PATCH v2 1/2] eal: fix potential incorrect pinning for ctrl threads
On 14-Feb-19 1:30 PM, David Marchand wrote: pthread_setaffinity_np returns a >0 value on error. We could end up letting the ctrl threads on the current process cpu affinity. Fixes: d651ee4919cd ("eal: set affinity for control threads") Signed-off-by: David Marchand --- lib/librte_eal/common/eal_common_thread.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/librte_eal/common/eal_common_thread.c b/lib/librte_eal/common/eal_common_thread.c index 48ef4d6..a3985ce 100644 --- a/lib/librte_eal/common/eal_common_thread.c +++ b/lib/librte_eal/common/eal_common_thread.c @@ -209,7 +209,7 @@ static void *rte_thread_init(void *arg) CPU_SET(rte_get_master_lcore(), &cpuset); ret = pthread_setaffinity_np(*thread, sizeof(cpuset), &cpuset); - if (ret < 0) + if (ret) goto fail; ret = pthread_barrier_wait(¶ms->configured); CC: stable? -- Thanks, Anatoly
[dpdk-dev] [PATCH] ethdev: remove unused variable
When removing the old attach function, the racy variable for getting the last port id became unused. Fixes: c9cce42876f5 ("ethdev: remove deprecated attach/detach functions") Cc: sta...@dpdk.org Signed-off-by: Thomas Monjalon --- lib/librte_ethdev/rte_ethdev.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c index 3b1d1a560..a2b63951f 100644 --- a/lib/librte_ethdev/rte_ethdev.c +++ b/lib/librte_ethdev/rte_ethdev.c @@ -48,7 +48,6 @@ int rte_eth_dev_logtype; static const char *MZ_RTE_ETH_DEV_DATA = "rte_eth_dev_data"; struct rte_eth_dev rte_eth_devices[RTE_MAX_ETHPORTS]; -static uint16_t eth_dev_last_created_port; /* spinlock for eth device callbacks */ static rte_spinlock_t rte_eth_dev_cb_lock = RTE_SPINLOCK_INITIALIZER; @@ -445,8 +444,6 @@ eth_dev_get(uint16_t port_id) eth_dev->data = &rte_eth_dev_shared_data->data[port_id]; - eth_dev_last_created_port = port_id; - return eth_dev; } -- 2.20.1
Re: [dpdk-dev] [PATCH 3/5] app/testpmd: add missing transmit errors stats
On Thu, Feb 14, 2019 at 04:42:50PM +0100, David Marchand wrote: > pmd can report transmit errors but those stats are not accounted here. > > Signed-off-by: David Marchand > --- > app/test-pmd/testpmd.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c > index 984155a..3acd97b 100644 > --- a/app/test-pmd/testpmd.c > +++ b/app/test-pmd/testpmd.c > @@ -1838,6 +1838,7 @@ struct extmem_param { > total_recv += stats.ipackets; > total_xmit += stats.opackets; > total_rx_dropped += stats.imissed; > + port->tx_dropped += stats.oerrors; > total_tx_dropped += port->tx_dropped; > total_rx_nombuf += stats.rx_nombuf; > > Without knowing as to whether the line is needed or not, the line itself looks out of place. All other lines are assignments to local variables, apart from this. Should a local variable be defined for consistency? /Bruce
Re: [dpdk-dev] [PATCH 0/6] introduce DMA memory mapping for external memory
On 14-Feb-19 1:41 PM, Shahaf Shuler wrote: Thursday, February 14, 2019 2:22 PM, Alejandro Lucero: >Any current NIC or device will work with virtual addresses if IOMMU is in place, not matter if the device is IOMMU-aware or not. Not sure what you mean here. For example Intel devices works w/ VFIO and use iova to provide buffers to NIC. hence protection between multiple process is by application responsibility or new VFIO container. As far as VFIO is concerned, "multiprocess protection" is not a thing, because the device cannot be used twice in the first place - each usage is strictly limited to one VFIO container. We just sidestep this "limitation" by sharing container/device file descriptors with multiple processes via IPC. So while it's technically true that multiprocess protection is "application responsibility" because we can pass around fd's, it's still protected by the kernel. IOVA mappings are per-container, so the same IOVA range can be mapped twice (thrice...), as long as it's for a different set of devices, in effect making them virtual addresses. -- Thanks, Anatoly
Re: [dpdk-dev] [PATCH] eal: fix check when retrieving current cpu affinity
On 14-Feb-19 1:27 PM, David Marchand wrote: pthread_getaffinity_np returns a >0 value when failing. This is mainly for the sake of correctness. The only case where it could fail is when passing an incorrect cpuset size wrt to the kernel. Fixes: 2eba8d21f3c9 ("eal: restrict cores auto detection") Signed-off-by: David Marchand --- lib/librte_eal/common/eal_common_options.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c index 6c96f45..1f45f82 100644 --- a/lib/librte_eal/common/eal_common_options.c +++ b/lib/librte_eal/common/eal_common_options.c @@ -1343,10 +1343,9 @@ static int xdigit2val(unsigned char c) unsigned int lcore_id; unsigned int removed = 0; rte_cpuset_t affinity_set; - pthread_t tid = pthread_self(); - if (pthread_getaffinity_np(tid, sizeof(rte_cpuset_t), - &affinity_set) < 0) + if (pthread_getaffinity_np(pthread_self(), sizeof(rte_cpuset_t), + &affinity_set)) CPU_ZERO(&affinity_set); for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) { CC: stable? -- Thanks, Anatoly
Re: [dpdk-dev] [PATCH] ethdev: remove unused variable
On 2/14/19 7:29 PM, Thomas Monjalon wrote: When removing the old attach function, the racy variable for getting the last port id became unused. Fixes: c9cce42876f5 ("ethdev: remove deprecated attach/detach functions") Cc: sta...@dpdk.org Signed-off-by: Thomas Monjalon Reviewed-by: Andrew Rybchenko
Re: [dpdk-dev] [PATCH 3/5] app/testpmd: add missing transmit errors stats
On Thu, Feb 14, 2019 at 5:30 PM Bruce Richardson wrote: > On Thu, Feb 14, 2019 at 04:42:50PM +0100, David Marchand wrote: > > pmd can report transmit errors but those stats are not accounted here. > > > > Signed-off-by: David Marchand > > --- > > app/test-pmd/testpmd.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c > > index 984155a..3acd97b 100644 > > --- a/app/test-pmd/testpmd.c > > +++ b/app/test-pmd/testpmd.c > > @@ -1838,6 +1838,7 @@ struct extmem_param { > > total_recv += stats.ipackets; > > total_xmit += stats.opackets; > > total_rx_dropped += stats.imissed; > > + port->tx_dropped += stats.oerrors; > > total_tx_dropped += port->tx_dropped; > > total_rx_nombuf += stats.rx_nombuf; > > > > > Without knowing as to whether the line is needed or not, the line itself > looks out of place. All other lines are assignments to local variables, > apart from this. Should a local variable be defined for consistency? > Indeed this looks wrong to add it to port->tx_dropped. It actually "works" since this part is called when stopping forwarding and port->tx_dropped gets reset later when starting forwarding again. I suppose I should move this to total_tx_dropped instead. Thanks Bruce. -- David Marchand
Re: [dpdk-dev] [PATCH] eal: fix check when retrieving current cpu affinity
On Thu, Feb 14, 2019 at 5:44 PM Burakov, Anatoly wrote: > On 14-Feb-19 1:27 PM, David Marchand wrote: > > pthread_getaffinity_np returns a >0 value when failing. > > > > This is mainly for the sake of correctness. > > The only case where it could fail is when passing an incorrect cpuset > > size wrt to the kernel. > > > > Fixes: 2eba8d21f3c9 ("eal: restrict cores auto detection") > > Signed-off-by: David Marchand > > --- > > lib/librte_eal/common/eal_common_options.c | 5 ++--- > > 1 file changed, 2 insertions(+), 3 deletions(-) > > > > diff --git a/lib/librte_eal/common/eal_common_options.c > b/lib/librte_eal/common/eal_common_options.c > > index 6c96f45..1f45f82 100644 > > --- a/lib/librte_eal/common/eal_common_options.c > > +++ b/lib/librte_eal/common/eal_common_options.c > > @@ -1343,10 +1343,9 @@ static int xdigit2val(unsigned char c) > > unsigned int lcore_id; > > unsigned int removed = 0; > > rte_cpuset_t affinity_set; > > - pthread_t tid = pthread_self(); > > > > - if (pthread_getaffinity_np(tid, sizeof(rte_cpuset_t), > > - &affinity_set) < 0) > > + if (pthread_getaffinity_np(pthread_self(), sizeof(rte_cpuset_t), > > + &affinity_set)) > > CPU_ZERO(&affinity_set); > > > > for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) { > > > CC: stable? > Not really sure about CCing stable for this. I did not get errors with pthread_getaffinity_np so far. Afaiu, it would need the kernel and libc to have different cpuset sizes. I did not investigate in which situations it could happen. -- David Marchand
Re: [dpdk-dev] [PATCH v2 1/2] eal: fix potential incorrect pinning for ctrl threads
On Thu, Feb 14, 2019 at 5:12 PM Burakov, Anatoly wrote: > On 14-Feb-19 1:30 PM, David Marchand wrote: > > pthread_setaffinity_np returns a >0 value on error. > > We could end up letting the ctrl threads on the current process cpu > > affinity. > > > > Fixes: d651ee4919cd ("eal: set affinity for control threads") > > Signed-off-by: David Marchand > > --- > > lib/librte_eal/common/eal_common_thread.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/lib/librte_eal/common/eal_common_thread.c > b/lib/librte_eal/common/eal_common_thread.c > > index 48ef4d6..a3985ce 100644 > > --- a/lib/librte_eal/common/eal_common_thread.c > > +++ b/lib/librte_eal/common/eal_common_thread.c > > @@ -209,7 +209,7 @@ static void *rte_thread_init(void *arg) > > CPU_SET(rte_get_master_lcore(), &cpuset); > > > > ret = pthread_setaffinity_np(*thread, sizeof(cpuset), &cpuset); > > - if (ret < 0) > > + if (ret) > > goto fail; > > > > ret = pthread_barrier_wait(¶ms->configured); > > > > CC: stable? > Yes, I again forgot to check with check-git-log.sh... -- David Marchand
[dpdk-dev] [PATCH] eal/linux: fix log levels for unable to read pagemap
Commit cdc242f260e7 says: For Linux kernel 4.0 and newer, the ability to obtain physical page frame numbers for unprivileged users from /proc/self/pagemap was removed. Instead, when an IOMMU is present, simply choose our own DMA addresses instead. In this case the user still sees error messages, so adjust the log levels. Later, other checks will ensure that errors are logged in the appropriate cases. Fixes: cdc242f260e7 ("eal/linux: support running as unprivileged user") Cc: sta...@dpdk.org Signed-off-by: Kevin Traynor --- lib/librte_eal/linuxapp/eal/eal_memory.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c index 1b96b576e..f6ee403ad 100644 --- a/lib/librte_eal/linuxapp/eal/eal_memory.c +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c @@ -115,5 +115,5 @@ rte_mem_virt2phy(const void *virtaddr) fd = open("/proc/self/pagemap", O_RDONLY); if (fd < 0) { - RTE_LOG(ERR, EAL, "%s(): cannot open /proc/self/pagemap: %s\n", + RTE_LOG(INFO, EAL, "%s(): cannot open /proc/self/pagemap: %s\n", __func__, strerror(errno)); return RTE_BAD_IOVA; @@ -123,5 +123,5 @@ rte_mem_virt2phy(const void *virtaddr) offset = sizeof(uint64_t) * virt_pfn; if (lseek(fd, offset, SEEK_SET) == (off_t) -1) { - RTE_LOG(ERR, EAL, "%s(): seek error in /proc/self/pagemap: %s\n", + RTE_LOG(INFO, EAL, "%s(): seek error in /proc/self/pagemap: %s\n", __func__, strerror(errno)); close(fd); @@ -132,9 +132,9 @@ rte_mem_virt2phy(const void *virtaddr) close(fd); if (retval < 0) { - RTE_LOG(ERR, EAL, "%s(): cannot read /proc/self/pagemap: %s\n", + RTE_LOG(INFO, EAL, "%s(): cannot read /proc/self/pagemap: %s\n", __func__, strerror(errno)); return RTE_BAD_IOVA; } else if (retval != PFN_MASK_SIZE) { - RTE_LOG(ERR, EAL, "%s(): read %d bytes from /proc/self/pagemap " + RTE_LOG(INFO, EAL, "%s(): read %d bytes from /proc/self/pagemap " "but expected %d:\n", __func__, retval, PFN_MASK_SIZE); -- 2.20.1
Re: [dpdk-dev] 17.11.5-rc1 patches review and test
Hi, I'm afraid there is a problem with the strlcpy change in the nfp code. Using the rte_strlcpy makes the fw upload failing. I will send a patch asap using memcpy instead. On Mon, Jan 7, 2019 at 8:57 PM Yongseok Koh wrote: > Hi all, > > Here is a list of patches targeted for LTS release 17.11.5. Please help > review > and test RC1. The planned date for the final release is 18 Jan 2019 unless > there's need for RC2. Before that, please shout if anyone has objections > with > these patches being applied. > > Also for the companies committed to running regression tests, please run > the > tests and report any issue before the release date. > > The release candidate tarballs can be found at: > > https://dpdk.org/browse/dpdk-stable/tag/?id=v17.11.5-rc1 > > These patches are located at branch 17.11 of dpdk-stable repo: > > https://dpdk.org/browse/dpdk-stable/ > > > Thanks, > Yongseok > > --- > Ajit Khaparde (1): > net/bnxt: set MAC filtering as outer for non tunnel frames > > Alejandro Lucero (4): > mem: fix memory initialization time > net/nfp: fix mbuf flags with checksum good > net/nfp: fix RSS > bus/pci: compare kernel driver instead of interrupt handler > > Anatoly Burakov (2): > mem: fix undefined behavior in NUMA-aware mapping > usertools: check for lspci dependency > > Andrew Rybchenko (2): > net/sfc/base: fix build because of no declaration > net/sfc: receive prepared packets even in Rx exception case > > Andy Green (37): > eal: explicit cast of builtin for bsf32 > eal: explicit cast of core id when getting index > eal: declare trace buffer at top of own block > spinlock/x86: move stack declaration before code > net: move stack variable at top of VLAN strip function > ethdev: explicit cast of buffered Tx number > hash: move stack declaration at top of CRC32c function > hash: explicit casts for truncation in CRC32c > bus/pci: replace strncpy by strlcpy > bus/dpaa: fix inconsistent struct alignment > net/nfp: fix memcpy out of source range > net/qede: replace strncpy by strlcpy > net/qede: fix strncpy > net/sfc: make sure that stats name is nul-terminated > app/procinfo: fix sprintf overrun > devtools: provide more generic grep in git check > eal: explicit cast of strlcpy return > eal: fix casts in random functions > eal: explicit cast in constant byte swap > ring: remove useless variables > ring: remove signed type flip-flopping > mbuf: fix reference counter integer promotion > mbuf: explicit casts of reference counter > mbuf: explicit cast of headroom on reset > mbuf: explicit cast of size on detach > net: explicit cast of multicast bit clearing > net: explicit cast of IP checksum to 16-bit > net: explicit cast of protocol in IPv6 checksum > ethdev: explicit cast of queue count return > eal/x86: fix type of variable in memcpy function > eal: explicit cast in rwlock functions > net: explicit cast in L4 checksum > mbuf: fix type of private size in detach > mbuf: fix type of variables in linearize function > mbuf: avoid implicit demotion in 64-bit arithmetic > mbuf: avoid integer promotion in prepend/adj/chain > ethdev: fix type and scope of variables in Rx burst > > Andy Moreton (3): > net/sfc/base: properly align on line continuation > net/sfc/base: fix MAC Tx stats for less or equal to 64 bytes > net/sfc/base: fix ID retrieval in v3 licensing > > Anoob Joseph (3): > app/test-crypto-perf: fix check for auth key > app/test-crypto-perf: fix check for cipher IV > app/test-crypto-perf: fix double allocation of memory > > Bei Sun (1): > net/bnxt: set VLAN strip mode before default VNIC cfg > > Beilei Xing (2): > net/i40e: update Tx offload mask > net/i40e: fix X710 Rx after reading some registers > > Brian Archbold (1): > app/testpmd: fix duplicate exit > > Bruce Richardson (1): > eal: support strlcpy function > > Chaitanya Babu Talluri (1): > efd: fix write unlock during ring creation > > Chas Williams (1): > net/bonding: fix Rx slave fairness > > Dharmik Thakkar (1): > test/hash: fix build > > Didier Pallard (1): > net: fix Intel prepare function for IP checksum offload > > Evgeny Im (1): > net/failsafe: remove not supported multicast MAC filter > > Fan Zhang (1): > bus/pci: fix config r/w access > > Ferruh Yigit (5): > eal: fix build with gcc 9.0 > test/reorder: fix out of bound access > net/i40e/base: fix comment referencing internal data > bus/pci: fix allocation of device path > kni: fix build on Linux 4.19 > > Fiona Trahe (1): > test/crypto: fix number of queue pairs > > Haiyue Wang (1): > net/i40e: enable loopback function for X722 MAC > > Harry van Haaren (1): > even
[dpdk-dev] DPDK Release Status Meeting 14/2/2019
Minutes 14 February 2019 Agenda: * Release Dates * Subtrees * GSoC * OvS * Meson-Ninja * Opens Participants: * Arm * Debian * Intel * Mellanox * RedHat Release Dates - * v19.05 dates: * Proposal Deadline: Friday, 1 March * RC1: Friday, 29 March * Release: Friday, 10 May * Schedule web page updated accordingly: http://core.dpdk.org/roadmap/#dates * Contributors please provide 19.05 roadmap whenever available * Mellanox, Arm & Intel sent, RedHat? NXP? Marvel? others? Subtrees * main * Merged some patches and pulled sub-trees * next-net * Some patches pushed and merged into main repo * Small number of patches awaiting merge * next-virtio * Some patches pushed and merged into next-net * next-crypto * next-eventdev * next-pipeline * next-qos * No update, we need attendance from maintainers * Stable trees * Stable tree RC waiting for test, anyone from community welcome to test them send a test report or even a quick note to mail list: * LTS: v16.11.9-rc2 * LTS: v17.11.5-rc2 * Stable: v18.08.1-rc3 * Reminder, each stable managed as a branch on stable git tree: https://git.dpdk.org/dpdk-stable/ * 17.11.5 merged one more fix * 17.11.5 will be released without waiting more testing * 16.11.9 is the last release of 16.11, will wait for more test * 18.11 will create an RC next week GSoC * Google Summer of Code * Some good suggestions sent to the list * https://mails.dpdk.org/archives/web/2019-February/001025.html OvS --- * A week away for 2.11 release * Patch for using meson build system for pkg-config * David found issue with EAL threads. Regression came in 18.05 Meson-Ninja --- * Debian and Canonical are using it * PDF build not working * Thomas seeing some issues with cross-compilation and non-standard dependencies * Windows trying to using * RedHat still using make. RHEL 7 doesn't have support. RHEL 8 does * There is a target to switch to meson build on 19.11 Opens - * Akhil is looking for co-maintainer for crypto sub-tree, any volunteer please contact to Akhil or Thomas. * Propose 19.08 dates next week DPDK Release Status Meetings The DPDK Release Status Meeting is intended for DPDK Committers to discuss the status of the master tree and sub-trees, and for project managers to track progress or milestone dates. The meeting occurs on Thursdays at 8:30 UTC. If you wish to attend just send an email to "John McNamara " for the invite.
Re: [dpdk-dev] [PATCH 3/5] app/testpmd: add missing transmit errors stats
On Thu, Feb 14, 2019 at 6:39 PM David Marchand wrote: > On Thu, Feb 14, 2019 at 5:30 PM Bruce Richardson < > bruce.richard...@intel.com> wrote: > >> On Thu, Feb 14, 2019 at 04:42:50PM +0100, David Marchand wrote: >> > pmd can report transmit errors but those stats are not accounted here. >> > >> > Signed-off-by: David Marchand >> > --- >> > app/test-pmd/testpmd.c | 1 + >> > 1 file changed, 1 insertion(+) >> > >> > diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c >> > index 984155a..3acd97b 100644 >> > --- a/app/test-pmd/testpmd.c >> > +++ b/app/test-pmd/testpmd.c >> > @@ -1838,6 +1838,7 @@ struct extmem_param { >> > total_recv += stats.ipackets; >> > total_xmit += stats.opackets; >> > total_rx_dropped += stats.imissed; >> > + port->tx_dropped += stats.oerrors; >> > total_tx_dropped += port->tx_dropped; >> > total_rx_nombuf += stats.rx_nombuf; >> > >> > >> Without knowing as to whether the line is needed or not, the line itself >> looks out of place. All other lines are assignments to local variables, >> apart from this. Should a local variable be defined for consistency? >> > > Thinking again about this oerrors stats... We had a discussion with Maxime, last week. So I want to make sure that what we both agreed makes sense :-) What is the purpose of oerrors ? Since the drivers (via rte_eth_tx_burst return value) report the numbers of packets successfully transmitted, the application can try to retransmit the packets that did not make it and counts this. If the driver counts such "missed" packets, then it does the job the application will do anyway (wasting some cycles). But what is more a problem is that the application does not know if the packets in oerrors are its own retries or problems that the driver can not detect (hw problems) but the hw can. So the best option is that oerrors does not report the packets the driver refuses (and I can see some drivers that do not comply to this) but only "external" errors from the driver pov. Back to my patch here, if we agree on this definition of oerrors, I can not add it to total_tx_dropped, but I suppose I can add some "TX HW errors: " and "Total TX HW errors: " logs so that we are aware that something went bad "further" than the driver. Let's sleep on it :-) -- David Marchand
[dpdk-dev] [PATCH v2] net/bonding: fix slave tx burst for mode 4
From: Chas Williams The tx burst routine always needs to check for pending LACPDUs and send them if available. Do this first to prioritize the control traffic. We can still early exit, before calculating the distribution slaves, if there isn't any data packets. Fixes: 09150784a776 ("net/bonding: burst mode hash calculation") Cc: sta...@dpdk.org Reported-by: Hui Zhao Cc: Yunjian Wang Signed-off-by: Chas Williams --- drivers/net/bonding/rte_eth_bond_pmd.c | 48 +- 1 file changed, 24 insertions(+), 24 deletions(-) diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c index 2304172d0..61e731a8f 100644 --- a/drivers/net/bonding/rte_eth_bond_pmd.c +++ b/drivers/net/bonding/rte_eth_bond_pmd.c @@ -1298,9 +1298,6 @@ bond_ethdev_tx_burst_8023ad(void *queue, struct rte_mbuf **bufs, uint16_t i; - if (unlikely(nb_bufs == 0)) - return 0; - /* Copy slave list to protect against slave up/down changes during tx * bursting */ slave_count = internals->active_slave_count; @@ -1310,6 +1307,30 @@ bond_ethdev_tx_burst_8023ad(void *queue, struct rte_mbuf **bufs, memcpy(slave_port_ids, internals->active_slaves, sizeof(slave_port_ids[0]) * slave_count); + /* Check for LACP control packets and send if available */ + for (i = 0; i < slave_count; i++) { + struct port *port = &bond_mode_8023ad_ports[slave_port_ids[i]]; + struct rte_mbuf *ctrl_pkt = NULL; + + if (likely(rte_ring_empty(port->tx_ring))) + continue; + + if (rte_ring_dequeue(port->tx_ring, +(void **)&ctrl_pkt) != -ENOENT) { + slave_tx_count = rte_eth_tx_burst(slave_port_ids[i], + bd_tx_q->queue_id, &ctrl_pkt, 1); + /* +* re-enqueue LAG control plane packets to buffering +* ring if transmission fails so the packet isn't lost. +*/ + if (slave_tx_count != 1) + rte_ring_enqueue(port->tx_ring, ctrl_pkt); + } + } + + if (unlikely(nb_bufs == 0)) + return 0; + dist_slave_count = 0; for (i = 0; i < slave_count; i++) { struct port *port = &bond_mode_8023ad_ports[slave_port_ids[i]]; @@ -1365,27 +1386,6 @@ bond_ethdev_tx_burst_8023ad(void *queue, struct rte_mbuf **bufs, } } - /* Check for LACP control packets and send if available */ - for (i = 0; i < slave_count; i++) { - struct port *port = &bond_mode_8023ad_ports[slave_port_ids[i]]; - struct rte_mbuf *ctrl_pkt = NULL; - - if (likely(rte_ring_empty(port->tx_ring))) - continue; - - if (rte_ring_dequeue(port->tx_ring, -(void **)&ctrl_pkt) != -ENOENT) { - slave_tx_count = rte_eth_tx_burst(slave_port_ids[i], - bd_tx_q->queue_id, &ctrl_pkt, 1); - /* -* re-enqueue LAG control plane packets to buffering -* ring if transmission fails so the packet isn't lost. -*/ - if (slave_tx_count != 1) - rte_ring_enqueue(port->tx_ring, ctrl_pkt); - } - } - return total_tx_count; } -- 2.17.2
[dpdk-dev] [PATCH] net/bonding: fix invalid link status
From: Chas Williams Copying the link properties of the first slave added may copy an invalid link status. The speed and duplex of the slave may not be known at this time. Delay setting the properties until the first slave reports as link up. Note that we are still ignoring an error from link_properties_valid. For some bonding modes, 802.3ad, we should not activate the slave if it does not have matching link properties. Fixes: a45b288ef21a ("bond: support link status polling") Cc: sta...@dpdk.org Signed-off-by: Chas Williams --- drivers/net/bonding/rte_eth_bond_api.c | 4 --- drivers/net/bonding/rte_eth_bond_pmd.c | 31 +- drivers/net/bonding/rte_eth_bond_private.h | 7 - 3 files changed, 18 insertions(+), 24 deletions(-) diff --git a/drivers/net/bonding/rte_eth_bond_api.c b/drivers/net/bonding/rte_eth_bond_api.c index e5e146540..57ef2f001 100644 --- a/drivers/net/bonding/rte_eth_bond_api.c +++ b/drivers/net/bonding/rte_eth_bond_api.c @@ -484,10 +484,6 @@ __eth_bond_slave_add_lock_free(uint16_t bonded_port_id, uint16_t slave_port_id) } } - /* Inherit eth dev link properties from first slave */ - link_properties_set(bonded_eth_dev, - &(slave_eth_dev->data->dev_link)); - /* Make primary slave */ internals->primary_port = slave_port_id; internals->current_primary_port = slave_port_id; diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c index 61e731a8f..c4a2b955c 100644 --- a/drivers/net/bonding/rte_eth_bond_pmd.c +++ b/drivers/net/bonding/rte_eth_bond_pmd.c @@ -1449,7 +1449,7 @@ bond_ethdev_tx_burst_broadcast(void *queue, struct rte_mbuf **bufs, return max_nb_of_tx_pkts; } -void +static void link_properties_set(struct rte_eth_dev *ethdev, struct rte_eth_link *slave_link) { struct bond_dev_private *bond_ctx = ethdev->data->dev_private; @@ -1474,7 +1474,7 @@ link_properties_set(struct rte_eth_dev *ethdev, struct rte_eth_link *slave_link) } } -int +static int link_properties_valid(struct rte_eth_dev *ethdev, struct rte_eth_link *slave_link) { @@ -2693,16 +2693,6 @@ bond_ethdev_lsc_event_callback(uint16_t port_id, enum rte_eth_event_type type, if (active_pos < internals->active_slave_count) goto link_update; - /* if no active slave ports then set this port to be primary port */ - if (internals->active_slave_count < 1) { - /* If first active slave, then change link status */ - bonded_eth_dev->data->dev_link.link_status = ETH_LINK_UP; - internals->current_primary_port = port_id; - lsc_flag = 1; - - mac_address_slaves_update(bonded_eth_dev); - } - /* check link state properties if bonded link is up*/ if (bonded_eth_dev->data->dev_link.link_status == ETH_LINK_UP) { if (link_properties_valid(bonded_eth_dev, &link) != 0) @@ -2714,9 +2704,24 @@ bond_ethdev_lsc_event_callback(uint16_t port_id, enum rte_eth_event_type type, link_properties_set(bonded_eth_dev, &link); } + /* If no active slave ports then set this port to be +* the primary port. +*/ + if (internals->active_slave_count < 1) { + /* If first active slave, then change link status */ + bonded_eth_dev->data->dev_link.link_status = + ETH_LINK_UP; + internals->current_primary_port = port_id; + lsc_flag = 1; + + mac_address_slaves_update(bonded_eth_dev); + } + activate_slave(bonded_eth_dev, port_id); - /* If user has defined the primary port then default to using it */ + /* If the user has defined the primary port then default to +* using it. +*/ if (internals->user_defined_primary_port && internals->primary_port == port_id) bond_ethdev_primary_set(internals, port_id); diff --git a/drivers/net/bonding/rte_eth_bond_private.h b/drivers/net/bonding/rte_eth_bond_private.h index 3ea5d686b..032ffed02 100644 --- a/drivers/net/bonding/rte_eth_bond_private.h +++ b/drivers/net/bonding/rte_eth_bond_private.h @@ -222,13 +222,6 @@ deactivate_slave(struct rte_eth_dev *eth_dev, uint16_t port_id); void activate_slave(struct rte_eth_dev *eth_dev, uint16_t port_id); -void -link_properties_set(struct rte_eth_dev *bonded_eth_dev, - struct rte_eth_link *slave_dev_link); -int -link_properties_valid(struct rte_
Re: [dpdk-dev] [PATCH] eal/linux: fix log levels for unable to read pagemap
Kevin Traynor writes: > Commit cdc242f260e7 says: > For Linux kernel 4.0 and newer, the ability to obtain > physical page frame numbers for unprivileged users from > /proc/self/pagemap was removed. Instead, when an IOMMU > is present, simply choose our own DMA addresses instead. > > In this case the user still sees error messages, so adjust > the log levels. Later, other checks will ensure that errors > are logged in the appropriate cases. > > Fixes: cdc242f260e7 ("eal/linux: support running as unprivileged user") > Cc: sta...@dpdk.org > > Signed-off-by: Kevin Traynor > --- Acked-by: Aaron Conole Thanks, Kevin!
[dpdk-dev] [PATCH v2 2/2] Fix variable assignment.
Fix trivial bug. In sh shell, 'foo = 1' is not the same as 'foo=1'. Using 'foo = 1' makes the shell attempt to interpret foo as a command, rather than a simple variable assignment. Signed-off-by: Michael Santana Fixes: dafc04c15174 ("hash: fix out-of-bound write while freeing key slot") --- v2: Nothing changed since v1. devtools/checkpatches.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/devtools/checkpatches.sh b/devtools/checkpatches.sh index 9c2b0a28a..8852b9412 100755 --- a/devtools/checkpatches.sh +++ b/devtools/checkpatches.sh @@ -68,7 +68,7 @@ check_forbidden_additions() { # -v RET_ON_FAIL=1 \ -v MESSAGE='Using explicit .svg extension instead of .*' \ -f $(dirname $(readlink -e $0))/check-forbidden-tokens.awk \ - "$1" || res = 1 + "$1" || res=1 return $res } -- 2.20.1
[dpdk-dev] [PATCH v2 1/2] Enable codespell by default. Can be disabled from config file.
Enable codespell by default. codespell is a feature by checkpatch.pl that checks for common spelling mistakes in patches. This feature is disabled by default. To enable it one must add the '--codespell' flag to the $options variable in checkpatches.sh. With this change codespell is enabled by default. The user can decide to turn off codespell from a one of the config files read by checkpatches.sh. Signed-off-by: Michael Santana Reviewed-by: Rami Rosen --- devtools/checkpatches.sh | 6 ++ 1 file changed, 6 insertions(+) diff --git a/devtools/checkpatches.sh b/devtools/checkpatches.sh index 3b03b7ef2..9c2b0a28a 100755 --- a/devtools/checkpatches.sh +++ b/devtools/checkpatches.sh @@ -2,9 +2,12 @@ # SPDX-License-Identifier: BSD-3-Clause # Copyright 2015 6WIND S.A. +# Enable codespell by default. This can be overwritten from a config file. +DPDK_CHECKPATCH_CODESPELL=enable # Load config options: # - DPDK_CHECKPATCH_PATH # - DPDK_CHECKPATCH_LINE_LENGTH +# - DPDK_CHECKPATCH_CODESPELL . $(dirname $(readlink -e $0))/load-devel-config VALIDATE_NEW_API=$(dirname $(readlink -e $0))/check-symbol-change.sh @@ -13,6 +16,9 @@ length=${DPDK_CHECKPATCH_LINE_LENGTH:-80} # override default Linux options options="--no-tree" +if [ "$DPDK_CHECKPATCH_CODESPELL" == "enable" ]; then +options="$options --codespell" +fi options="$options --max-line-length=$length" options="$options --show-types" options="$options --ignore=LINUX_VERSION_CODE,\ -- 2.20.1
[dpdk-dev] [PATCH v2 0/2] Minor changes to checkpatches
Fixed a minor bug with variable assignment, as well as added an option for checkpatches v1->v2: Enable codespell by default. Disable via config file Michael Santana (2): Enable codespell by default. Can be disabled from config file. Fix variable assignment. devtools/checkpatches.sh | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) -- 2.20.1