Re: [dpdk-dev] [PATCH v2] igb_uio: add config option to control reset
On Nov 4, 2017 01:03, "Ferruh Yigit" wrote: On 11/3/2017 12:42 PM, Roberts, Lee A. wrote: > > >> -Original Message- >> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Tan, Jianfeng >> Sent: Thursday, November 02, 2017 8:57 PM >> To: Ferruh Yigit ; Thomas Monjalon < tho...@monjalon.net> >> Cc: dev@dpdk.org; sta...@dpdk.org; Jingjing Wu ; Shijith Thotton >> ; Gregory Etelson ; Harish Patil >> ; George Prekas ; Sergio Gonzalez Monroy >> ; Rasesh Mody >> Subject: Re: [dpdk-dev] [PATCH v2] igb_uio: add config option to control reset >> >> >> >> On 11/3/2017 8:51 AM, Ferruh Yigit wrote: >>> Adding a compile time configuration option to control device reset done >>> during DPDK application exit. >>> >>> Config option is CONFIG_RTE_EAL_IGB_UIO_RESET and enabled by default, >>> so by default reset will happen. Having this reset is safer to be sure >>> device left in a proper case. >>> >>> But for special cases [1] it is possible to disable the config option >>> to prevent the device reset. >>> >>> [1] >>> http://dpdk.org/ml/archives/dev/2017-November/080927.html >>> >>> Fixes: b58eedfc7dd5 ("igb_uio: issue FLR during open and release of device file") >>> Cc: sta...@dpdk.org >>> >>> Signed-off-by: Ferruh Yigit >> >> Realize that we do have a pci_clear_master() in the release() to disable >> the DMA from device until the next open() will enable the DMA again . >> Here is my: >> >> Reviewed-by: Jianfeng Tan >> >> Thanks, >> Jianfeng >> >>> --- >>> Cc: Jianfeng Tan >>> Cc: Jingjing Wu >>> Cc: Shijith Thotton >>> Cc: Gregory Etelson >>> Cc: Harish Patil >>> Cc: George Prekas >>> Cc: Sergio Gonzalez Monroy >>> Cc: Rasesh Mody >>> >>> v2: >>> * fix typo in commit log >>> --- >>> config/common_base| 1 + >>> config/common_linuxapp| 1 + >>> lib/librte_eal/linuxapp/igb_uio/igb_uio.c | 2 ++ >>> 3 files changed, 4 insertions(+) >>> >>> diff --git a/config/common_base b/config/common_base >>> index 82ee75456..2a9947420 100644 >>> --- a/config/common_base >>> +++ b/config/common_base >>> @@ -102,6 +102,7 @@ CONFIG_RTE_LIBEAL_USE_HPET=n >>> CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n >>> CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n >>> CONFIG_RTE_EAL_IGB_UIO=n >>> +CONFIG_RTE_EAL_IGB_UIO_RESET=n >>> CONFIG_RTE_EAL_VFIO=n >>> CONFIG_RTE_MALLOC_DEBUG=n >>> CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES=n >>> diff --git a/config/common_linuxapp b/config/common_linuxapp >>> index 74c7d64ec..b3a602909 100644 >>> --- a/config/common_linuxapp >>> +++ b/config/common_linuxapp >>> @@ -37,6 +37,7 @@ CONFIG_RTE_EXEC_ENV_LINUXAPP=y >>> >>> CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES=y >>> CONFIG_RTE_EAL_IGB_UIO=y >>> +CONFIG_RTE_EAL_IGB_UIO_RESET=y >>> CONFIG_RTE_EAL_VFIO=y >>> CONFIG_RTE_KNI_KMOD=y >>> CONFIG_RTE_LIBRTE_KNI=y >>> diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c >>> index fd320d87d..0325722c0 100644 >>> --- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c >>> +++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c >>> @@ -360,7 +360,9 @@ igbuio_pci_release(struct uio_info *info, struct inode *inode) >>> /* stop the device from further DMA */ >>> pci_clear_master(dev); >>> >>> +#ifdef RTE_EAL_IGB_UIO_RESET >>> pci_reset_function(dev); >>> +#endif >>> >>> return 0; >>> } > > A compile time configuration option makes life very difficult for application providers. > > Consider the case where an application such as Open vSwitch with DPDK support is being provided > with a Linux distribution. One would want the Open vSwitch binary to support as many vendor NICs > as possible---without the need to recompile. With a change such as this, one would need to have > different versions of the kernel igb_uio module to support different NICs. Agreed, I am against adding more compile time options although I am end up sending a few of them these days. > The Linux kernel is already aware of, and provides work-arounds for, various PCI quirks. > For example, see linux/drivers/pci/quirks.c (http://elixir.free-electrons. com/linux/latest/source/drivers/pci/quirks.c). > > At this point in igb_uio.c, one is aware of the struct pci_dev "dev" for the device in question. > Access to the vendor and device information should be simple: > > struct pci_dev { > struct list_head bus_list;/* node in per-bus list */ > struct pci_bus*bus;/* bus this device is on */ > struct pci_bus*subordinate;/* bus this device bridges to */ > > void*sysdata;/* hook for sys-specific extension */ > struct proc_dir_entry *procent;/* device entry in /proc/bus/pci */ > struct pci_slot*slot;/* Physical slot this device is in */ > > unsigned intdevfn;/* encoded device & function index */ > unsigned shortvendor; > unsigned shortdevice; > unsigned shortsubsystem_vendor; > unsigned shortsubsystem_device; > ... > > One could imagine using logic to implement corresponding PCI quirks that can be evaluated > at runtime. For example (in pseudocode), > > if not (vendor
[dpdk-dev] [PATCH] net/kni: remove eth_kni_drv declaration
This patch removes the forward declaration of eth_kni_drv in rte_eth_kni.c; this forward declaration was made unnecessary by commit 050fe6e9ff970ff92d842912136be8f9f52e171f ("drivers/net: use ethdev allocation helper for vdev"), which removes the usage of eth_kni_drv in the eth_kni_create() method. Signed-off-by: Rami Rosen --- drivers/net/kni/rte_eth_kni.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/net/kni/rte_eth_kni.c b/drivers/net/kni/rte_eth_kni.c index d68ff7a..e31b909 100644 --- a/drivers/net/kni/rte_eth_kni.c +++ b/drivers/net/kni/rte_eth_kni.c @@ -358,8 +358,6 @@ struct pmd_internals { .stats_reset = eth_kni_stats_reset, }; -static struct rte_vdev_driver eth_kni_drv; - static struct rte_eth_dev * eth_kni_create(struct rte_vdev_device *vdev, struct eth_kni_args *args, -- 1.9.1
[dpdk-dev] [PATCH] lib: fix a typo in kni header file (doxygen)
This patch fixes a trivial typo in rte_kni.h header file (librte_kni). Signed-off-by: Rami Rosen --- lib/librte_kni/rte_kni.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/librte_kni/rte_kni.h b/lib/librte_kni/rte_kni.h index 87812cd..d195079 100644 --- a/lib/librte_kni/rte_kni.h +++ b/lib/librte_kni/rte_kni.h @@ -118,7 +118,7 @@ struct rte_kni_conf { * elements for each KNI interface allocated. * * @param pktmbuf_pool - * The mempool for allocting mbufs for packets. + * The mempool for allocating mbufs for packets. * @param conf * The pointer to the configurations of the KNI device. * @param ops -- 1.9.1
Re: [dpdk-dev] [PATCH v1] i40e: highlight change to flexible payload for RSS config
> -Original Message- > From: Mcnamara, John > Sent: Friday, November 3, 2017 3:24 PM > To: dev@dpdk.org > Cc: Rybalchenko, Kirill ; Chilikin, Andrey > ; Mcnamara, John > Subject: [PATCH v1] i40e: highlight change to flexible payload for RSS config > > Add deprecation notice to highlight a future change in the > default configuration behavior for flexible payload for RSS. > > Signed-off-by: John McNamara > --- > doc/guides/rel_notes/deprecation.rst | 6 ++ > 1 file changed, 6 insertions(+) > > diff --git a/doc/guides/rel_notes/deprecation.rst > b/doc/guides/rel_notes/deprecation.rst > index a93c3e1..9e9caa6 100644 > --- a/doc/guides/rel_notes/deprecation.rst > +++ b/doc/guides/rel_notes/deprecation.rst > @@ -67,3 +67,9 @@ Deprecation Notices > > * librte_meter: The API will change to accommodate configuration profiles. >Most of the API functions will have an additional opaque parameter. > + > +* i40e: The default flexible payload configuration which extracts the first > 16 > + bytes of the payload for RSS will be deprecated starting from 18.02. If > + required the previous behavior can be configured using existing flow > + director APIs. There is no ABI/API break. This change will just remove a > + global configuration setting and require explicit configuration. > -- > 2.7.5 Acked-by: Andrey Chilikin
Re: [dpdk-dev] Running DPDK as an unprivileged user
Hi, restarting an old topic, 05/01/2017 16:52, Tan, Jianfeng: > On 1/5/2017 5:34 AM, Walker, Benjamin wrote: > >>> Note that this > >>> probably means that using uio on recent kernels is subtly > >>> broken and cannot be supported going forward because there > >>> is no uio mechanism to pin the memory. > >>> > >>> The first open question I have is whether DPDK should allow > >>> uio at all on recent (4.x) kernels. My current understanding > >>> is that there is no way to pin memory and hugepages can now > >>> be moved around, so uio would be unsafe. What does the > >>> community think here? > > Back to this question, removing uio support in DPDK seems a little > overkill to me. Can we just document it down? Like, firstly warn users > do not invoke migrate_pages() or move_pages() to a DPDK process; as for > the kcompactd daemon and some more cases (like compaction could be > triggered by alloc_pages()), could we just recommend to disable > CONFIG_COMPACTION? We really need to better document the limitations of UIO. May we have some suggestions here? > Another side, how does vfio pin those memory? Through memlock (from code > in vfio_pin_pages())? So why not just mlock those hugepages? Good question. Why not mlock the hugepages?
Re: [dpdk-dev] [PATCH 2/3] net/mlx4: adjust removal error
Hi Adrien, Thanks for the review :) Please see below comments. > -Original Message- > From: Adrien Mazarguil [mailto:adrien.mazarg...@6wind.com] > Sent: Friday, November 3, 2017 3:06 PM > To: Matan Azrad > Cc: Gaetan Rivet ; dev@dpdk.org > Subject: Re: [PATCH 2/3] net/mlx4: adjust removal error > > On Thu, Nov 02, 2017 at 03:42:03PM +, Matan Azrad wrote: > > Fail-safe PMD expects to get -ENODEV error value if sub PMD control > > command fails because of device removal. > > > > Make control callbacks return with -ENODEV when the device has > > disappeared. > > > > Signed-off-by: Matan Azrad > > I think there are a several inconsistencies regarding the places where > mlx4_removed() is used, this could lead to mistakes or redundant calls to this > function later on. > > You have to choose between low-level internal functions (e.g. > mlx4_set_sysfs_ulong()) or user-facing ones from the eth_dev_ops > interface (e.g. mlx4_dev_set_link_up()), but neither intermediate functions > nor a mix of all approaches. You are touching here, exactly in one of my design thoughts: Either using always "low" level error adjustments or using always high level adjustments. The high level approach does less reuse of code but simpler to maintain (as you said). I decided to combine the two approaches while never going to the lowest level code(ibv, pipes). Adding the check in mlx4_dev_set_link() can replace two checks: in mlx4_dev_set_link_up() and mlx4_dev_set_link_down(). Adding the check in mlx4_flow_toggle()can replace many checks: all flows callbacks and also mlx4_mac_addr_add(),mlx4_mac_addr_set(). You right regarding mlx4_set_sysfs_ulong() it can be replaced by check in mlx4_mtu_set() - will fix it in V2. You right regarding mlx4_ifreq(), it can be replaced by check in in mlx4_link_update() - - will fix it in V2. I can understand the consistency approach but I think the above two cases to be in lower level functions are harmless and reuse code. What do you think? > > Standardizing on low-level functions is not practical as it means you'd have > to > check for a device removal after each ibv_*() call. Therefore my suggestion is > to check it at the highest level, in all functions exposed though > mlx4_dev_ops in case of error, even innocuous one like > mlx4_stats_get() and those returning void (rte_errno can still be set), all in > the name of consistency. > If everything OK with the callback (even in a removal case) why to set rte_errno? Specifically in mlx4_stats_get() has no error flow and we don't want error return in case of removal since we can provide stats even after removal (SW counters) and this is a good "feature" for failsafe plug out saving stats process. > The mlx4_removed() documentation should be updated to reflect the places > it's supposed to be called as well. All this means a larger patch is > necessary. > Do you mean documentation in code(comment) or mlx4 docs, maybe both? > See below for coding style issues. > > > --- > > drivers/net/mlx4/mlx4.h| 1 + > > drivers/net/mlx4/mlx4_ethdev.c | 38 > ++ > > drivers/net/mlx4/mlx4_flow.c | 2 ++ > > drivers/net/mlx4/mlx4_intr.c | 5 - > > drivers/net/mlx4/mlx4_rxq.c| 1 + > > drivers/net/mlx4/mlx4_txq.c| 1 + > > 6 files changed, 43 insertions(+), 5 deletions(-) > > > > diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h index > > e0a9853..cac9654 100644 > > --- a/drivers/net/mlx4/mlx4.h > > +++ b/drivers/net/mlx4/mlx4.h > > @@ -149,6 +149,7 @@ int mlx4_flow_ctrl_get(struct rte_eth_dev *dev, > >struct rte_eth_fc_conf *fc_conf); int > > mlx4_flow_ctrl_set(struct rte_eth_dev *dev, > >struct rte_eth_fc_conf *fc_conf); > > +int mlx4_removed(const struct priv *priv); > > > > /* mlx4_intr.c */ > > > > diff --git a/drivers/net/mlx4/mlx4_ethdev.c > > b/drivers/net/mlx4/mlx4_ethdev.c index b0acd12..76914b0 100644 > > --- a/drivers/net/mlx4/mlx4_ethdev.c > > +++ b/drivers/net/mlx4/mlx4_ethdev.c > > @@ -312,6 +312,8 @@ > > > > ret = mlx4_sysfs_write(priv, name, value_str, (sizeof(value_str) - 1)); > > if (ret < 0) { > > + if (mlx4_removed(priv)) > > + ret = -ENODEV; > > DEBUG("cannot write %s `%s' (%lu) to sysfs: %s", > > name, value_str, value, strerror(rte_errno)); > > return ret; > > @@ -340,15 +342,19 @@ > > > > if (sock == -1) { > > rte_errno = errno; > > - return -rte_errno; > > + goto error; > > } > > ret = mlx4_get_ifname(priv, &ifr->ifr_name); > > if (!ret && ioctl(sock, req, ifr) == -1) { > > rte_errno = errno; > > - ret = -rte_errno; > > + close(sock); > > + goto error; > > } > > close(sock); > > return ret; > > +error: > > + mlx4_removed(priv); > > + return -rte_errno; > > } > > > > /** > > @@ -473,13 +479,17 @@ > >
Re: [dpdk-dev] [PATCH 3/3] net/mlx5: adjust removal error
Hi Adrien, Thanks for this too. > -Original Message- > From: Adrien Mazarguil [mailto:adrien.mazarg...@6wind.com] > Sent: Friday, November 3, 2017 3:06 PM > To: Matan Azrad > Cc: Gaetan Rivet ; dev@dpdk.org > Subject: Re: [PATCH 3/3] net/mlx5: adjust removal error > > On Thu, Nov 02, 2017 at 03:42:04PM +, Matan Azrad wrote: > > Fail-safe PMD expects to get -ENODEV error value if sub PMD control > > command fails because of device removal. > > > > Make control callbacks return with -ENODEV when the device has > > disappeared. > > > > Signed-off-by: Matan Azrad > > In short I have the same comments as on the mlx4 patch about usage > consistency, this also applies to mlx5; mlx5_removed() should be only used > by the public callbacks from struct eth_dev_ops. > > There's an additional difficulty with this PMD, you need to take into account > the fact it provides secondary process support (mlx5_dev_sec_ops). > I think secondary processes do not have any IBV context available for > mlx5_removed() to query, which should resolve to a no-op in this case. > Make sure secondary processes do not crash whatever happens. > Will check it, thanks! > See below for coding style and other issues. > > > --- > > drivers/net/mlx5/mlx5.h| 1 + > > drivers/net/mlx5/mlx5_ethdev.c | 39 > +++ > > drivers/net/mlx5/mlx5_flow.c | 2 ++ > > drivers/net/mlx5/mlx5_rss.c| 4 > > drivers/net/mlx5/mlx5_rxq.c| 12 ++-- > > drivers/net/mlx5/mlx5_stats.c | 6 +- > > drivers/net/mlx5/mlx5_txq.c| 2 ++ > > 7 files changed, 59 insertions(+), 7 deletions(-) > > > > diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index > > e6a69b8..0dd104a 100644 > > --- a/drivers/net/mlx5/mlx5.h > > +++ b/drivers/net/mlx5/mlx5.h > > @@ -208,6 +208,7 @@ int mlx5_ibv_device_to_pci_addr(const struct > > ibv_device *, int mlx5_set_link_up(struct rte_eth_dev *dev); void > > priv_dev_select_tx_function(struct priv *priv, struct rte_eth_dev > > *dev); void priv_dev_select_rx_function(struct priv *priv, struct > > rte_eth_dev *dev); > > +int mlx5_removed(const struct priv *priv); > > > > /* mlx5_mac.c */ > > > > diff --git a/drivers/net/mlx5/mlx5_ethdev.c > > b/drivers/net/mlx5/mlx5_ethdev.c index c31ea4b..bf61cd6 100644 > > --- a/drivers/net/mlx5/mlx5_ethdev.c > > +++ b/drivers/net/mlx5/mlx5_ethdev.c > > @@ -394,6 +394,8 @@ struct priv * > > > > ret = priv_sysfs_write(priv, name, value_str, (sizeof(value_str) - 1)); > > if (ret == -1) { > > + if (mlx5_removed(priv)) > > + errno = ENODEV; > > DEBUG("cannot write %s `%s' (%lu) to sysfs: %s", > > name, value_str, value, strerror(errno)); > > return -1; > > @@ -925,13 +927,17 @@ struct priv * > > { > > struct utsname utsname; > > int ver[3]; > > + int ret; > > > > if (uname(&utsname) == -1 || > > sscanf(utsname.release, "%d.%d.%d", > >&ver[0], &ver[1], &ver[2]) != 3 || > > KERNEL_VERSION(ver[0], ver[1], ver[2]) < KERNEL_VERSION(4, 9, > 0)) > > - return mlx5_link_update_unlocked_gset(dev, > wait_to_complete); > > - return mlx5_link_update_unlocked_gs(dev, wait_to_complete); > > + ret = mlx5_link_update_unlocked_gset(dev, > wait_to_complete); > > + ret = mlx5_link_update_unlocked_gs(dev, wait_to_complete); > > Besides the extra space after "ret =", I think this doesn't work as intended. > A > "else" statement is necessary. > Will fix it, thanks! > > + if (ret && mlx5_removed(mlx5_get_priv(dev))) > > + return -ENODEV; > > + return ret; > > } > > > > /** > > @@ -978,6 +984,8 @@ struct priv * > > strerror(ret)); > > priv_unlock(priv); > > assert(ret >= 0); > > + if (mlx5_removed(priv)) > > + return -ENODEV; > > return -ret; > > } > > > > @@ -1029,6 +1037,8 @@ struct priv * > > out: > > priv_unlock(priv); > > assert(ret >= 0); > > + if (mlx5_removed(priv)) > > + return -ENODEV; > > return -ret; > > } > > > > @@ -1083,6 +1093,8 @@ struct priv * > > out: > > priv_unlock(priv); > > assert(ret >= 0); > > + if (mlx5_removed(priv)) > > + return -ENODEV; > > return -ret; > > } > > > > @@ -1364,13 +1376,13 @@ struct priv * > > if (up) { > > err = priv_set_flags(priv, ~IFF_UP, IFF_UP); > > if (err) > > - return err; > > + return errno == ENODEV ? -ENODEV : err; > > There is a documentation issue here since the mlx5 PMD didn't get all the > errno consistency fixes that mlx4 got, however err is documented as being -1 > in case of error, whereas priv_dev_set_link() returns a positive errno value > instead and mlx5_set_link_down/up() should return only negative errno > values but are documented as returning positive ones. > > Anyway to keep it short: currently in mlx5, priv_*() => positive errno and the > pub