Re: [dpdk-dev] [PATCH v2] igb_uio: add config option to control reset

2017-11-04 Thread Stephen Hemminger
On Nov 4, 2017 01:03, "Ferruh Yigit"  wrote:

On 11/3/2017 12:42 PM, Roberts, Lee A. wrote:
>
>
>> -Original Message-
>> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Tan, Jianfeng
>> Sent: Thursday, November 02, 2017 8:57 PM
>> To: Ferruh Yigit ; Thomas Monjalon <
tho...@monjalon.net>
>> Cc: dev@dpdk.org; sta...@dpdk.org; Jingjing Wu ;
Shijith Thotton
>> ; Gregory Etelson ;
Harish Patil
>> ; George Prekas ; Sergio
Gonzalez Monroy
>> ; Rasesh Mody 
>> Subject: Re: [dpdk-dev] [PATCH v2] igb_uio: add config option to control
reset
>>
>>
>>
>> On 11/3/2017 8:51 AM, Ferruh Yigit wrote:
>>> Adding a compile time configuration option to control device reset done
>>> during DPDK application exit.
>>>
>>> Config option is CONFIG_RTE_EAL_IGB_UIO_RESET and enabled by default,
>>> so by default reset will happen. Having this reset is safer to be sure
>>> device left in a proper case.
>>>
>>> But for special cases [1] it is possible to disable the config option
>>> to prevent the device reset.
>>>
>>> [1]
>>> http://dpdk.org/ml/archives/dev/2017-November/080927.html
>>>
>>> Fixes: b58eedfc7dd5 ("igb_uio: issue FLR during open and release of
device file")
>>> Cc: sta...@dpdk.org
>>>
>>> Signed-off-by: Ferruh Yigit 
>>
>> Realize that we do have a pci_clear_master() in the release() to disable
>> the DMA from device until the next open() will enable the DMA again .
>> Here is my:
>>
>> Reviewed-by: Jianfeng Tan 
>>
>> Thanks,
>> Jianfeng
>>
>>> ---
>>> Cc: Jianfeng Tan 
>>> Cc: Jingjing Wu 
>>> Cc: Shijith Thotton 
>>> Cc: Gregory Etelson 
>>> Cc: Harish Patil 
>>> Cc: George Prekas 
>>> Cc: Sergio Gonzalez Monroy 
>>> Cc: Rasesh Mody 
>>>
>>> v2:
>>> * fix typo in commit log
>>> ---
>>>   config/common_base| 1 +
>>>   config/common_linuxapp| 1 +
>>>   lib/librte_eal/linuxapp/igb_uio/igb_uio.c | 2 ++
>>>   3 files changed, 4 insertions(+)
>>>
>>> diff --git a/config/common_base b/config/common_base
>>> index 82ee75456..2a9947420 100644
>>> --- a/config/common_base
>>> +++ b/config/common_base
>>> @@ -102,6 +102,7 @@ CONFIG_RTE_LIBEAL_USE_HPET=n
>>>   CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n
>>>   CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
>>>   CONFIG_RTE_EAL_IGB_UIO=n
>>> +CONFIG_RTE_EAL_IGB_UIO_RESET=n
>>>   CONFIG_RTE_EAL_VFIO=n
>>>   CONFIG_RTE_MALLOC_DEBUG=n
>>>   CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES=n
>>> diff --git a/config/common_linuxapp b/config/common_linuxapp
>>> index 74c7d64ec..b3a602909 100644
>>> --- a/config/common_linuxapp
>>> +++ b/config/common_linuxapp
>>> @@ -37,6 +37,7 @@ CONFIG_RTE_EXEC_ENV_LINUXAPP=y
>>>
>>>   CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES=y
>>>   CONFIG_RTE_EAL_IGB_UIO=y
>>> +CONFIG_RTE_EAL_IGB_UIO_RESET=y
>>>   CONFIG_RTE_EAL_VFIO=y
>>>   CONFIG_RTE_KNI_KMOD=y
>>>   CONFIG_RTE_LIBRTE_KNI=y
>>> diff --git a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
>>> index fd320d87d..0325722c0 100644
>>> --- a/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
>>> +++ b/lib/librte_eal/linuxapp/igb_uio/igb_uio.c
>>> @@ -360,7 +360,9 @@ igbuio_pci_release(struct uio_info *info, struct
inode *inode)
>>>   /* stop the device from further DMA */
>>>   pci_clear_master(dev);
>>>
>>> +#ifdef RTE_EAL_IGB_UIO_RESET
>>>   pci_reset_function(dev);
>>> +#endif
>>>
>>>   return 0;
>>>   }
>
> A compile time configuration option makes life very difficult for
application providers.
>
> Consider the case where an application such as Open vSwitch with DPDK
support is being provided
> with a Linux distribution.  One would want the Open vSwitch binary to
support as many vendor NICs
> as possible---without the need to recompile.  With a change such as this,
one would need to have
> different versions of the kernel igb_uio module to support different NICs.

Agreed, I am against adding more compile time options although I am end up
sending a few of them these days.

> The Linux kernel is already aware of, and provides work-arounds for,
various PCI quirks.
> For example, see linux/drivers/pci/quirks.c (http://elixir.free-electrons.
com/linux/latest/source/drivers/pci/quirks.c).
>
> At this point in igb_uio.c, one is aware of the struct pci_dev "dev" for
the device in question.
> Access to the vendor and device information should be simple:
>
> struct pci_dev {
> struct list_head bus_list;/* node in per-bus list */
> struct pci_bus*bus;/* bus this device is on */
> struct pci_bus*subordinate;/* bus this device bridges to */
>
> void*sysdata;/* hook for sys-specific extension */
> struct proc_dir_entry *procent;/* device entry in /proc/bus/pci */
> struct pci_slot*slot;/* Physical slot this device is in */
>
> unsigned intdevfn;/* encoded device & function index */
> unsigned shortvendor;
> unsigned shortdevice;
> unsigned shortsubsystem_vendor;
> unsigned shortsubsystem_device;
> ...
>
> One could imagine using logic to implement corresponding PCI quirks that
can be evaluated
> at runtime.  For example (in pseudocode),
>
>  if not (vendor 

[dpdk-dev] [PATCH] net/kni: remove eth_kni_drv declaration

2017-11-04 Thread Rami Rosen
This patch removes the forward declaration of eth_kni_drv
in rte_eth_kni.c; this forward declaration was made unnecessary
by commit 050fe6e9ff970ff92d842912136be8f9f52e171f 
("drivers/net: use ethdev allocation helper for vdev"), which 
removes the usage of eth_kni_drv in the eth_kni_create() method.

Signed-off-by: Rami Rosen 
---
 drivers/net/kni/rte_eth_kni.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/kni/rte_eth_kni.c b/drivers/net/kni/rte_eth_kni.c
index d68ff7a..e31b909 100644
--- a/drivers/net/kni/rte_eth_kni.c
+++ b/drivers/net/kni/rte_eth_kni.c
@@ -358,8 +358,6 @@ struct pmd_internals {
.stats_reset = eth_kni_stats_reset,
 };
 
-static struct rte_vdev_driver eth_kni_drv;
-
 static struct rte_eth_dev *
 eth_kni_create(struct rte_vdev_device *vdev,
struct eth_kni_args *args,
-- 
1.9.1



[dpdk-dev] [PATCH] lib: fix a typo in kni header file (doxygen)

2017-11-04 Thread Rami Rosen
This patch fixes a trivial typo in rte_kni.h header 
file (librte_kni).

Signed-off-by: Rami Rosen 
---
 lib/librte_kni/rte_kni.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_kni/rte_kni.h b/lib/librte_kni/rte_kni.h
index 87812cd..d195079 100644
--- a/lib/librte_kni/rte_kni.h
+++ b/lib/librte_kni/rte_kni.h
@@ -118,7 +118,7 @@ struct rte_kni_conf {
  * elements for each KNI interface allocated.
  *
  * @param pktmbuf_pool
- *  The mempool for allocting mbufs for packets.
+ *  The mempool for allocating mbufs for packets.
  * @param conf
  *  The pointer to the configurations of the KNI device.
  * @param ops
-- 
1.9.1



Re: [dpdk-dev] [PATCH v1] i40e: highlight change to flexible payload for RSS config

2017-11-04 Thread Chilikin, Andrey
> -Original Message-
> From: Mcnamara, John
> Sent: Friday, November 3, 2017 3:24 PM
> To: dev@dpdk.org
> Cc: Rybalchenko, Kirill ; Chilikin, Andrey
> ; Mcnamara, John 
> Subject: [PATCH v1] i40e: highlight change to flexible payload for RSS config
> 
> Add deprecation notice to highlight a future change in the
> default configuration behavior for flexible payload for RSS.
> 
> Signed-off-by: John McNamara 
> ---
>  doc/guides/rel_notes/deprecation.rst | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst
> b/doc/guides/rel_notes/deprecation.rst
> index a93c3e1..9e9caa6 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -67,3 +67,9 @@ Deprecation Notices
> 
>  * librte_meter: The API will change to accommodate configuration profiles.
>Most of the API functions will have an additional opaque parameter.
> +
> +* i40e: The default flexible payload configuration which extracts the first 
> 16
> +  bytes of the payload for RSS will be deprecated starting from 18.02. If
> +  required the previous behavior can be configured using existing flow
> +  director APIs. There is no ABI/API break. This change will just remove a
> +  global configuration setting and require explicit configuration.
> --
> 2.7.5

Acked-by: Andrey Chilikin 


Re: [dpdk-dev] Running DPDK as an unprivileged user

2017-11-04 Thread Thomas Monjalon
Hi, restarting an old topic,

05/01/2017 16:52, Tan, Jianfeng:
> On 1/5/2017 5:34 AM, Walker, Benjamin wrote:
> >>> Note that this
> >>> probably means that using uio on recent kernels is subtly
> >>> broken and cannot be supported going forward because there
> >>> is no uio mechanism to pin the memory.
> >>>
> >>> The first open question I have is whether DPDK should allow
> >>> uio at all on recent (4.x) kernels. My current understanding
> >>> is that there is no way to pin memory and hugepages can now
> >>> be moved around, so uio would be unsafe. What does the
> >>> community think here?
> 
> Back to this question, removing uio support in DPDK seems a little 
> overkill to me. Can we just document it down? Like, firstly warn users 
> do not invoke migrate_pages() or move_pages() to a DPDK process; as for 
> the kcompactd daemon and some more cases (like compaction could be 
> triggered by alloc_pages()), could we just recommend to disable 
> CONFIG_COMPACTION?

We really need to better document the limitations of UIO.
May we have some suggestions here?

> Another side, how does vfio pin those memory? Through memlock (from code 
> in vfio_pin_pages())? So why not just mlock those hugepages?

Good question. Why not mlock the hugepages?


Re: [dpdk-dev] [PATCH 2/3] net/mlx4: adjust removal error

2017-11-04 Thread Matan Azrad
Hi Adrien,
Thanks for the review :)

Please see below comments.

> -Original Message-
> From: Adrien Mazarguil [mailto:adrien.mazarg...@6wind.com]
> Sent: Friday, November 3, 2017 3:06 PM
> To: Matan Azrad 
> Cc: Gaetan Rivet ; dev@dpdk.org
> Subject: Re: [PATCH 2/3] net/mlx4: adjust removal error
> 
> On Thu, Nov 02, 2017 at 03:42:03PM +, Matan Azrad wrote:
> > Fail-safe PMD expects to get -ENODEV error value if sub PMD control
> > command fails because of device removal.
> >
> > Make control callbacks return with -ENODEV when the device has
> > disappeared.
> >
> > Signed-off-by: Matan Azrad 
> 
> I think there are a several inconsistencies regarding the places where
> mlx4_removed() is used, this could lead to mistakes or redundant calls to this
> function later on.
> 
> You have to choose between low-level internal functions (e.g.
> mlx4_set_sysfs_ulong()) or user-facing ones from the eth_dev_ops
> interface (e.g. mlx4_dev_set_link_up()), but neither intermediate functions
> nor a mix of all approaches.

You are touching here, exactly in one of my design thoughts:
Either using always "low" level error adjustments or using always high level  
adjustments.
The high level approach does less reuse of code but simpler to maintain (as you 
said).
I decided to combine the two approaches while never going to the lowest level 
code(ibv, pipes).
Adding the check in mlx4_dev_set_link() can replace two checks: in 
mlx4_dev_set_link_up() and mlx4_dev_set_link_down(). 
Adding the check in mlx4_flow_toggle()can replace many checks: all flows 
callbacks and also mlx4_mac_addr_add(),mlx4_mac_addr_set().
You right regarding  mlx4_set_sysfs_ulong() it can be replaced by check in 
mlx4_mtu_set() - will fix it in V2.
You right regarding  mlx4_ifreq(), it can be replaced by check in in 
mlx4_link_update() - - will fix it in V2.

I can understand the consistency approach but I think the above two cases to be 
in lower level functions are harmless and reuse code.
What do you think?

> 
> Standardizing on low-level functions is not practical as it means you'd have 
> to
> check for a device removal after each ibv_*() call. Therefore my suggestion is
> to check it at the highest level, in all functions exposed though
> mlx4_dev_ops in case of error, even innocuous one like
> mlx4_stats_get() and those returning void (rte_errno can still be set), all in
> the name of consistency.
> 

If everything OK with the callback (even in a removal case) why to set 
rte_errno?
Specifically in mlx4_stats_get() has no error flow and we don't want error 
return in case of removal since we can provide stats even after removal (SW 
counters) and this is a good "feature" for failsafe plug out saving stats 
process.  

> The mlx4_removed() documentation should be updated to reflect the places
> it's supposed to be called as well. All this means a larger patch is 
> necessary.
> 

Do you mean documentation in code(comment) or mlx4 docs, maybe both?

> See below for coding style issues.
> 
> > ---
> >  drivers/net/mlx4/mlx4.h|  1 +
> >  drivers/net/mlx4/mlx4_ethdev.c | 38
> ++
> >  drivers/net/mlx4/mlx4_flow.c   |  2 ++
> >  drivers/net/mlx4/mlx4_intr.c   |  5 -
> >  drivers/net/mlx4/mlx4_rxq.c|  1 +
> >  drivers/net/mlx4/mlx4_txq.c|  1 +
> >  6 files changed, 43 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h index
> > e0a9853..cac9654 100644
> > --- a/drivers/net/mlx4/mlx4.h
> > +++ b/drivers/net/mlx4/mlx4.h
> > @@ -149,6 +149,7 @@ int mlx4_flow_ctrl_get(struct rte_eth_dev *dev,
> >struct rte_eth_fc_conf *fc_conf);  int
> > mlx4_flow_ctrl_set(struct rte_eth_dev *dev,
> >struct rte_eth_fc_conf *fc_conf);
> > +int mlx4_removed(const struct priv *priv);
> >
> >  /* mlx4_intr.c */
> >
> > diff --git a/drivers/net/mlx4/mlx4_ethdev.c
> > b/drivers/net/mlx4/mlx4_ethdev.c index b0acd12..76914b0 100644
> > --- a/drivers/net/mlx4/mlx4_ethdev.c
> > +++ b/drivers/net/mlx4/mlx4_ethdev.c
> > @@ -312,6 +312,8 @@
> >
> > ret = mlx4_sysfs_write(priv, name, value_str, (sizeof(value_str) - 1));
> > if (ret < 0) {
> > +   if (mlx4_removed(priv))
> > +   ret = -ENODEV;
> > DEBUG("cannot write %s `%s' (%lu) to sysfs: %s",
> >   name, value_str, value, strerror(rte_errno));
> > return ret;
> > @@ -340,15 +342,19 @@
> >
> > if (sock == -1) {
> > rte_errno = errno;
> > -   return -rte_errno;
> > +   goto error;
> > }
> > ret = mlx4_get_ifname(priv, &ifr->ifr_name);
> > if (!ret && ioctl(sock, req, ifr) == -1) {
> > rte_errno = errno;
> > -   ret = -rte_errno;
> > +   close(sock);
> > +   goto error;
> > }
> > close(sock);
> > return ret;
> > +error:
> > +   mlx4_removed(priv);
> > +   return -rte_errno;
> >  }
> >
> >  /**
> > @@ -473,13 +479,17 @@
> > 

Re: [dpdk-dev] [PATCH 3/3] net/mlx5: adjust removal error

2017-11-04 Thread Matan Azrad
Hi Adrien,
Thanks for this too.

> -Original Message-
> From: Adrien Mazarguil [mailto:adrien.mazarg...@6wind.com]
> Sent: Friday, November 3, 2017 3:06 PM
> To: Matan Azrad 
> Cc: Gaetan Rivet ; dev@dpdk.org
> Subject: Re: [PATCH 3/3] net/mlx5: adjust removal error
> 
> On Thu, Nov 02, 2017 at 03:42:04PM +, Matan Azrad wrote:
> > Fail-safe PMD expects to get -ENODEV error value if sub PMD control
> > command fails because of device removal.
> >
> > Make control callbacks return with -ENODEV when the device has
> > disappeared.
> >
> > Signed-off-by: Matan Azrad 
> 
> In short I have the same comments as on the mlx4 patch about usage
> consistency, this also applies to mlx5; mlx5_removed() should be only used
> by the public callbacks from struct eth_dev_ops.
> 
> There's an additional difficulty with this PMD, you need to take into account
> the fact it provides secondary process support (mlx5_dev_sec_ops).
> I think secondary processes do not have any IBV context available for
> mlx5_removed() to query, which should resolve to a no-op in this case.
> Make sure secondary processes do not crash whatever happens.
> 
Will check it, thanks!

> See below for coding style and other issues.
> 
> > ---
> >  drivers/net/mlx5/mlx5.h|  1 +
> >  drivers/net/mlx5/mlx5_ethdev.c | 39
> +++
> >  drivers/net/mlx5/mlx5_flow.c   |  2 ++
> >  drivers/net/mlx5/mlx5_rss.c|  4 
> >  drivers/net/mlx5/mlx5_rxq.c| 12 ++--
> >  drivers/net/mlx5/mlx5_stats.c  |  6 +-
> >  drivers/net/mlx5/mlx5_txq.c|  2 ++
> >  7 files changed, 59 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index
> > e6a69b8..0dd104a 100644
> > --- a/drivers/net/mlx5/mlx5.h
> > +++ b/drivers/net/mlx5/mlx5.h
> > @@ -208,6 +208,7 @@ int mlx5_ibv_device_to_pci_addr(const struct
> > ibv_device *,  int mlx5_set_link_up(struct rte_eth_dev *dev);  void
> > priv_dev_select_tx_function(struct priv *priv, struct rte_eth_dev
> > *dev);  void priv_dev_select_rx_function(struct priv *priv, struct
> > rte_eth_dev *dev);
> > +int mlx5_removed(const struct priv *priv);
> >
> >  /* mlx5_mac.c */
> >
> > diff --git a/drivers/net/mlx5/mlx5_ethdev.c
> > b/drivers/net/mlx5/mlx5_ethdev.c index c31ea4b..bf61cd6 100644
> > --- a/drivers/net/mlx5/mlx5_ethdev.c
> > +++ b/drivers/net/mlx5/mlx5_ethdev.c
> > @@ -394,6 +394,8 @@ struct priv *
> >
> > ret = priv_sysfs_write(priv, name, value_str, (sizeof(value_str) - 1));
> > if (ret == -1) {
> > +   if (mlx5_removed(priv))
> > +   errno = ENODEV;
> > DEBUG("cannot write %s `%s' (%lu) to sysfs: %s",
> >   name, value_str, value, strerror(errno));
> > return -1;
> > @@ -925,13 +927,17 @@ struct priv *
> >  {
> > struct utsname utsname;
> > int ver[3];
> > +   int ret;
> >
> > if (uname(&utsname) == -1 ||
> > sscanf(utsname.release, "%d.%d.%d",
> >&ver[0], &ver[1], &ver[2]) != 3 ||
> > KERNEL_VERSION(ver[0], ver[1], ver[2]) < KERNEL_VERSION(4, 9,
> 0))
> > -   return mlx5_link_update_unlocked_gset(dev,
> wait_to_complete);
> > -   return mlx5_link_update_unlocked_gs(dev, wait_to_complete);
> > +   ret = mlx5_link_update_unlocked_gset(dev,
> wait_to_complete);
> > +   ret =  mlx5_link_update_unlocked_gs(dev, wait_to_complete);
> 
> Besides the extra space after "ret =", I think this doesn't work as intended. 
> A
> "else" statement is necessary.
> 
Will fix it, thanks!

> > +   if (ret && mlx5_removed(mlx5_get_priv(dev)))
> > +   return -ENODEV;
> > +   return ret;
> >  }
> >
> >  /**
> > @@ -978,6 +984,8 @@ struct priv *
> >  strerror(ret));
> > priv_unlock(priv);
> > assert(ret >= 0);
> > +   if (mlx5_removed(priv))
> > +   return -ENODEV;
> > return -ret;
> >  }
> >
> > @@ -1029,6 +1037,8 @@ struct priv *
> >  out:
> > priv_unlock(priv);
> > assert(ret >= 0);
> > +   if (mlx5_removed(priv))
> > +   return -ENODEV;
> > return -ret;
> >  }
> >
> > @@ -1083,6 +1093,8 @@ struct priv *
> >  out:
> > priv_unlock(priv);
> > assert(ret >= 0);
> > +   if (mlx5_removed(priv))
> > +   return -ENODEV;
> > return -ret;
> >  }
> >
> > @@ -1364,13 +1376,13 @@ struct priv *
> > if (up) {
> > err = priv_set_flags(priv, ~IFF_UP, IFF_UP);
> > if (err)
> > -   return err;
> > +   return errno == ENODEV ? -ENODEV : err;
> 
> There is a documentation issue here since the mlx5 PMD didn't get all the
> errno consistency fixes that mlx4 got, however err is documented as being -1
> in case of error, whereas priv_dev_set_link() returns a positive errno value
> instead and mlx5_set_link_down/up() should return only negative errno
> values but are documented as returning positive ones.
> 
> Anyway to keep it short: currently in mlx5, priv_*() => positive errno and the
> pub