Re: [dpdk-dev] [PATCH v4 0/5] vhost: generalize buffer vectors

2018-07-09 Thread Tiwei Bie
On Fri, Jul 06, 2018 at 09:04:44AM +0200, Maxime Coquelin wrote:
> This series is again preliminary work to ease packed ring
> layout integration.
> 
> Main changes are using vector buffers also in the dequeue
> path, and perform IOVA to HVA translation at vectors fill
> time.
> 
> I still have to run more benchmarks, but PVP benchmarks do
> not show performance changes.
> 
> Good thing is that it saves ~140 further lines.
> 
> Changes since v3:
> =================
> - Fix dequeue_zero_copy last_used_idx update (Tiwei)
> - Remove "vhost: make gpa to hpa failure an error" patch (Tiwei)
> 
> Changes since v2:
> =================
>  - check vec_id doesn't overflow (Tiwei)
>  - Fix perm parameters passed to fill_vec_buf (Tiwei)
>  - Remove extra space in variable assignation (Tiwei)
> 
> 
> Maxime Coquelin (5):
>   vhost: use shadow used ring in dequeue path
>   vhost: use buffer vectors in dequeue path
>   vhost: improve prefetching in dequeue path
>   vhost: prefetch first descriptor in dequeue path
>   vhost: improve prefetching in enqueue path
> 
>  lib/librte_vhost/vhost.h  |   1 +
>  lib/librte_vhost/virtio_net.c | 517 
> --
>  2 files changed, 193 insertions(+), 325 deletions(-)
> 
> -- 
> 2.14.4
> 

Applied to dpdk-next-virtio/master, thanks.


Re: [dpdk-dev] [PATCH v9 00/15] Vhost: add support to packed ring layout

2018-07-09 Thread Tiwei Bie
On Fri, Jul 06, 2018 at 09:07:07AM +0200, Maxime Coquelin wrote:
> This series is a handover from Jens' "[PATCH v4 00/20]
> implement packed virtqueues", which only implements the
> vhost side. The Virtio PMD implementation will follow in a
> later series.
> 
> The series applies on top of previous reworks I posted
> during this cycle that merge the mergeable and non-mergeable
> receive paths, and refactor the transmit path to reuse
> vector buffers.
> 
> I haven't run performance tests for now as the Virtio PMD
> side isn't ready.
> 
> The series has been tested with Tiwei's series implementing
> packed ring support to Kernel's virtio-net driver, and
> with Wei's series implementing the Qemu side.
> 
> To test it, I have used testpmd on host side with a vhost
> vdev and a tap vdev forwarding to each other. Transfers
> of big random files have been done in both directions with
> integrity verified.
> 
> Tests have been run with Rx mrg ON/OFF and events suppression
> ON/OFF.
> 
> Tests have also been run with legacy split ring layout to
> ensure no regression has been introduced.
> 
> Changes since v8:
> =================
> - Fix indents (Tiwei)
> - Rename struct vring_desc_packed to vring_packed_desc (Tiwei)
> 
> Changes since v7:
> =================
> - Align structs and defines naming with Kernel header (Tiwei)
> - Fix event based notifications (Tiwei)
> - Fix Clang build issues caused by unused symbols (Tiwei)
> 
> Changes since v6:
> =================
> - Various style cleanups (Tiwei, Jason)
> - Simplify event based notification (Jason)
> - Build support with future kernels (Tiwei)
> - Prevent buffer vectors overflow in map_one_desc (Tiwei)
> 
> Changes since v5:
> =================
> - Remove duplicated VHOST_USER_F_PROTOCOL_FEATURES definition (Tiwei)
> - Fix vq_is_ready (Maxime)
> - Fix local used index overflow in flush_shadow_used_ring_packed (Tiwei)
> - Make desc_is_avail() a bool (Tiwei)
> - Improve desc_is_avail() logic (Tiwei)
> - Remove split rings addr NULL assignment in the right patch (Tiwei)
> - Make buffer id a uint16_t (Tiwei)
> 
> Jens Freimann (2):
>   vhost: add virtio packed virtqueue defines
>   vhost: add helpers for packed virtqueues
> 
> Maxime Coquelin (12):
>   vhost: clear shadow used table index at flush time
>   vhost: make indirect desc table copy desc type agnostic
>   vhost: clear batch copy index at copy time
>   vhost: extract split ring handling from Rx and Tx functions
>   vhost: append shadow used ring function names with split
>   vhost: add shadow used ring support for packed rings
>   vhost: create descriptor mapping function
>   vhost: add vector filling support for packed ring
>   vhost: add Rx support for packed ring
>   vhost: add Tx support for packed ring
>   vhost: add notification for packed ring
>   vhost: advertize packed ring layout support
> 
> Yuanhan Liu (1):
>   vhost: vring address setup for packed queues
> 
>  lib/librte_vhost/vhost.c  | 115 ++-
>  lib/librte_vhost/vhost.h  | 130 ++-
>  lib/librte_vhost/vhost_user.c | 127 +--
>  lib/librte_vhost/virtio_net.c | 776 
> +-
>  4 files changed, 939 insertions(+), 209 deletions(-)
> 
> -- 
> 2.14.4
> 

Applied to dpdk-next-virtio/master, thanks.


Re: [dpdk-dev] [PATCH v9 15/15] vhost: advertize packed ring layout support

2018-07-09 Thread Tiwei Bie
On Fri, Jul 06, 2018 at 09:07:22AM +0200, Maxime Coquelin wrote:
> Signed-off-by: Maxime Coquelin 
> ---
>  lib/librte_vhost/vhost.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
> index 760a09c0d..9b0ebb754 100644
> --- a/lib/librte_vhost/vhost.h
> +++ b/lib/librte_vhost/vhost.h
> @@ -275,7 +275,8 @@ struct vring_packed_desc_event {
>   (1ULL << VIRTIO_RING_F_EVENT_IDX) | \
>   (1ULL << VIRTIO_NET_F_MTU)  | \
>   (1ULL << VIRTIO_F_IN_ORDER) | \
> - (1ULL << VIRTIO_F_IOMMU_PLATFORM))
> + (1ULL << VIRTIO_F_IOMMU_PLATFORM) | \
> + (1ULL << VIRTIO_F_RING_PACKED))
>  
>  
>  struct guest_page {
> -- 
> 2.14.4
> 

I didn't apply this patch, because the packed ring
support isn't complete, e.g. when doing live migration,
the wrap counter isn't synced. We can advertise this
feature bit when it's fully supported.
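
As a rough illustration of the point above, here is a sketch of how a 64-bit
feature mask gates negotiation: a bit that is left out of the advertised set
can never be negotiated, whatever the driver offers. The bit numbers 33 and 34
follow the virtio specification; has_feature() and negotiate() are
hypothetical helper names, not vhost library functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Feature bit numbers as assigned by the virtio specification. */
#define VIRTIO_F_IOMMU_PLATFORM 33
#define VIRTIO_F_RING_PACKED    34

/* Hypothetical helper: test one feature bit in a 64-bit mask. */
static bool
has_feature(uint64_t features, unsigned int bit)
{
	assert(bit < 64);
	return (features & (1ULL << bit)) != 0;
}

/* Hypothetical helper: only bits offered by both sides survive. */
static uint64_t
negotiate(uint64_t backend_features, uint64_t driver_features)
{
	return backend_features & driver_features;
}
```

With VIRTIO_F_RING_PACKED absent from the backend mask, a driver that offers
the bit still ends up on the split ring layout, which is the behaviour wanted
until live migration syncs the wrap counter.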



Re: [dpdk-dev] [PATCH v2] net/mlx5: add support for 32bit systems

2018-07-09 Thread Shahaf Shuler
Thursday, July 5, 2018 8:50 PM, Ferruh Yigit:
> Subject: Re: [dpdk-dev] [PATCH v2] net/mlx5: add support for 32bit systems
> 
> On 7/5/2018 6:07 PM, Mordechay Haimovsky wrote:
> > Hello Ferruh,
> >   Here are my findings:
> >
> > 1.  The error you've seen is definitely a bug in mlx5dv.h from rdma-core
> >   (I'm emphasizing rdma-core since I cannot just send a fix for this file),
> >   as it didn’t take into account that an address may be a 32bit one when
> >   performing the 32bit shift:
> >   __m128i val  = _mm_set_epi32((uint32_t)address,
> >   (uint32_t)(address >> 32), lkey, length);
> > 2. The reason we didn’t see it in our setups is due to the values assigned
> >   to the GCC predefined macros we are using (from RH and UBUNTU).
> > When I run the following commands in our setups:
> > alias gccmacros='gcc -dM -E -x c /dev/null'
> > gccmacros -m32 | grep -E "(MMX|SSE|AVX|XOP)"
> > I get the following results:
> > On RH setup using gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) 
> > (GCC)
> > #define __MMX__ 1
> > #define __SSE2__ 1
> > #define __SSE__ 1
> >   On Ubuntu setup using gcc version 5.4.0 20160609
> >   (Ubuntu 5.4.0-6ubuntu1~16.04.10)
> > No flags are defined.
> >   Since the "offending" routine is wrapped with #ifdef __SSE3__ the
> >   compiler just ignores it.
> >
> > ARs:
> >   1. Open a bug for fixing mlx5dv.h in rdma-core. - Moti H.
> >   2. Provide a workaround for the problem. - Moti H.
> >   3. Verify that this is actually the issue by running the above scripts
> >   in Ferruh's setup and verifying the SSE3 flag is set. - Ferruh Yigit
> 
> I confirm SSE3 is set in my environment, but I think this will be true for all
> x86 because DPDK min required SIMD is SSE4.2. According to the wiki, SSE3
> was introduced in 2004.
> 
> We use -march=native in dpdk build, so:
> $ gcc -march=native -m32 -dM -E -  1 #define __SSE3__ 1

Thanks Ferruh,

I will remove the patch from the tree till this issue is resolved. I hope we
can fix rdma-core within a few days.
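
For reference, a minimal sketch of the problem and of one possible workaround.
Shifting a 32-bit value right by 32 is undefined behaviour in C, so on a 32-bit
build the expression (address >> 32) cannot be relied on; widening to uint64_t
first makes the shift well defined. The helper names addr_hi32() and
addr_lo32() are hypothetical, and this is not the actual rdma-core fix.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: upper 32 bits of an address, 0 on 32-bit targets. */
static inline uint32_t
addr_hi32(uintptr_t address)
{
	/* Widen first so the shift count is always below the type width. */
	assert(sizeof(address) <= sizeof(uint64_t));
	return (uint32_t)((uint64_t)address >> 32);
}

/* Hypothetical helper: lower 32 bits of an address. */
static inline uint32_t
addr_lo32(uintptr_t address)
{
	return (uint32_t)((uint64_t)address & 0xffffffffULL);
}
```

The two halves could then be passed to _mm_set_epi32() in place of the raw
shift, on both 32-bit and 64-bit builds.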

> 
> 
> >
> > Moti H.
> >
> >> -Original Message-
> >> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Mordechay
> >> Haimovsky
> >> Sent: Thursday, July 5, 2018 1:10 PM
> >> To: Ferruh Yigit ; Shahaf Shuler
> >> 
> >> Cc: Adrien Mazarguil ; dev@dpdk.org
> >> Subject: Re: [dpdk-dev] [PATCH v2] net/mlx5: add support for 32bit
> >> systems
> >>
> >> Hi,
> >>  Didn’t see it in our setups (not an excuse),  Investigating 
> >>
> >> Moti
> >>
> >>> -Original Message-
> >>> From: Ferruh Yigit [mailto:ferruh.yi...@intel.com]
> >>> Sent: Wednesday, July 4, 2018 4:49 PM
> >>> To: Mordechay Haimovsky ; Shahaf Shuler
> >>> 
> >>> Cc: Adrien Mazarguil ; dev@dpdk.org
> >>> Subject: Re: [dpdk-dev] [PATCH v2] net/mlx5: add support for 32bit
> >>> systems
> >>>
> >>> On 7/2/2018 12:11 PM, Moti Haimovsky wrote:
>  This patch adds support for building and running mlx5 PMD on 32bit
>  systems such as i686.
> 
>  The main issue to tackle was handling the 32bit access to the UAR
>  as quoted from the mlx5 PRM:
>  QP and CQ DoorBells require 64-bit writes. For best performance, it
>  is recommended to execute the QP/CQ DoorBell as a single 64-bit
>  write operation. For platforms that do not support 64 bit writes,
>  it is possible to issue the 64 bits DoorBells through two
>  consecutive writes, each write 32 bits, as described below:
>  * The order of writing each of the Dwords is from lower to upper
>    addresses.
>  * No other DoorBell can be rung (or even start ringing) in the midst of
>    an on-going write of a DoorBell over a given UAR page.
>  The last rule implies that in a multi-threaded environment, the
>  access to a UAR page (which can be accessible by all threads in the
>  process) must be synchronized (for example, using a semaphore)
>  unless an atomic write of 64 bits in a single bus operation is
>  guaranteed. Such a synchronization is not required for when ringing
>  DoorBells on different UAR pages.
> 
>  Signed-off-by: Moti Haimovsky 
>  ---
>  v2:
>  * Fixed coding style issues.
>  * Modified documentation according to review inputs.
>  * Fixed merge conflicts.
>  ---
>   doc/guides/nics/features/mlx5.ini |  1 +
>   doc/guides/nics/mlx5.rst  |  6 +++-
>   drivers/net/mlx5/mlx5.c   |  8 -
>   drivers/net/mlx5/mlx5.h   |  5 +++
>   drivers/net/mlx5/mlx5_defs.h  | 18 --
>   drivers/net/mlx5/mlx5_rxq.c   |  6 +++-
>   drivers/net/mlx5/mlx5_rxtx.c  | 22 +++--
>   drivers/net/mlx5/mlx5_rxtx.h  | 69
> >>> ++-
>   drivers/net/mlx5/mlx5_txq.c   | 13 +++-
>   9 files changed, 131 insertions(+), 17 deletions(-)
> 
>  diff --git a/doc/guides/nics/features/mlx5.ini
>  b/doc/guides/nics/features/mlx5.ini
>  index 

Re: [dpdk-dev] [PATCH] cryptodev: rename experimental private data APIs

2018-07-09 Thread Gujjar, Abhinandan S
Adding Jerin & Akhil into the loop.

Since these APIs are experimental, do the changes require an announcement?

Regards
Abhinandan

> -Original Message-
> From: Trahe, Fiona
> Sent: Friday, July 6, 2018 7:10 PM
> To: dev@dpdk.org
> Cc: De Lara Guarch, Pablo ; Trahe, Fiona
> ; Gujjar, Abhinandan S 
> Subject: [PATCH] cryptodev: rename experimental private data APIs
> 
> The name private_data is confusing in these APIs:
> rte_cryptodev_sym_session_set_private_data()
> rte_cryptodev_sym_session_get_private_data()
> It refers to data added at the end of the session hdr for use by the 
> application.
> The session already contains sess_private_data[index] which is used to store
> private pmd data and most references to private data refer to that.
> e.g. external apis
> rte_cryptodev_sym_get_private_session_size() and internal
> set/get_session_private_data() refer to sess_private_data[].
> 
> So rename to user_data, i.e.
> rte_cryptodev_sym_session_set_user_data()
> rte_cryptodev_sym_session_get_user_data()
> 
> Refers to changes introduced here:
> https://patches.dpdk.org/patch/38172/
> 
> Signed-off-by: Fiona Trahe 
> ---
>  doc/guides/prog_guide/cryptodev_lib.rst| 14 +++---
>  doc/guides/prog_guide/event_crypto_adapter.rst |  6 +++---
>  doc/guides/rel_notes/release_18_08.rst |  8 
>  lib/librte_cryptodev/rte_cryptodev.c   | 16 
>  lib/librte_cryptodev/rte_cryptodev.h   | 14 +++---
>  lib/librte_cryptodev/rte_cryptodev_version.map |  4 ++--
> lib/librte_eventdev/rte_event_crypto_adapter.c |  4 ++--
>  test/test/test_event_crypto_adapter.c  |  8 
>  8 files changed, 41 insertions(+), 33 deletions(-)
> 
> diff --git a/doc/guides/prog_guide/cryptodev_lib.rst
> b/doc/guides/prog_guide/cryptodev_lib.rst
> index 30f0bcf7a..3dbf4dde6 100644
> --- a/doc/guides/prog_guide/cryptodev_lib.rst
> +++ b/doc/guides/prog_guide/cryptodev_lib.rst
> @@ -302,24 +302,24 @@ enqueue call.
>  Private data
>  
>  For session-based operations, the set and get API provides a mechanism for an
> -application to store and retrieve the private data information stored along with
> -the crypto session.
> +application to store and retrieve the private user data information
> +stored along with the crypto session.
> 
>  For example, suppose an application is submitting a crypto operation with a session
> -associated and wants to indicate private data information which is required to be
> +associated and wants to indicate private user data information which is
> +required to be
>  used after completion of the crypto operation. In this case, the application can use
> -the set API to set the private data and retrieve it using get API.
> +the set API to set the user data and retrieve it using get API.
> 
>  .. code-block:: c
> 
> - int rte_cryptodev_sym_session_set_private_data(
> + int rte_cryptodev_sym_session_set_user_data(
>   struct rte_cryptodev_sym_session *sess, void *data, uint16_t
> size);
> 
> - void * rte_cryptodev_sym_session_get_private_data(
> + void * rte_cryptodev_sym_session_get_user_data(
>   struct rte_cryptodev_sym_session *sess);
> 
> 
> -For session-less mode, the private data information can be placed along with the
> +For session-less mode, the private user data information can be placed
> +along with the
>  ``struct rte_crypto_op``. The ``rte_crypto_op::private_data_offset`` indicates the
>  start of private data information. The offset is counted from the start of the
>  rte_crypto_op including other crypto information such as the IVs (since there can
> diff --git a/doc/guides/prog_guide/event_crypto_adapter.rst
> b/doc/guides/prog_guide/event_crypto_adapter.rst
> index 5c1354dec..9fe09c805 100644
> --- a/doc/guides/prog_guide/event_crypto_adapter.rst
> +++ b/doc/guides/prog_guide/event_crypto_adapter.rst
> @@ -223,9 +223,9 @@ crypto security session or at an offset in the ``struct rte_crypto_op``.
>  The ``rte_crypto_op::private_data_offset`` is used to locate the request/
>  response in the ``rte_crypto_op``.
> 
> -For crypto session, ``rte_cryptodev_sym_session_set_private_data()`` API
> +For crypto session, ``rte_cryptodev_sym_session_set_user_data()`` API
>  will be used to set request/response data. The same data will be obtained
> -by ``rte_cryptodev_sym_session_get_private_data()`` API.  The
> +by ``rte_cryptodev_sym_session_get_user_data()`` API.  The
>  RTE_EVENT_CRYPTO_ADAPTER_CAP_SESSION_PRIVATE_DATA capability indicates
>  whether HW or SW supports this feature.
> 
> @@ -257,7 +257,7 @@ the ``rte_crypto_op``.
>  m_data.request_info.cdev_id = cdev_id;
>  m_data.request_info.queue_pair_id = qp_id;
>  /* Call set API to store private data information */
> -rte_cryptodev_sym_session_set_private_data(
> +rte_cryptodev_sym_session_set_user_data(
>   

Re: [dpdk-dev] [PATCH v2 1/3] net/ixgbe: enable hotplug detect in ixgbe

2018-07-09 Thread Lu, Wenzhuo
Hi Jeff,

> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Jeff Guo
> Sent: Monday, July 9, 2018 2:57 PM
> To: step...@networkplumber.org; Richardson, Bruce
> ; Yigit, Ferruh ;
> Ananyev, Konstantin ;
> gaetan.ri...@6wind.com; Wu, Jingjing ;
> tho...@monjalon.net; mo...@mellanox.com; ma...@mellanox.com; Van
> Haaren, Harry ; Zhang, Qi Z
> ; He, Shaopeng ;
> Iremonger, Bernard ;
> arybche...@solarflare.com
> Cc: jblu...@infradead.org; shreyansh.j...@nxp.com; dev@dpdk.org; Guo,
> Jia ; Zhang, Helin 
> Subject: [dpdk-dev] [PATCH v2 1/3] net/ixgbe: enable hotplug detect in ixgbe
> 
> This patch aims to enable hotplug detection in the ixgbe PMD. First it sets the
> RTE_PCI_DRV_INTR_RMV flag in drv_flags to announce the hotplug ability,
> and then uses rte_dev_event_callback_register to register the hotplug event
> callback with EAL. When EAL detects a hotplug event, it calls the callback to
> process it; if the event is a hotplug removal, the callback triggers the
> RTE_ETH_EVENT_INTR_RMV event in the ethdev callback to let the app process
> the hotplug for the ethdev.
> 
> This is an example for other drivers: any driver that supports the hotplug
> feature can enable hotplug detection this way.
> 
> Signed-off-by: Jeff Guo 
> ---
> v2->v1:
> refine some doc.
> ---
>  drivers/net/ixgbe/ixgbe_ethdev.c | 46
> +++-
>  1 file changed, 45 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c
> b/drivers/net/ixgbe/ixgbe_ethdev.c
> index 87d2ad0..83ce026 100644
> --- a/drivers/net/ixgbe/ixgbe_ethdev.c
> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c
> @@ -1534,6 +1534,47 @@ generate_random_mac_addr(struct ether_addr
> *mac_addr)
>   memcpy(&mac_addr->addr_bytes[3], &random, 3);  }
> 
> +static void
> +eth_dev_event_callback(char *device_name, enum rte_dev_event_type
> type,
> +__rte_unused void *arg)
> +{
> + uint32_t pid;
> +
> + if (type >= RTE_DEV_EVENT_MAX) {
> + fprintf(stderr, "%s called upon invalid event %d\n",
> + __func__, type);
> + fflush(stderr);
> + }
> +
> + switch (type) {
> + case RTE_DEV_EVENT_REMOVE:
> + PMD_DRV_LOG(INFO, "The device: %s has been removed!\n",
> + device_name);
> +
> + if (!device_name)
> + return;
> +
> + for (pid = 0; pid < RTE_MAX_ETHPORTS; pid++) {
> + if (rte_eth_devices[pid].device) {
> + if (!strcmp(device_name,
> + rte_eth_devices[pid].device->name)) {
> + _rte_eth_dev_callback_process(
> + &rte_eth_devices[pid],
> + RTE_ETH_EVENT_INTR_RMV,
> NULL);
> + continue;
> + }
> + }
> + }
> + break;
> + case RTE_DEV_EVENT_ADD:
> + RTE_LOG(INFO, EAL, "The device: %s has been added!\n",
> + device_name);
> + break;
> + default:
> + break;
> + }
> +}
I don't get the point. This looks like very common rte code. Why is it put
in the ixgbe PMD?



Re: [dpdk-dev] [PATCH v2 3/3] testpmd: remove the dev event callback register

2018-07-09 Thread Lu, Wenzhuo
Hi,


> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Jeff Guo
> Sent: Monday, July 9, 2018 2:57 PM
> To: step...@networkplumber.org; Richardson, Bruce
> ; Yigit, Ferruh ;
> Ananyev, Konstantin ;
> gaetan.ri...@6wind.com; Wu, Jingjing ;
> tho...@monjalon.net; mo...@mellanox.com; ma...@mellanox.com; Van
> Haaren, Harry ; Zhang, Qi Z
> ; He, Shaopeng ;
> Iremonger, Bernard ;
> arybche...@solarflare.com
> Cc: jblu...@infradead.org; shreyansh.j...@nxp.com; dev@dpdk.org; Guo,
> Jia ; Zhang, Helin 
> Subject: [dpdk-dev] [PATCH v2 3/3] testpmd: remove the dev event callback
> register
> 
> Now that the driver can manage the EAL events for hotplug, there is no
> need to register the dev event callback in the app anymore. This patch removes
> the related code.
> 
> Signed-off-by: Jeff Guo 
Acked-by: Wenzhuo Lu 


Re: [dpdk-dev] [PATCH v6 6/7] eal: add failure handle mechanism for hotplug

2018-07-09 Thread Gaëtan Rivet
Hi Jeff,

On Mon, Jul 09, 2018 at 02:51:21PM +0800, Jeff Guo wrote:
> This patch introduces a failure handle mechanism to handle device
> hotplug removal event.
> 
> First, it registers a sigbus handler when the device event monitor is enabled.
> Once a sigbus error is captured, it checks the failure address and accordingly
> remaps the invalid memory for the corresponding device. Based on this
> mechanism, it can guarantee that the application does not crash when the
> device is hot-unplugged.
> 
> Signed-off-by: Jeff Guo 
> Acked-by: Shaopeng He 
> ---
> v6->v5:
> refine some doc and coding style
> ---
>  lib/librte_eal/linuxapp/eal/eal_dev.c | 114 
> +-
>  1 file changed, 113 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c 
> b/lib/librte_eal/linuxapp/eal/eal_dev.c
> index 1cf6aeb..cb30729 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_dev.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
> @@ -4,6 +4,8 @@
>  
>  #include 
>  #include 
> +#include 
> +#include 
>  #include 
>  #include 
>  
> @@ -14,15 +16,31 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
> +#include 
> +#include 
>  
>  #include "eal_private.h"
>  
>  static struct rte_intr_handle intr_handle = {.fd = -1 };
>  static bool monitor_started;
>  
> +extern struct rte_bus_list rte_bus_list;
> +

Where do you use the rte_bus_list? It seems the reference is a remnant
from a previous version.

You do not seem to need a direct access on rte_bus_list,
as you call rte_bus_find instead.

Why do you need this extern? I think its absence is motivated: to keep the
bus list private and force users to access it through standard exposed ways.

Regards,
-- 
Gaëtan Rivet
6WIND


Re: [dpdk-dev] [PATCH v2 2/3] net/i40e: enable hotplug detect in i40e

2018-07-09 Thread Matan Azrad
Hi Guo

From: Jeff Guo
> This patch aims to enable hotplug detection in the i40e PMD. First it sets the
> RTE_PCI_DRV_INTR_RMV flag in drv_flags to announce the hotplug ability,
> and then uses rte_dev_event_callback_register to register the hotplug event
> callback with EAL. When EAL detects a hotplug event, it calls the callback to
> process it; if the event is a hotplug removal, the callback triggers the
> RTE_ETH_EVENT_INTR_RMV event in the ethdev callback to let the app process
> the hotplug for the ethdev.
> 
> This is an example for other drivers: any driver that supports the hotplug
> feature can enable hotplug detection this way.
> 
> Signed-off-by: Jeff Guo 
> ---
> v2->v1:
> no v1, add hotplug detect in ixgbe for new.
> ---
>  drivers/net/i40e/i40e_ethdev.c | 46
> +-
>  1 file changed, 45 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
> index 13c5d32..ad4231f 100644
> --- a/drivers/net/i40e/i40e_ethdev.c
> +++ b/drivers/net/i40e/i40e_ethdev.c
> @@ -688,7 +688,7 @@ static int eth_i40e_pci_remove(struct rte_pci_device
> *pci_dev)  static struct rte_pci_driver rte_i40e_pmd = {
>   .id_table = pci_id_i40e_map,
>   .drv_flags = RTE_PCI_DRV_NEED_MAPPING |
> RTE_PCI_DRV_INTR_LSC |
> -  RTE_PCI_DRV_IOVA_AS_VA,
> +  RTE_PCI_DRV_IOVA_AS_VA | RTE_PCI_DRV_INTR_RMV,
>   .probe = eth_i40e_pci_probe,
>   .remove = eth_i40e_pci_remove,
>  };
> @@ -1183,6 +1183,47 @@ i40e_aq_debug_write_global_register(struct
> i40e_hw *hw,
>   return i40e_aq_debug_write_register(hw, reg_addr, reg_val,
> cmd_details);  }
> 
> +static void
> +eth_dev_event_callback(char *device_name, enum rte_dev_event_type
> type,
> +__rte_unused void *arg)
> +{
> + uint32_t pid;
> +
> + if (type >= RTE_DEV_EVENT_MAX) {
> + fprintf(stderr, "%s called upon invalid event %d\n",
> + __func__, type);
> + fflush(stderr);
> + }
> +
> + switch (type) {
> + case RTE_DEV_EVENT_REMOVE:
> + PMD_DRV_LOG(INFO, "The device: %s has been
> removed!\n",
> + device_name);
> +
> + if (!device_name)
> + return;
> +
> + for (pid = 0; pid < RTE_MAX_ETHPORTS; pid++) {
> + if (rte_eth_devices[pid].device) {
> + if (!strcmp(device_name,
> + rte_eth_devices[pid].device->name)) {

You only need to compare the device names of this PMD's ethdev ports with the
name of the device that EAL reports as removed.
You should not raise RMV events for other PMDs' ports.

> + _rte_eth_dev_callback_process(
> + &rte_eth_devices[pid],
> + RTE_ETH_EVENT_INTR_RMV,
> NULL);
> + continue;
> + }
> + }
> + }
> + break;
> + case RTE_DEV_EVENT_ADD:
> + RTE_LOG(INFO, EAL, "The device: %s has been added!\n",
> + device_name);
> + break;
> + default:
> + break;
> + }
> +}
> +
>  static int
>  eth_i40e_dev_init(struct rte_eth_dev *dev, void *init_params
> __rte_unused)  { @@ -1442,6 +1483,9 @@ eth_i40e_dev_init(struct
> rte_eth_dev *dev, void *init_params __rte_unused)
>   rte_intr_callback_register(intr_handle,
>  i40e_dev_interrupt_handler, dev);
> 
> + /* register the device event callback */
> + rte_dev_event_callback_register(NULL, eth_dev_event_callback,
> NULL);
> +
>   /* configure and enable device interrupt */
>   i40e_pf_config_irq0(hw, TRUE);
>   i40e_pf_enable_irq0(hw);
> --
> 2.7.4



Re: [dpdk-dev] [PATCH] cryptodev: rename experimental private data APIs

2018-07-09 Thread De Lara Guarch, Pablo
Hi Abhinandan,

> -Original Message-
> From: Gujjar, Abhinandan S
> Sent: Monday, July 9, 2018 8:34 AM
> To: Trahe, Fiona ; dev@dpdk.org;
> jerin.ja...@caviumnetworks.com; Akhil Goyal 
> Cc: De Lara Guarch, Pablo 
> Subject: RE: [PATCH] cryptodev: rename experimental private data APIs
> 
> Adding Jerin & Akhil into the loop.
> 
> Since these APIs are experimental, do the changes require an announcement?

No, they don't. Just a note in the API changes section in Release Notes is 
recommended.

Pablo

> 
> Regards
> Abhinandan
> 



Re: [dpdk-dev] [PATCH v2 1/3] net/ixgbe: enable hotplug detect in ixgbe

2018-07-09 Thread Matan Azrad
Hi

From: Lu, Wenzhuo
> Hi Jeff,
> 
> > -Original Message-
> > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Jeff Guo
> > Sent: Monday, July 9, 2018 2:57 PM
> > To: step...@networkplumber.org; Richardson, Bruce
> > ; Yigit, Ferruh ;
> > Ananyev, Konstantin ;
> > gaetan.ri...@6wind.com; Wu, Jingjing ;
> > tho...@monjalon.net; mo...@mellanox.com; ma...@mellanox.com;
> Van
> > Haaren, Harry ; Zhang, Qi Z
> > ; He, Shaopeng ;
> > Iremonger, Bernard ;
> > arybche...@solarflare.com
> > Cc: jblu...@infradead.org; shreyansh.j...@nxp.com; dev@dpdk.org; Guo,
> > Jia ; Zhang, Helin 
> > Subject: [dpdk-dev] [PATCH v2 1/3] net/ixgbe: enable hotplug detect in
> > ixgbe
> >
> > This patch aims to enable hotplug detection in the ixgbe PMD. First
> > it sets the RTE_PCI_DRV_INTR_RMV flag in drv_flags to announce the
> > hotplug ability, and then uses rte_dev_event_callback_register to
> > register the hotplug event callback with EAL. When EAL detects a
> > hotplug event, it calls the callback to process it; if the event
> > is a hotplug removal, the callback triggers the RTE_ETH_EVENT_INTR_RMV
> > event in the ethdev callback to let the app process the hotplug for
> > the ethdev.
> >
> > This is an example for other drivers: any driver that supports the
> > hotplug feature can enable hotplug detection this way.
> >
> > Signed-off-by: Jeff Guo 
> > ---
> > v2->v1:
> > refine some doc.
> > ---
> >  drivers/net/ixgbe/ixgbe_ethdev.c | 46
> > +++-
> >  1 file changed, 45 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c
> > b/drivers/net/ixgbe/ixgbe_ethdev.c
> > index 87d2ad0..83ce026 100644
> > --- a/drivers/net/ixgbe/ixgbe_ethdev.c
> > +++ b/drivers/net/ixgbe/ixgbe_ethdev.c
> > @@ -1534,6 +1534,47 @@ generate_random_mac_addr(struct ether_addr
> > *mac_addr)
> > memcpy(&mac_addr->addr_bytes[3], &random, 3);  }
> >
> > +static void
> > +eth_dev_event_callback(char *device_name, enum rte_dev_event_type
> > type,
> > +  __rte_unused void *arg)
> > +{
> > +   uint32_t pid;
> > +
> > +   if (type >= RTE_DEV_EVENT_MAX) {
> > +   fprintf(stderr, "%s called upon invalid event %d\n",
> > +   __func__, type);
> > +   fflush(stderr);
> > +   }
> > +
> > +   switch (type) {
> > +   case RTE_DEV_EVENT_REMOVE:
> > +   PMD_DRV_LOG(INFO, "The device: %s has been
> removed!\n",
> > +   device_name);
> > +
> > +   if (!device_name)
> > +   return;
> > +
> > +   for (pid = 0; pid < RTE_MAX_ETHPORTS; pid++) {
> > +   if (rte_eth_devices[pid].device) {
> > +   if (!strcmp(device_name,
> > +   rte_eth_devices[pid].device->name)) {
> > +   _rte_eth_dev_callback_process(
> > +   &rte_eth_devices[pid],
> > +   RTE_ETH_EVENT_INTR_RMV,
> > NULL);
> > +   continue;
> > +   }
> > +   }
> > +   }
> > +   break;
> > +   case RTE_DEV_EVENT_ADD:
> > +   RTE_LOG(INFO, EAL, "The device: %s has been added!\n",
> > +   device_name);
> > +   break;
> > +   default:
> > +   break;
> > +   }
> > +}
> I don't get the point. This looks like very common rte code. Why is it
> put in the ixgbe PMD?

Jeff needs to detect whether the removed device is related to this PMD, and
then raise RMV events for all of this PMD's associated ethdev ports.
He should not raise RMV events for other PMDs' ports.
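
A minimal sketch of that filtering, using stand-in types rather than the real
ethdev structures: struct port, ports_to_notify() and the driver-name field
are hypothetical and only illustrate matching on both the removed device name
and the owning driver before raising an event.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for an ethdev port; not the real struct rte_eth_dev. */
struct port {
	const char *dev_name;   /* name of the underlying device, NULL if unused */
	const char *drv_name;   /* name of the driver that owns the port */
};

/*
 * Collect the indexes of the ports that sit on the removed device and
 * belong to the given driver. Returns the number of matches written to out.
 */
static size_t
ports_to_notify(const struct port *ports, size_t nb_ports,
		const char *removed_dev, const char *my_drv,
		size_t *out, size_t out_len)
{
	size_t i, n = 0;

	assert(ports != NULL && out != NULL);
	if (removed_dev == NULL || my_drv == NULL)
		return 0;

	for (i = 0; i < nb_ports && n < out_len; i++) {
		if (ports[i].dev_name == NULL || ports[i].drv_name == NULL)
			continue;
		if (strcmp(ports[i].drv_name, my_drv) != 0)
			continue;   /* port owned by another PMD: leave it alone */
		if (strcmp(ports[i].dev_name, removed_dev) == 0)
			out[n++] = i;
	}
	return n;
}
```

The callback would then raise RTE_ETH_EVENT_INTR_RMV only for the returned
indexes. Note that the NULL check on the device name happens before any use of
it, unlike in the quoted patch, which logs device_name first.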





Re: [dpdk-dev] [PATCH v5 3/4] compressdev: replace mbuf scatter gather flag

2018-07-09 Thread De Lara Guarch, Pablo



> -Original Message-
> From: Verma, Shally [mailto:shally.ve...@cavium.com]
> Sent: Saturday, July 7, 2018 7:34 AM
> To: De Lara Guarch, Pablo ; Gupta, Ashish
> ; Trahe, Fiona ; Daly, Lee
> 
> Cc: dev@dpdk.org
> Subject: RE: [PATCH v5 3/4] compressdev: replace mbuf scatter gather flag
> 
> 
> 
> >-Original Message-
> >From: Pablo de Lara [mailto:pablo.de.lara.gua...@intel.com]
> >Sent: 06 July 2018 10:58
> >To: Verma, Shally ; Gupta, Ashish
> >; fiona.tr...@intel.com; lee.d...@intel.com
> >Cc: dev@dpdk.org; Pablo de Lara 
> >Subject: [PATCH v5 3/4] compressdev: replace mbuf scatter gather flag
> >
> >The current mbuf scatter gather feature flag is too ambiguous, as it is
> >not clear if input and/or output buffers can be scatter gather mbufs or
> >not.
> >
> >Therefore, three new flags will replace this flag:
> >- RTE_COMP_FF_OOP_SGL_IN_SGL_OUT
> >- RTE_COMP_FF_OOP_SGL_IN_LB_OUT
> >- RTE_COMP_FF_OOP_LB_IN_SGL_OUT
> >
> >Note that out-of-place flat buffers are supported by default and
> >in-place is not supported by the library.
> >
> >Signed-off-by: Pablo de Lara 
> >Acked-by: Fiona Trahe 
> >---
> >
> >v5:
> >- Replaced left "Flat Buffer" with "Linear Buffer" (Shally)
> >- Rephrased comment about new feature flags (Shally)
> >
> >v4:
> >- Replaced FB (Flat Buffers) with LB (Linear Buffers) (Shally)
> >- Add extra explanation on comments about Linear Buffers vs
> >  Scatter-gather lists
> >
> >v3:
> >- Replaced Out-of-place with OOP
> >- Added new feature flags in default.ini
> >
> >v2:
> >- Fixed typos
> >- Rephrased comments
> >
> > doc/guides/compressdevs/features/default.ini | 34 +++--
> ---
> > doc/guides/compressdevs/overview.rst | 14 
> > doc/guides/rel_notes/release_18_08.rst   |  6 +
> > lib/librte_compressdev/rte_comp.c|  8 +--
> > lib/librte_compressdev/rte_comp.h| 31 +
> > 5 files changed, 65 insertions(+), 28 deletions(-)
> >
> ...
> 
> >diff --git a/doc/guides/compressdevs/overview.rst
> >b/doc/guides/compressdevs/overview.rst
> >index d01c1a966..70bbe82b7 100644
> >--- a/doc/guides/compressdevs/overview.rst
> >+++ b/doc/guides/compressdevs/overview.rst
> >@@ -16,3 +16,17 @@ Supported Feature Flags
> >- "Pass-through" feature flag refers to the ability of the PMD
> >  to let input buffers pass-through it, copying the input to the output,
> >  without making any modifications to it (no compression done).
> >+
> >+   - "OOP SGL In SGL Out" feature flag stands for
> >+ "Out-of-place Scatter-gather list Input, Scatter-gather list Output",
> >+ which means PMD supports different scatter-gather styled input and output buffers
> >+ (i.e. both can consist of multiple segments).
> >+
> >+   - "OOP SGL In LB Out" feature flag stands for
> >+ "Out-of-place Scatter-gather list Input, Linear Buffers Output",
> >+ which means PMD supports input from scatter-gathered styled buffers, outputting linear buffers
> >+ (i.e. single segment).
> >+
> >+   - "OOP LB In SGL Out" feature flag stands for
> >+ "Out-of-place Linear Buffers Input, Scatter-gather list Output",
> >+ which means PMD supports input from linear buffer, outputting scatter-gathered styled buffers.
> 
> 
> 
> >diff --git a/lib/librte_compressdev/rte_comp.h
> >b/lib/librte_compressdev/rte_comp.h
> >index 5b513c77e..274b5eadf 100644
> >--- a/lib/librte_compressdev/rte_comp.h
> >+++ b/lib/librte_compressdev/rte_comp.h
> >@@ -30,23 +30,34 @@ extern "C" {
> > /**< Stateful compression is supported */
> > #define RTE_COMP_FF_STATEFUL_DECOMPRESSION (1ULL << 1)
> > /**< Stateful decompression is supported */
> >-#defineRTE_COMP_FF_MBUF_SCATTER_GATHER (1ULL << 2)
> >-/**< Scatter-gather mbufs are supported */
> >-#define RTE_COMP_FF_ADLER32_CHECKSUM   (1ULL << 3)
> >+#define RTE_COMP_FF_OOP_SGL_IN_SGL_OUT (1ULL << 2)
> >+/**< Out-of-place Scatter-gather (SGL) buffers,
> >+ * with multiple segments, are supported in input and output  */
> >+#define RTE_COMP_FF_OOP_SGL_IN_LB_OUT  (1ULL << 3)
> >+/**< Out-of-place Scatter-gather (SGL) buffers are supported
> >+ * in input, combined with linear buffers (LB), with a
> >+ * single segment, in output
> >+ */
> >+#define RTE_COMP_FF_OOP_LB_IN_SGL_OUT  (1ULL << 4)
> >+/**< Out-of-place Scatter-gather (SGL) mbufs are supported
> >+ * in output, combined with linear buffers (LB) in input  */
> [Shally] Similar rephrase here please.

Ok, I am replacing "mbufs" with "buffers". I won't clarify what SGL and LB mean,
as that is done in the previous macros.

I will make this change on the fly as I am applying this patch, as it is the
last comment, I believe.

Thanks,
Pablo

> 
> Rest,
> Acked-by: Shally Verma 
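
For readers following the three out-of-place flags in the hunk above, here is a minimal sketch of how an application might test them before submitting an operation. The macro values are copied from the quoted rte_comp.h hunk; the helper `oop_layout_supported()` and the idea of passing segment counts are hypothetical and not part of compressdev. It also assumes that a linear-in/linear-out operation needs none of these flags, which the patch does not state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values copied from the quoted rte_comp.h hunk. */
#define RTE_COMP_FF_OOP_SGL_IN_SGL_OUT (1ULL << 2)
#define RTE_COMP_FF_OOP_SGL_IN_LB_OUT  (1ULL << 3)
#define RTE_COMP_FF_OOP_LB_IN_SGL_OUT  (1ULL << 4)

/*
 * Hypothetical helper (not a compressdev API): decide whether a PMD that
 * advertises `feature_flags` can take an out-of-place operation with the
 * given number of source and destination segments.
 */
static bool
oop_layout_supported(uint64_t feature_flags, unsigned int src_segs,
                     unsigned int dst_segs)
{
    assert(src_segs > 0 && dst_segs > 0);

    bool src_sgl = src_segs > 1; /* more than one segment: scatter-gather */
    bool dst_sgl = dst_segs > 1;

    if (src_sgl && dst_sgl)
        return (feature_flags & RTE_COMP_FF_OOP_SGL_IN_SGL_OUT) != 0;
    if (src_sgl)
        return (feature_flags & RTE_COMP_FF_OOP_SGL_IN_LB_OUT) != 0;
    if (dst_sgl)
        return (feature_flags & RTE_COMP_FF_OOP_LB_IN_SGL_OUT) != 0;

    /* Assumption: linear in, linear out involves none of the SGL flags. */
    return true;
}
```

In a real application the flags word would come from the device information reported by the PMD rather than from a literal.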



Re: [dpdk-dev] [PATCH v6 6/7] eal: add failure handle mechanism for hotplug

2018-07-09 Thread Jeff Guo

hi, gaetan


On 7/9/2018 3:42 PM, Gaëtan Rivet wrote:

Hi Jeff,

On Mon, Jul 09, 2018 at 02:51:21PM +0800, Jeff Guo wrote:

This patch introduces a failure handling mechanism for device
hotplug removal events.

First, it registers a sigbus handler when the device event monitor is enabled.
Once a sigbus error is captured, it checks the failure address and accordingly
remaps the invalid memory for the corresponding device. Based on this
mechanism, it can guarantee that the application does not crash when the
device is hot-unplugged.

Signed-off-by: Jeff Guo 
Acked-by: Shaopeng He 
---
v6->v5:
refine some doc and coding style
---
  lib/librte_eal/linuxapp/eal/eal_dev.c | 114 +-
  1 file changed, 113 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c 
b/lib/librte_eal/linuxapp/eal/eal_dev.c
index 1cf6aeb..cb30729 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -4,6 +4,8 @@
  
  #include 

  #include 
+#include 
+#include 
  #include 
  #include 
  
@@ -14,15 +16,31 @@

  #include 
  #include 
  #include 
+#include 
+#include 
+#include 
+#include 
  
  #include "eal_private.h"
  
  static struct rte_intr_handle intr_handle = {.fd = -1 };

  static bool monitor_started;
  
+extern struct rte_bus_list rte_bus_list;

+

Where do you use the rte_bus_list? It seems the reference is a remnant
from a previous version.

You do not seem to need a direct access on rte_bus_list,
as you call rte_bus_find instead.

Why do you need this extern? I think its absence is motivated: to keep the
bus list private and force users to access it through standard exposed ways.

Regards,


I think that is my oversight. I will delete it. Thanks for pointing it out.
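
The point about keeping the bus list private and going through a find function can be sketched as follows. This is a standalone illustration, not EAL code: `struct bus`, `bus_register()`, `bus_find()` and `bus_cmp_name()` are hypothetical names that only loosely mirror the start/compare/data shape of a find-style accessor.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct bus {
    const char *name;
    struct bus *next;
};

/* File-local list head: other files cannot reach it through an extern. */
static struct bus *bus_list_head;

/* Comparison callback; returns 0 when the bus matches `data`. */
typedef int (*bus_cmp_t)(const struct bus *bus, const void *data);

static void
bus_register(struct bus *bus)
{
    assert(bus != NULL);
    bus->next = bus_list_head;
    bus_list_head = bus;
}

/* The only exposed way to look at the list; `start` resumes a search. */
static struct bus *
bus_find(const struct bus *start, bus_cmp_t cmp, const void *data)
{
    struct bus *bus = start != NULL ? start->next : bus_list_head;

    for (; bus != NULL; bus = bus->next)
        if (cmp(bus, data) == 0)
            return bus;
    return NULL;
}

static int
bus_cmp_name(const struct bus *bus, const void *name)
{
    return strcmp(bus->name, (const char *)name);
}
```

With this layout a caller never needs an `extern` on the list itself; it passes a comparator and gets back the matching entry or NULL.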




Re: [dpdk-dev] [PATCH v5 1/4] doc: cleanup ISA-L PMD feature matrix

2018-07-09 Thread De Lara Guarch, Pablo



> -Original Message-
> From: De Lara Guarch, Pablo
> Sent: Friday, July 6, 2018 6:28 AM
> To: shally.ve...@caviumnetworks.com; ashish.gu...@caviumnetworks.com;
> Trahe, Fiona ; Daly, Lee 
> Cc: dev@dpdk.org; De Lara Guarch, Pablo 
> Subject: [PATCH v5 1/4] doc: cleanup ISA-L PMD feature matrix
> 
> In PMD feature matrices (.ini files), it is not required to have the list of
> features that are not supported, just the ones that are.
> 
> Signed-off-by: Pablo de Lara 
> Acked-by: Lee Daly 
> ---

Patchset applied to dpdk-next-crypto, with Shally's last comment addressed.

Thanks,
Pablo


Re: [dpdk-dev] [PATCH v2 1/3] net/ixgbe: enable hotplug detect in ixgbe

2018-07-09 Thread Andrew Rybchenko

On 09.07.2018 09:56, Jeff Guo wrote:

This patch aim to enable hotplug detect in ixgbe pmd driver. Firstly it
set the flags RTE_PCI_DRV_INTR_RMV in drv_flags to announce the hotplug
ability, and then use rte_dev_event_callback_register to register
the hotplug event callback to eal. When eal detect the hotplug event,
it will call the callback to process it, if the event is hotplug remove,
it will trigger the RTE_ETH_EVENT_INTR_RMV event into ethdev callback
to let app process the hotplug for the ethdev.

This is an example for other driver, that if any driver support hotplug
feature could be use this way to enable hotplug detect.


I see nothing ixgbe specific in the callback. Yes, support of removal
event should be in drv_flags, but it looks like the callback may be
generic and located in ethdev.

Also search of the device by name could be done using querying
mechanism to be added by Gaetan [1].

[1] https://patches.dpdk.org/project/dpdk/list/?series=419


Re: [dpdk-dev] [PATCH v2 3/3] testpmd: remove the dev event callback register

2018-07-09 Thread Andrew Rybchenko

On 09.07.2018 09:56, Jeff Guo wrote:

Since now we can use driver to management the eal event for hotplug,
so no need to register dev event callback in app anymore. This patch
remove the related code.


I don't understand why handling on device level means removal
of the application callback. Maybe as a cleanup.
I guess the application could still be interested in device addition and
removal events. It is mainly a question for the testpmd maintainer.



Re: [dpdk-dev] [PATCH v2 3/3] testpmd: remove the dev event callback register

2018-07-09 Thread Jeff Guo




On 7/9/2018 4:16 PM, Andrew Rybchenko wrote:

On 09.07.2018 09:56, Jeff Guo wrote:

Since now we can use driver to management the eal event for hotplug,
so no need to register dev event callback in app anymore. This patch
remove the related code.


I don't understand why handling on device level means removal
of the application callback. May be as a cleanup.
I guess application still could be interested in device addition and
removal events. It is mainly question to testpmd maintainer.



I think the callback can be used by anyone who is interested in it. You are
right, but it is optional: whoever uses it is then in charge of the event
and callback management.
I remove it here just to pick the other approach, rather than the
previous one, as the hotplug example.

Just select one of the two; there is no need to combine them.



Re: [dpdk-dev] [PATCH v2 1/3] net/ixgbe: enable hotplug detect in ixgbe

2018-07-09 Thread Jeff Guo




On 7/9/2018 4:13 PM, Andrew Rybchenko wrote:

On 09.07.2018 09:56, Jeff Guo wrote:

This patch aim to enable hotplug detect in ixgbe pmd driver. Firstly it
set the flags RTE_PCI_DRV_INTR_RMV in drv_flags to announce the hotplug
ability, and then use rte_dev_event_callback_register to register
the hotplug event callback to eal. When eal detect the hotplug event,
it will call the callback to process it, if the event is hotplug remove,
it will trigger the RTE_ETH_EVENT_INTR_RMV event into ethdev callback
to let app process the hotplug for the ethdev.

This is an example for other driver, that if any driver support hotplug
feature could be use this way to enable hotplug detect.


I see nothing ixgbe specific in the callback. Yes, support of removal
event should be in drv_flags, but it looks like the callback may be
generic and located in ethdev.



Making it generic and locating it in ethdev sounds like a good idea.


Also search of the device by name could be done using querying
mechanism to be added by Gaetan [1].

[1] https://patches.dpdk.org/project/dpdk/list/?series=419


Here, I just want to check whether the eth port belongs to the removed device.



Re: [dpdk-dev] [PATCH v2 2/3] net/i40e: enable hotplug detect in i40e

2018-07-09 Thread Jeff Guo

hi, matan


On 7/9/2018 3:47 PM, Matan Azrad wrote:

Hi Guo

From: Jeff Guo

This patch aim to enable hotplug detect in i40e pmd driver. Firstly it set the
flags RTE_PCI_DRV_INTR_RMV in drv_flags to announce the hotplug ability,
and then use rte_dev_event_callback_register to register the hotplug event
callback to eal. When eal detect the hotplug event, it will call the callback to
process it, if the event is hotplug remove, it will trigger the
RTE_ETH_EVENT_INTR_RMV event into ethdev callback to let app process
the hotplug for the ethdev.

This is an example for other driver, that if any driver support hotplug feature
could be use this way to enable hotplug detect.

Signed-off-by: Jeff Guo 
---
v2->v1:
no v1, add hotplug detect in ixgbe for new.
---
  drivers/net/i40e/i40e_ethdev.c | 46
+-
  1 file changed, 45 insertions(+), 1 deletion(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 13c5d32..ad4231f 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -688,7 +688,7 @@ static int eth_i40e_pci_remove(struct rte_pci_device
*pci_dev)  static struct rte_pci_driver rte_i40e_pmd = {
.id_table = pci_id_i40e_map,
.drv_flags = RTE_PCI_DRV_NEED_MAPPING |
RTE_PCI_DRV_INTR_LSC |
-RTE_PCI_DRV_IOVA_AS_VA,
+RTE_PCI_DRV_IOVA_AS_VA | RTE_PCI_DRV_INTR_RMV,
.probe = eth_i40e_pci_probe,
.remove = eth_i40e_pci_remove,
  };
@@ -1183,6 +1183,47 @@ i40e_aq_debug_write_global_register(struct
i40e_hw *hw,
return i40e_aq_debug_write_register(hw, reg_addr, reg_val,
cmd_details);  }

+static void
+eth_dev_event_callback(char *device_name, enum rte_dev_event_type
type,
+  __rte_unused void *arg)
+{
+   uint32_t pid;
+
+   if (type >= RTE_DEV_EVENT_MAX) {
+   fprintf(stderr, "%s called upon invalid event %d\n",
+   __func__, type);
+   fflush(stderr);
+   }
+
+   switch (type) {
+   case RTE_DEV_EVENT_REMOVE:
+   PMD_DRV_LOG(INFO, "The device: %s has been
removed!\n",
+   device_name);
+
+   if (!device_name)
+   return;
+
+   for (pid = 0; pid < RTE_MAX_ETHPORTS; pid++) {
+   if (rte_eth_devices[pid].device) {
+   if (!strcmp(device_name,
+   rte_eth_devices[pid].device->name)) {

You just need to compare this PMD ethdev ports device names to the current EAL
removed device name.
You should not raise RMV events for other PMD ports.


Makes sense. Thanks, Matan.


+   _rte_eth_dev_callback_process(
+   &rte_eth_devices[pid],
+   RTE_ETH_EVENT_INTR_RMV,
NULL);
+   continue;
+   }
+   }
+   }
+   break;
+   case RTE_DEV_EVENT_ADD:
+   RTE_LOG(INFO, EAL, "The device: %s has been added!\n",
+   device_name);
+   break;
+   default:
+   break;
+   }
+}
+
  static int
  eth_i40e_dev_init(struct rte_eth_dev *dev, void *init_params
__rte_unused)  { @@ -1442,6 +1483,9 @@ eth_i40e_dev_init(struct
rte_eth_dev *dev, void *init_params __rte_unused)
rte_intr_callback_register(intr_handle,
   i40e_dev_interrupt_handler, dev);

+   /* register the device event callback */
+   rte_dev_event_callback_register(NULL, eth_dev_event_callback,
NULL);
+
/* configure and enable device interrupt */
i40e_pf_config_irq0(hw, TRUE);
i40e_pf_enable_irq0(hw);
--
2.7.4
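
Matan's remark in the review above can be illustrated with a small standalone sketch. It assumes one reading of the concern: when several PMDs each register a catch-all callback that scans every port, a single removal would be reported once per PMD unless each callback skips ports it does not own. `struct port`, its fields and `notify_removed()` are hypothetical stand-ins, not ethdev structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of an ethdev port. */
struct port {
    const char *dev_name; /* underlying device name, NULL if slot unused */
    const char *drv_name; /* driver that owns the port */
    int rmv_events;       /* removal notifications raised so far */
};

/*
 * Called from the callback of the driver named `drv_name` when the device
 * `removed` disappears. Only ports owned by that driver are notified.
 * Returns the number of ports notified.
 */
static int
notify_removed(struct port *ports, size_t nb_ports, const char *drv_name,
               const char *removed)
{
    int raised = 0;

    assert(ports != NULL && drv_name != NULL);
    if (removed == NULL)
        return 0;

    for (size_t i = 0; i < nb_ports; i++) {
        struct port *p = &ports[i];

        if (p->dev_name == NULL)
            continue; /* unused slot */
        if (strcmp(p->drv_name, drv_name) != 0)
            continue; /* port belongs to another PMD */
        if (strcmp(p->dev_name, removed) != 0)
            continue; /* different device */
        p->rmv_events++;
        raised++;
    }
    return raised;
}
```

With an i40e port on "0000:01:00.0" and an ixgbe port on "0000:02:00.0", a removal of the first device seen by both drivers' callbacks raises one event in total, from the i40e side only.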




Re: [dpdk-dev] [PATCH v2 1/3] net/ixgbe: enable hotplug detect in ixgbe

2018-07-09 Thread Jeff Guo

hi, wenzhuo and matan.


On 7/9/2018 3:51 PM, Matan Azrad wrote:

Hi

From: Lu, Wenzhuo

Hi Jeff,


-Original Message-
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Jeff Guo
Sent: Monday, July 9, 2018 2:57 PM
To: step...@networkplumber.org; Richardson, Bruce
; Yigit, Ferruh ;
Ananyev, Konstantin ;
gaetan.ri...@6wind.com; Wu, Jingjing ;
tho...@monjalon.net; mo...@mellanox.com; ma...@mellanox.com;

Van

Haaren, Harry ; Zhang, Qi Z
; He, Shaopeng ;
Iremonger, Bernard ;
arybche...@solarflare.com
Cc: jblu...@infradead.org; shreyansh.j...@nxp.com; dev@dpdk.org; Guo,
Jia ; Zhang, Helin 
Subject: [dpdk-dev] [PATCH v2 1/3] net/ixgbe: enable hotplug detect in
ixgbe

This patch aim to enable hotplug detect in ixgbe pmd driver. Firstly
it set the flags RTE_PCI_DRV_INTR_RMV in drv_flags to announce the
hotplug ability, and then use rte_dev_event_callback_register to
register the hotplug event callback to eal. When eal detect the
hotplug event, it will call the callback to process it, if the event
is hotplug remove, it will trigger the RTE_ETH_EVENT_INTR_RMV event
into ethdev callback to let app process the hotplug for the ethdev.

This is an example for other driver, that if any driver support
hotplug feature could be use this way to enable hotplug detect.

Signed-off-by: Jeff Guo 
---
v2->v1:
refine some doc.
---
  drivers/net/ixgbe/ixgbe_ethdev.c | 46
+++-
  1 file changed, 45 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c
b/drivers/net/ixgbe/ixgbe_ethdev.c
index 87d2ad0..83ce026 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -1534,6 +1534,47 @@ generate_random_mac_addr(struct ether_addr
*mac_addr)
memcpy(&mac_addr->addr_bytes[3], &random, 3);  }

+static void
+eth_dev_event_callback(char *device_name, enum rte_dev_event_type
type,
+  __rte_unused void *arg)
+{
+   uint32_t pid;
+
+   if (type >= RTE_DEV_EVENT_MAX) {
+   fprintf(stderr, "%s called upon invalid event %d\n",
+   __func__, type);
+   fflush(stderr);
+   }
+
+   switch (type) {
+   case RTE_DEV_EVENT_REMOVE:
+   PMD_DRV_LOG(INFO, "The device: %s has been

removed!\n",

+   device_name);
+
+   if (!device_name)
+   return;
+
+   for (pid = 0; pid < RTE_MAX_ETHPORTS; pid++) {
+   if (rte_eth_devices[pid].device) {
+   if (!strcmp(device_name,
+   rte_eth_devices[pid].device->name)) {
+   _rte_eth_dev_callback_process(
+   &rte_eth_devices[pid],
+   RTE_ETH_EVENT_INTR_RMV,
NULL);
+   continue;
+   }
+   }
+   }
+   break;
+   case RTE_DEV_EVENT_ADD:
+   RTE_LOG(INFO, EAL, "The device: %s has been added!\n",
+   device_name);
+   break;
+   default:
+   break;
+   }
+}

I don't get the point. Looks like this is very common rte code. Why is it put
in the ixgbe pmd?

Jeff needs to detect if the removed device is related to this PMD, then
raise RMV events for all of this PMD's associated ethdev ports.
He should not raise RMV events for other PMD ports.



It is as Wenzhuo said: I have no strong reason to keep this common code
in the ixgbe pmd. And of course I do not want to raise RMV events for
unrelated PMD ports.

I plan to move it into the ethdev layer and process it there.







Re: [dpdk-dev] [PATCH v2 1/3] net/ixgbe: enable hotplug detect in ixgbe

2018-07-09 Thread Matan Azrad
Hi

From: Jeff Guo 
> hi, wenzhuo and matan.
> 
> 
> On 7/9/2018 3:51 PM, Matan Azrad wrote:
> > Hi
> >
> > From: Lu, Wenzhuo
> >> Hi Jeff,
> >>
> >>> -Original Message-
> >>> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Jeff Guo
> >>> Sent: Monday, July 9, 2018 2:57 PM
> >>> To: step...@networkplumber.org; Richardson, Bruce
> >>> ; Yigit, Ferruh
> >>> ; Ananyev, Konstantin
> >>> ; gaetan.ri...@6wind.com; Wu,
> Jingjing
> >>> ; tho...@monjalon.net;
> mo...@mellanox.com;
> >>> ma...@mellanox.com;
> >> Van
> >>> Haaren, Harry ; Zhang, Qi Z
> >>> ; He, Shaopeng ;
> >>> Iremonger, Bernard ;
> >>> arybche...@solarflare.com
> >>> Cc: jblu...@infradead.org; shreyansh.j...@nxp.com; dev@dpdk.org;
> >>> Guo, Jia ; Zhang, Helin 
> >>> Subject: [dpdk-dev] [PATCH v2 1/3] net/ixgbe: enable hotplug detect
> >>> in ixgbe
> >>>
> >>> This patch aim to enable hotplug detect in ixgbe pmd driver. Firstly
> >>> it set the flags RTE_PCI_DRV_INTR_RMV in drv_flags to announce the
> >>> hotplug ability, and then use rte_dev_event_callback_register to
> >>> register the hotplug event callback to eal. When eal detect the
> >>> hotplug event, it will call the callback to process it, if the event
> >>> is hotplug remove, it will trigger the RTE_ETH_EVENT_INTR_RMV event
> >>> into ethdev callback to let app process the hotplug for the ethdev.
> >>>
> >>> This is an example for other driver, that if any driver support
> >>> hotplug feature could be use this way to enable hotplug detect.
> >>>
> >>> Signed-off-by: Jeff Guo 
> >>> ---
> >>> v2->v1:
> >>> refine some doc.
> >>> ---
> >>>   drivers/net/ixgbe/ixgbe_ethdev.c | 46
> >>> +++-
> >>>   1 file changed, 45 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c
> >>> b/drivers/net/ixgbe/ixgbe_ethdev.c
> >>> index 87d2ad0..83ce026 100644
> >>> --- a/drivers/net/ixgbe/ixgbe_ethdev.c
> >>> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c
> >>> @@ -1534,6 +1534,47 @@ generate_random_mac_addr(struct
> ether_addr
> >>> *mac_addr)
> >>>   memcpy(&mac_addr->addr_bytes[3], &random, 3);  }
> >>>
> >>> +static void
> >>> +eth_dev_event_callback(char *device_name, enum
> rte_dev_event_type
> >>> type,
> >>> +__rte_unused void *arg)
> >>> +{
> >>> + uint32_t pid;
> >>> +
> >>> + if (type >= RTE_DEV_EVENT_MAX) {
> >>> + fprintf(stderr, "%s called upon invalid event %d\n",
> >>> + __func__, type);
> >>> + fflush(stderr);
> >>> + }
> >>> +
> >>> + switch (type) {
> >>> + case RTE_DEV_EVENT_REMOVE:
> >>> + PMD_DRV_LOG(INFO, "The device: %s has been
> >> removed!\n",
> >>> + device_name);
> >>> +
> >>> + if (!device_name)
> >>> + return;
> >>> +
> >>> + for (pid = 0; pid < RTE_MAX_ETHPORTS; pid++) {
> >>> + if (rte_eth_devices[pid].device) {
> >>> + if (!strcmp(device_name,
> >>> + rte_eth_devices[pid].device->name)) {
> >>> + _rte_eth_dev_callback_process(
> >>> + &rte_eth_devices[pid],
> >>> + RTE_ETH_EVENT_INTR_RMV,
> >>> NULL);
> >>> + continue;
> >>> + }
> >>> + }
> >>> + }
> >>> + break;
> >>> + case RTE_DEV_EVENT_ADD:
> >>> + RTE_LOG(INFO, EAL, "The device: %s has been added!\n",
> >>> + device_name);
> >>> + break;
> >>> + default:
> >>> + break;
> >>> + }
> >>> +}
> >> I don't get the point. Looks like this's a very common rte code. Why
> >> is it put in ixgbe pmd?
> > Jeff needs to detect if the removed device is related to this PMD, than to
> raise RMV events for all this PMD ethdev associated ports.
> > He should not raise RMV events for other PMD  ports.
> >
> 
> It should be like wenzhuo said that i could no strong reason to let common
> way in ixgbe pmd.  And sure raise RMV events for none related PMD ports is
> not my hope.
> Will plan to let it go into the eth dev layer to process it.
> 

How can you run ethdev function from EAL context?
How can the ethdev layer know which ports are related to the EAL device removal?
How can ethdev layer know if the port supports removal?



Re: [dpdk-dev] [PATCH v2] librte_lpm: Improve performance of the delete and add functions

2018-07-09 Thread Bruce Richardson
On Fri, Jul 06, 2018 at 07:59:22PM +0300, Alex Kiselev wrote:
> Please see inline replies
> 
> > On Mon, Jul 02, 2018 at 07:42:11PM +0300, Alex Kiselev wrote:
> >> There are two major problems with the library:
> >> first, there is no need to rebuild the whole LPM tree
> >> when a rule is deleted and second, due to the current
> >> rules algorithm with complexity O(n) it's almost
> >> impossible to deal with large rule sets (50k or so rules).
> >> This patch addresses those two issues.
> 
> >> Signed-off-by: Alex Kiselev 
> 
> > Hi,
> 
> > Some initial review comments inline below
> 
> > /Bruce
> >> ---
> >>  lib/librte_lpm/rte_lpm6.c | 1073 
> >> ++---
> >>  1 file changed, 816 insertions(+), 257 deletions(-)
> 

> >> +/*
> >> + * LPM6 rule hash function
> >> + */
> >> +static inline uint32_t
> >> +rule_hash_crc(const void *data, __rte_unused uint32_t data_len,
> >> +   uint32_t init_val)
> >> +{
> >> + return rte_hash_crc(data, sizeof(struct rte_lpm6_rule_key), 
> >> init_val);
> >> +}
> 
> > Why bother passing in the length and making the data a void pointer.
>  
> I believe it should be compatible with the rte_hash_function prototype.

Ah, ok, you are passing this to rte_hash. Makes sense now. I suggest
putting in a comment explaining why you have the extra unused parameter so.
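
As an illustration of the comment suggested above, here is a minimal standalone sketch of a fixed-size-key hash that keeps an unused length parameter only to match a generic callback prototype. Everything in it is an assumption made for the example: `hash_fn_t` merely has the same (key, length, seed) shape, `struct rule_key` is a made-up layout, and FNV-1a is swapped in for the CRC hash used in the patch so that the code builds without DPDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical generic hash callback type: (key, key length, seed). */
typedef uint32_t (*hash_fn_t)(const void *key, uint32_t key_len,
                              uint32_t init_val);

/* Made-up rule key; only uint8_t members, so there is no padding. */
struct rule_key {
    uint8_t ip[16];
    uint8_t depth;
};

/* FNV-1a over `len` bytes, seeded with `seed` (stand-in for CRC). */
static uint32_t
fnv1a(const void *data, size_t len, uint32_t seed)
{
    const uint8_t *p = data;
    uint32_t h = 2166136261u ^ seed;

    assert(data != NULL);
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/*
 * Rule hash function.
 * `key_len` is unused because the key size is always
 * sizeof(struct rule_key); the parameter exists only so that this
 * function can be passed where a hash_fn_t is expected.
 */
static uint32_t
rule_hash(const void *key, uint32_t key_len, uint32_t init_val)
{
    (void)key_len;
    return fnv1a(key, sizeof(struct rule_key), init_val);
}
```

The comment on `rule_hash()` is the kind of note being asked for: it tells the reader that the extra parameter is there for prototype compatibility, not by oversight.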



Re: [dpdk-dev] [PATCH v2 1/3] net/ixgbe: enable hotplug detect in ixgbe

2018-07-09 Thread Jeff Guo




On 7/9/2018 5:04 PM, Matan Azrad wrote:

Hi

From: Jeff Guo

hi, wenzhuo and matan.


On 7/9/2018 3:51 PM, Matan Azrad wrote:

Hi

From: Lu, Wenzhuo

Hi Jeff,


-Original Message-
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Jeff Guo
Sent: Monday, July 9, 2018 2:57 PM
To: step...@networkplumber.org; Richardson, Bruce
; Yigit, Ferruh
; Ananyev, Konstantin
; gaetan.ri...@6wind.com; Wu,

Jingjing

; tho...@monjalon.net;

mo...@mellanox.com;

ma...@mellanox.com;

Van

Haaren, Harry ; Zhang, Qi Z
; He, Shaopeng ;
Iremonger, Bernard ;
arybche...@solarflare.com
Cc: jblu...@infradead.org; shreyansh.j...@nxp.com; dev@dpdk.org;
Guo, Jia ; Zhang, Helin 
Subject: [dpdk-dev] [PATCH v2 1/3] net/ixgbe: enable hotplug detect
in ixgbe

This patch aim to enable hotplug detect in ixgbe pmd driver. Firstly
it set the flags RTE_PCI_DRV_INTR_RMV in drv_flags to announce the
hotplug ability, and then use rte_dev_event_callback_register to
register the hotplug event callback to eal. When eal detect the
hotplug event, it will call the callback to process it, if the event
is hotplug remove, it will trigger the RTE_ETH_EVENT_INTR_RMV event
into ethdev callback to let app process the hotplug for the ethdev.

This is an example for other driver, that if any driver support
hotplug feature could be use this way to enable hotplug detect.

Signed-off-by: Jeff Guo 
---
v2->v1:
refine some doc.
---
   drivers/net/ixgbe/ixgbe_ethdev.c | 46
+++-
   1 file changed, 45 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c
b/drivers/net/ixgbe/ixgbe_ethdev.c
index 87d2ad0..83ce026 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -1534,6 +1534,47 @@ generate_random_mac_addr(struct

ether_addr

*mac_addr)
memcpy(&mac_addr->addr_bytes[3], &random, 3);  }

+static void
+eth_dev_event_callback(char *device_name, enum

rte_dev_event_type

type,
+  __rte_unused void *arg)
+{
+   uint32_t pid;
+
+   if (type >= RTE_DEV_EVENT_MAX) {
+   fprintf(stderr, "%s called upon invalid event %d\n",
+   __func__, type);
+   fflush(stderr);
+   }
+
+   switch (type) {
+   case RTE_DEV_EVENT_REMOVE:
+   PMD_DRV_LOG(INFO, "The device: %s has been

removed!\n",

+   device_name);
+
+   if (!device_name)
+   return;
+
+   for (pid = 0; pid < RTE_MAX_ETHPORTS; pid++) {
+   if (rte_eth_devices[pid].device) {
+   if (!strcmp(device_name,
+   rte_eth_devices[pid].device->name)) {
+   _rte_eth_dev_callback_process(
+   &rte_eth_devices[pid],
+   RTE_ETH_EVENT_INTR_RMV,
NULL);
+   continue;
+   }
+   }
+   }
+   break;
+   case RTE_DEV_EVENT_ADD:
+   RTE_LOG(INFO, EAL, "The device: %s has been added!\n",
+   device_name);
+   break;
+   default:
+   break;
+   }
+}

I don't get the point. Looks like this's a very common rte code. Why
is it put in ixgbe pmd?

Jeff needs to detect if the removed device is related to this PMD, than to

raise RMV events for all this PMD ethdev associated ports.

He should not raise RMV events for other PMD  ports.


It should be like wenzhuo said that i could no strong reason to let common
way in ixgbe pmd.  And sure raise RMV events for none related PMD ports is
not my hope.
Will plan to let it go into the eth dev layer to process it.


How can you run ethdev function from EAL context?
How can the ethdev layer know which ports are related to the EAL device removal?
How can ethdev layer know if the port supports removal?


I mean the driver still manages the callback; only the common
ethdev functionality moves into the ethdev layer.
It just defines "rte_eth_dev_event_callback" in the ethdev layer, and the pmd
driver registers this common ethdev callback as below. The eth_dev
can be passed through the whole process.


rte_dev_event_callback_register(eth_dev->device->name,
rte_eth_dev_event_callback,
(void *)eth_dev);
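
The registration proposed above (one callback per device name, with the port passed as the callback argument) can be sketched with a toy dispatcher. This is a standalone mock, not the EAL implementation: `dev_event_callback_register()`, `dev_event_dispatch()`, `struct eth_port` and the event enum are hypothetical names, and treating a NULL name as "match every device" is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum dev_event_type { DEV_EVENT_ADD, DEV_EVENT_REMOVE };

typedef void (*dev_event_cb_fn)(const char *device_name,
                                enum dev_event_type type, void *cb_arg);

struct cb_entry {
    const char *dev_name; /* NULL means "any device" in this sketch */
    dev_event_cb_fn fn;
    void *arg;
};

#define MAX_CBS 8
static struct cb_entry cbs[MAX_CBS];
static size_t nb_cbs;

/* Toy registration: remember (name, callback, argument). */
static int
dev_event_callback_register(const char *dev_name, dev_event_cb_fn fn,
                            void *arg)
{
    if (fn == NULL || nb_cbs == MAX_CBS)
        return -1;
    cbs[nb_cbs++] = (struct cb_entry){ dev_name, fn, arg };
    return 0;
}

/* Toy dispatch: call only the entries whose name matches the event. */
static void
dev_event_dispatch(const char *device_name, enum dev_event_type type)
{
    assert(device_name != NULL);
    for (size_t i = 0; i < nb_cbs; i++) {
        if (cbs[i].dev_name != NULL &&
            strcmp(cbs[i].dev_name, device_name) != 0)
            continue;
        cbs[i].fn(device_name, type, cbs[i].arg);
    }
}

/* Hypothetical port object standing in for eth_dev. */
struct eth_port {
    const char *dev_name;
    int removed;
};

/* Common callback: the port arrives through cb_arg, no table scan. */
static void
eth_dev_event_callback(const char *device_name, enum dev_event_type type,
                       void *cb_arg)
{
    struct eth_port *port = cb_arg;

    (void)device_name;
    if (type == DEV_EVENT_REMOVE)
        port->removed = 1;
}
```

Because each port registers under its own device name, a removal event reaches only the matching port, and the callback does not have to walk a global port array to find it.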




Re: [dpdk-dev] [PATCH v2 1/3] net/ixgbe: enable hotplug detect in ixgbe

2018-07-09 Thread Matan Azrad


Hi

From: Jeff Guo
> On 7/9/2018 5:04 PM, Matan Azrad wrote:
> > Hi
> >
> > From: Jeff Guo
> >> hi, wenzhuo and matan.
> >>
> >>
> >> On 7/9/2018 3:51 PM, Matan Azrad wrote:
> >>> Hi
> >>>
> >>> From: Lu, Wenzhuo
>  Hi Jeff,
> 
> > -Original Message-
> > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Jeff Guo
> > Sent: Monday, July 9, 2018 2:57 PM
> > To: step...@networkplumber.org; Richardson, Bruce
> > ; Yigit, Ferruh
> > ; Ananyev, Konstantin
> > ; gaetan.ri...@6wind.com; Wu,
> >> Jingjing
> > ; tho...@monjalon.net;
> >> mo...@mellanox.com;
> > ma...@mellanox.com;
>  Van
> > Haaren, Harry ; Zhang, Qi Z
> > ; He, Shaopeng ;
> > Iremonger, Bernard ;
> > arybche...@solarflare.com
> > Cc: jblu...@infradead.org; shreyansh.j...@nxp.com; dev@dpdk.org;
> > Guo, Jia ; Zhang, Helin 
> > Subject: [dpdk-dev] [PATCH v2 1/3] net/ixgbe: enable hotplug
> > detect in ixgbe
> >
> > This patch aim to enable hotplug detect in ixgbe pmd driver.
> > Firstly it set the flags RTE_PCI_DRV_INTR_RMV in drv_flags to
> > announce the hotplug ability, and then use
> > rte_dev_event_callback_register to register the hotplug event
> > callback to eal. When eal detect the hotplug event, it will call
> > the callback to process it, if the event is hotplug remove, it
> > will trigger the RTE_ETH_EVENT_INTR_RMV event into ethdev
> callback to let app process the hotplug for the ethdev.
> >
> > This is an example for other driver, that if any driver support
> > hotplug feature could be use this way to enable hotplug detect.
> >
> > Signed-off-by: Jeff Guo 
> > ---
> > v2->v1:
> > refine some doc.
> > ---
> >drivers/net/ixgbe/ixgbe_ethdev.c | 46
> > +++-
> >1 file changed, 45 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c
> > b/drivers/net/ixgbe/ixgbe_ethdev.c
> > index 87d2ad0..83ce026 100644
> > --- a/drivers/net/ixgbe/ixgbe_ethdev.c
> > +++ b/drivers/net/ixgbe/ixgbe_ethdev.c
> > @@ -1534,6 +1534,47 @@ generate_random_mac_addr(struct
> >> ether_addr
> > *mac_addr)
> > memcpy(&mac_addr->addr_bytes[3], &random, 3);  }
> >
> > +static void
> > +eth_dev_event_callback(char *device_name, enum
> >> rte_dev_event_type
> > type,
> > +  __rte_unused void *arg)
> > +{
> > +   uint32_t pid;
> > +
> > +   if (type >= RTE_DEV_EVENT_MAX) {
> > +   fprintf(stderr, "%s called upon invalid event %d\n",
> > +   __func__, type);
> > +   fflush(stderr);
> > +   }
> > +
> > +   switch (type) {
> > +   case RTE_DEV_EVENT_REMOVE:
> > +   PMD_DRV_LOG(INFO, "The device: %s has been
>  removed!\n",
> > +   device_name);
> > +
> > +   if (!device_name)
> > +   return;
> > +
> > +   for (pid = 0; pid < RTE_MAX_ETHPORTS; pid++) {
> > +   if (rte_eth_devices[pid].device) {
> > +   if (!strcmp(device_name,
> > +   rte_eth_devices[pid].device-
> >name)) {
> > +
>   _rte_eth_dev_callback_process(
> > +
>   &rte_eth_devices[pid],
> > +
>   RTE_ETH_EVENT_INTR_RMV,
> > NULL);
> > +   continue;
> > +   }
> > +   }
> > +   }
> > +   break;
> > +   case RTE_DEV_EVENT_ADD:
> > +   RTE_LOG(INFO, EAL, "The device: %s has been
> added!\n",
> > +   device_name);
> > +   break;
> > +   default:
> > +   break;
> > +   }
> > +}
>  I don't get the point. Looks like this's a very common rte code.
>  Why is it put in ixgbe pmd?
> >>> Jeff needs to detect if the removed device is related to this PMD,
> >>> than to
> >> raise RMV events for all this PMD ethdev associated ports.
> >>> He should not raise RMV events for other PMD  ports.
> >>>
> >> It should be like wenzhuo said that i could no strong reason to let
> >> common way in ixgbe pmd.  And sure raise RMV events for none related
> >> PMD ports is not my hope.
> >> Will plan to let it go into the eth dev layer to process it.
> >>
> > How can you run ethdev function from EAL context?
> > How can the ethdev layer know which ports are related to the EAL device
> removal?
> > How can ethdev layer know if the port supports removal?
> 
> i mean that still let driver manage the callback , just let the common ethdev
> functional in ethdev layer.
> It just define "rte_eth_dev_event_callback" in ethdev layer, and register the
> common ethdev cal

Re: [dpdk-dev] [PATCH] maintainers: update for Mellanox PMDs

2018-07-09 Thread Matan Azrad


From: Adrien Mazarguil
> Shahaf and Matan volunteered to replace Nélio and myself as maintainers
> for
> mlx4 and mlx5 PMDs. Cheers!
> 

Thanks!

> Signed-off-by: Adrien Mazarguil 
> Signed-off-by: Nelio Laranjeiro 
> Cc: sta...@dpdk.org
> Cc: Shahaf Shuler 
> Cc: Matan Azrad 

Acked-by: Matan Azrad 

> ---
>  MAINTAINERS | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index dabb12d65..e94f02386 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -580,15 +580,15 @@ F: doc/guides/nics/mvpp2.rst
>  F: doc/guides/nics/features/mvpp2.ini
> 
>  Mellanox mlx4
> -M: Adrien Mazarguil 
> +M: Matan Azrad 
> +M: Shahaf Shuler 
>  T: git://dpdk.org/next/dpdk-next-net-mlx
>  F: drivers/net/mlx4/
>  F: doc/guides/nics/mlx4.rst
>  F: doc/guides/nics/features/mlx4.ini
> 
>  Mellanox mlx5
> -M: Adrien Mazarguil 
> -M: Nelio Laranjeiro 
> +M: Shahaf Shuler 
>  M: Yongseok Koh 
>  T: git://dpdk.org/next/dpdk-next-net-mlx
>  F: drivers/net/mlx5/
> --
> 2.11.0


Re: [dpdk-dev] [PATCH v2 1/3] net/ixgbe: enable hotplug detect in ixgbe

2018-07-09 Thread Jeff Guo




On 7/9/2018 6:01 PM, Matan Azrad wrote:

Hi

From: Jeff Guo

On 7/9/2018 5:04 PM, Matan Azrad wrote:

Hi

From: Jeff Guo

hi, wenzhuo and matan.


On 7/9/2018 3:51 PM, Matan Azrad wrote:

Hi

From: Lu, Wenzhuo

Hi Jeff,


-Original Message-
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Jeff Guo
Sent: Monday, July 9, 2018 2:57 PM
To: step...@networkplumber.org; Richardson, Bruce
; Yigit, Ferruh
; Ananyev, Konstantin
; gaetan.ri...@6wind.com; Wu,

Jingjing

; tho...@monjalon.net;

mo...@mellanox.com;

ma...@mellanox.com;

Van

Haaren, Harry ; Zhang, Qi Z
; He, Shaopeng ;
Iremonger, Bernard ;
arybche...@solarflare.com
Cc: jblu...@infradead.org; shreyansh.j...@nxp.com; dev@dpdk.org;
Guo, Jia ; Zhang, Helin 
Subject: [dpdk-dev] [PATCH v2 1/3] net/ixgbe: enable hotplug
detect in ixgbe

This patch aim to enable hotplug detect in ixgbe pmd driver.
Firstly it set the flags RTE_PCI_DRV_INTR_RMV in drv_flags to
announce the hotplug ability, and then use
rte_dev_event_callback_register to register the hotplug event
callback to eal. When eal detect the hotplug event, it will call
the callback to process it, if the event is hotplug remove, it
will trigger the RTE_ETH_EVENT_INTR_RMV event into ethdev

callback to let app process the hotplug for the ethdev.

This is an example for other driver, that if any driver support
hotplug feature could be use this way to enable hotplug detect.

Signed-off-by: Jeff Guo 
---
v2->v1:
refine some doc.
---
drivers/net/ixgbe/ixgbe_ethdev.c | 46
+++-
1 file changed, 45 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c
b/drivers/net/ixgbe/ixgbe_ethdev.c
index 87d2ad0..83ce026 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -1534,6 +1534,47 @@ generate_random_mac_addr(struct

ether_addr

*mac_addr)
memcpy(&mac_addr->addr_bytes[3], &random, 3);  }

+static void
+eth_dev_event_callback(char *device_name, enum

rte_dev_event_type

type,
+  __rte_unused void *arg)
+{
+   uint32_t pid;
+
+   if (type >= RTE_DEV_EVENT_MAX) {
+   fprintf(stderr, "%s called upon invalid event %d\n",
+   __func__, type);
+   fflush(stderr);
+   }
+
+   switch (type) {
+   case RTE_DEV_EVENT_REMOVE:
+   PMD_DRV_LOG(INFO, "The device: %s has been

removed!\n",

+   device_name);
+
+   if (!device_name)
+   return;
+
+   for (pid = 0; pid < RTE_MAX_ETHPORTS; pid++) {
+   if (rte_eth_devices[pid].device) {
+   if (!strcmp(device_name,
+   rte_eth_devices[pid].device-

name)) {

+

_rte_eth_dev_callback_process(

+

&rte_eth_devices[pid],

+

RTE_ETH_EVENT_INTR_RMV,

NULL);
+   continue;
+   }
+   }
+   }
+   break;
+   case RTE_DEV_EVENT_ADD:
+   RTE_LOG(INFO, EAL, "The device: %s has been

added!\n",

+   device_name);
+   break;
+   default:
+   break;
+   }
+}

I don't get the point. Looks like this's a very common rte code.
Why is it put in ixgbe pmd?

Jeff needs to detect if the removed device is related to this PMD,
than to

raise RMV events for all this PMD ethdev associated ports.

He should not raise RMV events for other PMD  ports.


It should be like wenzhuo said that i could no strong reason to let
common way in ixgbe pmd.  And sure raise RMV events for none related
PMD ports is not my hope.
Will plan to let it go into the eth dev layer to process it.


How can you run ethdev function from EAL context?
How can the ethdev layer know which ports are related to the EAL device

removal?

How can ethdev layer know if the port supports removal?

I mean that the driver still manages the callback; only the common ethdev
functionality moves into the ethdev layer.
The ethdev layer just defines "rte_eth_dev_event_callback", and the PMD
registers this common ethdev callback as below. The eth_dev can be passed
through the whole process.

  rte_dev_event_callback_register(eth_dev->device->name,
  rte_eth_dev_event_callback,
  (void *)eth_dev);


Sorry, but I don't understand. Can you explain the notification path step by
step?


The steps should be:
1) Add an ethdev driver API, "rte_eth_dev_event_callback", in
rte_ethdev_driver.h, and let the PMD use it:
 rte_eth_dev_event_callback(char *device_name, enum
rte_dev_event_type event, void *cb_arg);


2) Register the ethdev EAL device event callback in the PMD as below; the
rte eth device (eth_dev) can be passed as the cb_arg of the callback.


rte_dev_event_callback_register(eth_dev->device->name,
 rte_eth_dev_event_callback,
 (void *)

[dpdk-dev] [PATCH v5] net/mlx4: support hardware TSO

2018-07-09 Thread Moti Haimovsky
Implement support for hardware TSO.

Signed-off-by: Moti Haimovsky 
---
v5:
* Modification to the code according to review inputs from Matan
  Azrad.
* Code optimization to the TSO header copy routine.
* Rearranged the TSO data-segments creation routine.
in reply to 
1530715998-15703-1-git-send-email-mo...@mellanox.com

v4:
* Bug fixes in filling TSO data segments.
* Modifications according to review inputs from Adrien Mazarguil
  and Matan Azrad.
in reply to
1530190137-17848-1-git-send-email-mo...@mellanox.com

v3:
* Fixed compilation errors in compilers without GNU C extensions
  caused by a declaration of zero-length array in the code.
in reply to
1530187032-6489-1-git-send-email-mo...@mellanox.com

v2:
* Fixed coding style warning.
in reply to
1530184583-30166-1-git-send-email-mo...@mellanox.com

v1:
* Fixed coding style warnings.
in reply to
1530181779-19716-1-git-send-email-mo...@mellanox.com
---
 doc/guides/nics/features/mlx4.ini |   1 +
 doc/guides/nics/mlx4.rst  |   3 +
 drivers/net/mlx4/Makefile |   5 +
 drivers/net/mlx4/mlx4.c   |   9 +
 drivers/net/mlx4/mlx4.h   |   5 +
 drivers/net/mlx4/mlx4_prm.h   |  15 ++
 drivers/net/mlx4/mlx4_rxtx.c  | 372 +-
 drivers/net/mlx4/mlx4_rxtx.h  |   2 +-
 drivers/net/mlx4/mlx4_txq.c   |   8 +-
 9 files changed, 416 insertions(+), 4 deletions(-)

diff --git a/doc/guides/nics/features/mlx4.ini b/doc/guides/nics/features/mlx4.ini
index f6efd21..98a3f61 100644
--- a/doc/guides/nics/features/mlx4.ini
+++ b/doc/guides/nics/features/mlx4.ini
@@ -13,6 +13,7 @@ Queue start/stop = Y
 MTU update   = Y
 Jumbo frame  = Y
 Scattered Rx = Y
+TSO  = Y
 Promiscuous mode = Y
 Allmulticast mode= Y
 Unicast MAC filter   = Y
diff --git a/doc/guides/nics/mlx4.rst b/doc/guides/nics/mlx4.rst
index 491106a..12adaeb 100644
--- a/doc/guides/nics/mlx4.rst
+++ b/doc/guides/nics/mlx4.rst
@@ -142,6 +142,9 @@ Limitations
   The ability to enable/disable CRC stripping requires OFED version
   4.3-1.5.0.0 and above  or rdma-core version v18 and above.
 
+- TSO (Transmit Segmentation Offload) is supported in OFED version
+  4.4 and above or in rdma-core version v18 and above.
+
 Prerequisites
 -
 
diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile
index 73f9d40..63bc003 100644
--- a/drivers/net/mlx4/Makefile
+++ b/drivers/net/mlx4/Makefile
@@ -85,6 +85,11 @@ mlx4_autoconf.h.new: FORCE
 mlx4_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
$Q $(RM) -f -- '$@'
$Q : > '$@'
+   $Q sh -- '$<' '$@' \
+   HAVE_IBV_MLX4_WQE_LSO_SEG \
+   infiniband/mlx4dv.h \
+   type 'struct mlx4_wqe_lso_seg' \
+   $(AUTOCONF_OUTPUT)
 
 # Create mlx4_autoconf.h or update it in case it differs from the new one.
 
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index d151a90..5d8c76d 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -677,6 +677,15 @@ struct mlx4_conf {
IBV_RAW_PACKET_CAP_SCATTER_FCS);
DEBUG("FCS stripping toggling is %ssupported",
  priv->hw_fcs_strip ? "" : "not ");
+   priv->tso =
+   ((device_attr_ex.tso_caps.max_tso > 0) &&
+(device_attr_ex.tso_caps.supported_qpts &
+ (1 << IBV_QPT_RAW_PACKET)));
+   if (priv->tso)
+   priv->tso_max_payload_sz =
+   device_attr_ex.tso_caps.max_tso;
+   DEBUG("TSO is %ssupported",
+ priv->tso ? "" : "not ");
/* Configure the first MAC address by default. */
err = mlx4_get_mac(priv, &mac.addr_bytes);
if (err) {
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 300cb4d..89d8c38 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -47,6 +47,9 @@
 /** Interrupt alarm timeout value in microseconds. */
 #define MLX4_INTR_ALARM_TIMEOUT 10
 
+/* Maximum packet headers size (L2+L3+L4) for TSO. */
+#define MLX4_MAX_TSO_HEADER 192
+
 /** Port parameter. */
 #define MLX4_PMD_PORT_KVARG "port"
 
@@ -90,6 +93,8 @@ struct priv {
uint32_t hw_csum:1; /**< Checksum offload is supported. */
uint32_t hw_csum_l2tun:1; /**< Checksum support for L2 tunnels. */
uint32_t hw_fcs_strip:1; /**< FCS stripping toggling is supported. */
+   uint32_t tso:1; /**< Transmit segmentation offload is supported. */
+   uint32_t tso_max_payload_sz; /**< Max supported TSO payload size. */
uint64_t hw_rss_sup; /**< Supported RSS hash fields (Verbs format). */
struct rte_intr_handle intr_handle; /**< Port interrupt handle. */
struct mlx4_drop *drop; /**< Shared resources for drop flow rules. */
diff --git a/drivers/net/mlx4/mlx4_prm.h b/drivers/net/

Re: [dpdk-dev] [PATCH v5 09/16] cryptodev: remove max number of sessions parameter

2018-07-09 Thread De Lara Guarch, Pablo
Hi Akhil,

> -Original Message-
> From: De Lara Guarch, Pablo
> Sent: Thursday, July 5, 2018 3:08 AM
> To: Doherty, Declan ; akhil.go...@nxp.com;
> shally.ve...@caviumnetworks.com; ravi1.ku...@amd.com;
> jerin.ja...@caviumnetworks.com; Zhang, Roy Fan ;
> Trahe, Fiona ; t...@semihalf.com;
> jianjay.z...@huawei.com
> Cc: dev@dpdk.org; De Lara Guarch, Pablo 
> Subject: [PATCH v5 09/16] cryptodev: remove max number of sessions
> parameter
> 
> Most crypto PMDs do not have a limitation of the number of the sessions that
> can be handled internally. The value that was set before was not actually
> used at all, since the sessions are created at the application level.
> Therefore, this value is not parsed from the initial crypto parameters anymore
> and it is set to 0, meaning that there is no actual limit.
> 
> Signed-off-by: Pablo de Lara 

Is this patch OK with you? It's the only one (apart from the MVSAM patch)
that needs an ack.

Thanks,
Pablo


Re: [dpdk-dev] [PATCH v2] librte_lpm: Improve performance of the delete and add functions

2018-07-09 Thread Bruce Richardson
On Mon, Jul 02, 2018 at 07:42:11PM +0300, Alex Kiselev wrote:
> There are two major problems with the library:
> first, there is no need to rebuild the whole LPM tree
> when a rule is deleted and second, due to the current
> rules algorithm with complexity O(n) it's almost
> impossible to deal with large rule sets (50k or so rules).
> This patch addresses those two issues.
> 
> Signed-off-by: Alex Kiselev 
> ---
>  lib/librte_lpm/rte_lpm6.c | 1073 
> ++---
>  1 file changed, 816 insertions(+), 257 deletions(-)
> 



> @@ -806,156 +1188,333 @@ MAP_STATIC_SYMBOL(int rte_lpm6_is_rule_present(struct rte_lpm6 *lpm,
>  /*
>   * Delete a rule from the rule table.
>   * NOTE: Valid range for depth parameter is 1 .. 128 inclusive.
> + * return
> + *   0 if successful delete
> + *   <0 if failure

whitespace in comment.

>   */
> -static inline void
> -rule_delete(struct rte_lpm6 *lpm, int32_t rule_index)
> +static inline int
> +rule_delete(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth)
>  {
> - /*
> -  * Overwrite redundant rule with last rule in group and decrement rule
> -  * counter.
> -  */
> - lpm->rules_tbl[rule_index] = lpm->rules_tbl[lpm->used_rules-1];
> - lpm->used_rules--;
> + /* init a rule key */
> + struct rte_lpm6_rule_key rule_key;
> + rule_key_init(&rule_key, ip, depth);
> +
> + /* Look for a rule */
> + struct rte_lpm6_rule*rule;

nit: whitespace

> + int ret = rte_hash_lookup_data(lpm->rules_tbl, (void *) &rule_key,
> + (void **) &rule);
> + if (ret >= 0) {
> + /* delete the rule */
> + rte_hash_del_key(lpm->rules_tbl, (void *) &rule_key);
> + lpm->used_rules--;
> + rte_mempool_put(lpm->rules_pool, rule);
> + }

Rather than doing a lookup and then delete, why not just try the delete
straight off. If you want to check for the key not being present, it can be
detected from the output of the delete call. From rte_hash.h:

 * @return
 *   - -EINVAL if the parameters are invalid.
 *   - -ENOENT if the key is not found.

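To illustrate the suggestion, here is a minimal standalone sketch of a
delete helper that relies on the return code of the delete primitive instead
of a prior lookup. It uses a hypothetical toy table (toy_table, toy_add,
toy_del_key, toy_rule_delete are all made-up names), not rte_hash, and it
leaves out the mempool return of the rule object that the patch performs:

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_RULES 16
#define TOY_ADDR_SIZE 16

/* Hypothetical rule storage; a stand-in for the hash table in the patch. */
struct toy_rule {
	uint8_t ip[TOY_ADDR_SIZE];
	uint8_t depth;
	int in_use;
};

struct toy_table {
	struct toy_rule rules[TOY_MAX_RULES];
	unsigned int used_rules;
};

/* Store a rule in the first free slot; returns the slot or -ENOSPC. */
static int
toy_add(struct toy_table *t, const uint8_t *ip, uint8_t depth)
{
	unsigned int i;

	for (i = 0; i < TOY_MAX_RULES; i++) {
		if (!t->rules[i].in_use) {
			memcpy(t->rules[i].ip, ip, TOY_ADDR_SIZE);
			t->rules[i].depth = depth;
			t->rules[i].in_use = 1;
			t->used_rules++;
			return (int)i;
		}
	}
	return -ENOSPC;
}

/* Delete primitive that reports a missing key by itself (-ENOENT). */
static int
toy_del_key(struct toy_table *t, const uint8_t *ip, uint8_t depth)
{
	unsigned int i;

	if (t == NULL || ip == NULL)
		return -EINVAL;
	for (i = 0; i < TOY_MAX_RULES; i++) {
		if (t->rules[i].in_use && t->rules[i].depth == depth &&
		    memcmp(t->rules[i].ip, ip, TOY_ADDR_SIZE) == 0) {
			t->rules[i].in_use = 0;
			return (int)i;
		}
	}
	return -ENOENT;
}

/* No lookup first: try the delete and act on its return value. */
static int
toy_rule_delete(struct toy_table *t, const uint8_t *ip, uint8_t depth)
{
	int ret = toy_del_key(t, ip, depth);

	if (ret < 0)
		return ret; /* -EINVAL or -ENOENT, passed through */
	t->used_rules--;
	return 0;
}
```

A second delete of the same key then reports -ENOENT without any separate
lookup call.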

> +
> + return ret;
>  }
>  
>  /*
> - * Deletes a rule
> + * Deletes a group of rules

Include a comment that this bulk function will rebuild the lpm table,
rather than doing incremental updates like the regular delete function.

>   */
>  int
> -rte_lpm6_delete(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth)
> +rte_lpm6_delete_bulk_func(struct rte_lpm6 *lpm,
> + uint8_t ips[][RTE_LPM6_IPV6_ADDR_SIZE], uint8_t *depths,
> + unsigned n)
>  {
> - int32_t rule_to_delete_index;
> - uint8_t ip_masked[RTE_LPM6_IPV6_ADDR_SIZE];
> - unsigned i;
> -
> - /*
> -  * Check input arguments.
> -  */
> - if ((lpm == NULL) || (depth < 1) || (depth > RTE_LPM6_MAX_DEPTH)) {
> + /* Check input arguments. */
> + if ((lpm == NULL) || (ips == NULL) || (depths == NULL))
>   return -EINVAL;
> - }
> -
> - /* Copy the IP and mask it to avoid modifying user's input data. */
> - memcpy(ip_masked, ip, RTE_LPM6_IPV6_ADDR_SIZE);
> - mask_ip(ip_masked, depth);
> -
> - /*
> -  * Find the index of the input rule, that needs to be deleted, in the
> -  * rule table.
> -  */
> - rule_to_delete_index = rule_find(lpm, ip_masked, depth);
> -
> - /*
> -  * Check if rule_to_delete_index was found. If no rule was found the
> -  * function rule_find returns -ENOENT.
> -  */
> - if (rule_to_delete_index < 0)
> - return rule_to_delete_index;
>  
> - /* Delete the rule from the rule table. */
> - rule_delete(lpm, rule_to_delete_index);
> + unsigned i;
> + for (i = 0; i < n; i++)
> + rule_delete(lpm, ips[i], depths[i]);
>  
>   /*
>* Set all the table entries to 0 (ie delete every rule
>* from the data structure.
>*/
> - lpm->next_tbl8 = 0;
>   memset(lpm->tbl24, 0, sizeof(lpm->tbl24));
>   memset(lpm->tbl8, 0, sizeof(lpm->tbl8[0])
>   * RTE_LPM6_TBL8_GROUP_NUM_ENTRIES * lpm->number_tbl8s);
> + tbl8_pool_init(lpm);
>  
>   /*
> -  * Add every rule again (except for the one that was removed from
> +  * Add every rule again (except for the ones that were removed from
>* the rules table).
>*/
> - for (i = 0; i < lpm->used_rules; i++) {
> - rte_lpm6_add(lpm, lpm->rules_tbl[i].ip, lpm->rules_tbl[i].depth,
> - lpm->rules_tbl[i].next_hop);
> - }
> + recreate_lpm(lpm);
>  
>   return 0;
>  }
>  
>  /*
> - * Deletes a group of rules
> + * Delete all rules from the LPM table.
>   */
> -int
> -rte_lpm6_delete_bulk_func(struct rte_lpm6 *lpm,
> - uint8_t ips[][RTE_LPM6_IPV6_ADDR_SIZE], uint8_t *depths, unsigned n)
> +void
> +rte_lpm6_delete_all(struct rte_lpm6 *lpm)
>  {
> - int32_t rule_to_delete_index;
> - uint8_t ip_masked[RTE_LPM6_I

Re: [dpdk-dev] [PATCH v3 1/8] hash: fix multiwriter lock memory allocation

2018-07-09 Thread De Lara Guarch, Pablo



> -Original Message-
> From: Wang, Yipeng1
> Sent: Friday, July 6, 2018 8:47 PM
> To: De Lara Guarch, Pablo 
> Cc: dev@dpdk.org; Wang, Yipeng1 ; Richardson,
> Bruce ; honnappa.nagaraha...@arm.com;
> vgu...@caviumnetworks.com; brijesh.s.si...@gmail.com
> Subject: [PATCH v3 1/8] hash: fix multiwriter lock memory allocation
> 
> When allocating multiwriter_lock, the alignment should be
> RTE_CACHE_LINE_SIZE rather than LCORE_CACHE_SIZE.
> 
> Also, the result of rte_malloc should be checked for success.
> 
> Fixes: be856325cba3 ("hash: add scalable multi-writer insertion with Intel TSX")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Yipeng Wang 

Acked-by: Pablo de Lara 
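
For readers following along, the two points of the fix (cache-line alignment
and checking the allocation result) can be sketched in plain C11. This is
only an illustration under assumptions: it uses aligned_alloc rather than
rte_malloc, and toy_lock, toy_lock_alloc and the 64-byte TOY_CACHE_LINE_SIZE
are hypothetical names and values, not taken from the patch:

```c
#include <stdint.h>
#include <stdlib.h>

#define TOY_CACHE_LINE_SIZE 64 /* assumed cache line size for this sketch */

/* Hypothetical lock object standing in for the multi-writer lock. */
struct toy_lock {
	volatile int locked;
};

/*
 * Allocate the lock on its own cache line. aligned_alloc() expects the
 * size to be a multiple of the alignment, so round it up first.
 */
static struct toy_lock *
toy_lock_alloc(void)
{
	size_t size = (sizeof(struct toy_lock) + TOY_CACHE_LINE_SIZE - 1) /
		      TOY_CACHE_LINE_SIZE * TOY_CACHE_LINE_SIZE;
	struct toy_lock *l = aligned_alloc(TOY_CACHE_LINE_SIZE, size);

	if (l == NULL)
		return NULL; /* report the failure instead of using the pointer */
	l->locked = 0;
	return l;
}
```

The caller is expected to test the returned pointer and release the lock
with free() when done.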



[dpdk-dev] [PATCH v3 0/4] Enable eal hotplug event detect for i40e/ixgbe

2018-07-09 Thread Jeff Guo
As we know, there is an EAL event for rte device hotplug and an ethdev event
for ethdev hotplug. Some ethdevs need the EAL event to detect hotplug
behavior. The previous way was to register the EAL event callback in the
application, but that appears to leave a race between the two event
processes. To fix it, it is better to combine the detection of these two
events.

This patch set introduces a way to combine the two events: the PMD registers
the ethdev EAL event callback, and the callback triggers the ethdev hotplug
event. This lets an ethdev device process hotplug easily in a common way.

The i40e and ixgbe PMDs are used as examples here; other drivers that support
the hotplug feature could use the same approach to detect and process hotplug.
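
The notification path described above can be modelled in a few lines of
standalone C. This is a rough sketch only: every type and function below
(dev_event, port, dev_event_cb_register, port_dev_event_cb,
dev_event_dispatch) is a hypothetical stand-in invented for illustration and
is not the EAL or ethdev API:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the EAL/ethdev types; not DPDK definitions. */
enum dev_event { DEV_EVENT_ADD, DEV_EVENT_REMOVE };

struct port {
	const char *dev_name; /* name of the underlying device */
	int supports_rmv;     /* driver announced removal support */
	int rmv_events;       /* removal events delivered to the application */
};

typedef void (*dev_event_cb)(const char *dev_name, enum dev_event ev,
			     void *arg);

struct cb_entry {
	const char *dev_name;
	dev_event_cb cb;
	void *arg;
};

#define MAX_CBS 8
static struct cb_entry cb_table[MAX_CBS];
static size_t cb_count;

/* Driver side: register one callback per device, with the port as arg. */
static int
dev_event_cb_register(const char *dev_name, dev_event_cb cb, void *arg)
{
	if (dev_name == NULL || cb == NULL || cb_count == MAX_CBS)
		return -1;
	cb_table[cb_count++] = (struct cb_entry){ dev_name, cb, arg };
	return 0;
}

/* Common callback: forward a removal only to the port owning the device. */
static void
port_dev_event_cb(const char *dev_name, enum dev_event ev, void *arg)
{
	struct port *p = arg;

	if (dev_name == NULL || p == NULL || ev != DEV_EVENT_REMOVE)
		return;
	if (!p->supports_rmv || strcmp(dev_name, p->dev_name) != 0)
		return;
	p->rmv_events++; /* stands in for raising the ethdev RMV event */
}

/* Event source side: invoke only the callbacks registered for the device. */
static void
dev_event_dispatch(const char *dev_name, enum dev_event ev)
{
	size_t i;

	for (i = 0; i < cb_count; i++)
		if (strcmp(cb_table[i].dev_name, dev_name) == 0)
			cb_table[i].cb(dev_name, ev, cb_table[i].arg);
}
```

Because each registration carries its own port as the callback argument, a
removal of one device never reaches the ports of another driver, which is
the behaviour asked for in the v2 review.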

patch history:
v3->v2:
remove the callback from driver to ethdev for common.

v2->v1:
add ixgbe hotplug detect case.
refine some doc.


Jeff Guo (4):
  ethdev: Add eal device event callback
  net/ixgbe: enable hotplug detect in ixgbe
  net/i40e: enable hotplug detect in i40e
  testpmd: remove the dev event callback register

 app/test-pmd/testpmd.c | 76 --
 doc/guides/rel_notes/release_18_08.rst |  8 
 drivers/net/i40e/i40e_ethdev.c |  7 +++-
 drivers/net/ixgbe/ixgbe_ethdev.c   |  7 +++-
 lib/librte_ethdev/rte_ethdev.c | 37 +
 lib/librte_ethdev/rte_ethdev_driver.h  | 20 +
 6 files changed, 77 insertions(+), 78 deletions(-)

-- 
2.7.4



[dpdk-dev] [PATCH v3 1/4] ethdev: Add eal device event callback

2018-07-09 Thread Jeff Guo
Implement an EAL device event callback, "rte_eth_dev_event_callback", in
ethdev. It gives the PMD a chance to manage EAL device events, such as
processing a hotplug event.

Signed-off-by: Jeff Guo 
---
v3->v2:
add new callback in ethdev
---
 doc/guides/rel_notes/release_18_08.rst |  8 
 lib/librte_ethdev/rte_ethdev.c | 37 ++
 lib/librte_ethdev/rte_ethdev_driver.h  | 20 ++
 3 files changed, 65 insertions(+)

diff --git a/doc/guides/rel_notes/release_18_08.rst b/doc/guides/rel_notes/release_18_08.rst
index bc01242..2326058 100644
--- a/doc/guides/rel_notes/release_18_08.rst
+++ b/doc/guides/rel_notes/release_18_08.rst
@@ -46,6 +46,14 @@ New Features
   Flow API support has been added to CXGBE Poll Mode Driver to offload
   flows to Chelsio T5/T6 NICs.
 
+* **Added eal device event callback in ethdev for hotplug.**
+
+  Implement a eal device event callback in ethdev, it could let pmd driver
+  have chance to manage the eal device event, such as process hotplug event.
+
+  * ``rte_eth_dev_event_callback`` for driver use to register it and process
+eal device event.
+
 
 API Changes
 ---
diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index a9977df..36f218a 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -4518,6 +4518,43 @@ rte_eth_devargs_parse(const char *dargs, struct rte_eth_devargs *eth_da)
return result;
 }
 
+void __rte_experimental
+rte_eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
+void *arg)
+{
+   struct rte_eth_dev *eth_dev = (struct rte_eth_dev *)arg;
+
+   if (type >= RTE_DEV_EVENT_MAX) {
+   fprintf(stderr, "%s called upon invalid event %d\n",
+   __func__, type);
+   fflush(stderr);
+   }
+
+   switch (type) {
+   case RTE_DEV_EVENT_REMOVE:
+   ethdev_log(INFO, "The device: %s has been removed!\n",
+   device_name);
+
+   if (!device_name || !eth_dev)
+   return;
+
+   if (!(eth_dev->data->dev_flags & RTE_ETH_EVENT_INTR_RMV))
+   return;
+
+   if (!strcmp(device_name, eth_dev->device->name))
+   _rte_eth_dev_callback_process(eth_dev,
+ RTE_ETH_EVENT_INTR_RMV,
+ NULL);
+   break;
+   case RTE_DEV_EVENT_ADD:
+   ethdev_log(INFO, "The device: %s has been added!\n",
+   device_name);
+   break;
+   default:
+   break;
+   }
+}
+
 RTE_INIT(ethdev_init_log);
 static void
 ethdev_init_log(void)
diff --git a/lib/librte_ethdev/rte_ethdev_driver.h b/lib/librte_ethdev/rte_ethdev_driver.h
index c9c825e..fed5afa 100644
--- a/lib/librte_ethdev/rte_ethdev_driver.h
+++ b/lib/librte_ethdev/rte_ethdev_driver.h
@@ -82,6 +82,26 @@ int rte_eth_dev_release_port(struct rte_eth_dev *eth_dev);
 void _rte_eth_dev_reset(struct rte_eth_dev *dev);
 
 /**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Implement a rte eth eal device event callbacks for the specific device.
+ *
+ * @param device_name
+ *  Pointer to the name of the rte device.
+ * @param event
+ *  Eal device event type.
+ * @param ret_param
+ *  To pass data back to user application.
+ *
+ * @return
+ *  void
+ */
+void __rte_experimental
+rte_eth_dev_event_callback(char *device_name,
+   enum rte_dev_event_type event, void *cb_arg);
+
+/**
  * @internal Executes all the user application registered callbacks for
  * the specific device. It is for DPDK internal user only. User
  * application should not call it directly.
-- 
2.7.4



[dpdk-dev] [PATCH v3 2/4] net/ixgbe: enable hotplug detect in ixgbe

2018-07-09 Thread Jeff Guo
This patch aims to enable hotplug detection in the ixgbe PMD. First it
sets the RTE_PCI_DRV_INTR_RMV flag in drv_flags to announce the hotplug
ability, and then it uses rte_dev_event_callback_register to register
the hotplug event callback with the EAL. When the EAL detects a hotplug
event, it calls the callback to process it. If the event is a hotplug
removal, the callback triggers the RTE_ETH_EVENT_INTR_RMV event in the
ethdev callback so that the application can process the hotplug for the
ethdev.

This serves as an example for other drivers: any driver that supports the
hotplug feature could use this approach to enable hotplug detection.

Signed-off-by: Jeff Guo 
---
v3->v2:
remove the callback from driver to ethdev.
---
 drivers/net/ixgbe/ixgbe_ethdev.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 87d2ad0..a1c2588 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -1678,6 +1678,11 @@ eth_ixgbevf_dev_init(struct rte_eth_dev *eth_dev)
rte_intr_enable(intr_handle);
ixgbevf_intr_enable(eth_dev);
 
+   /* register the device event callback */
+   rte_dev_event_callback_register(eth_dev->device->name,
+   rte_eth_dev_event_callback,
+   (void *)eth_dev);
+
PMD_INIT_LOG(DEBUG, "port %d vendorID=0x%x deviceID=0x%x mac.type=%s",
 eth_dev->data->port_id, pci_dev->id.vendor_id,
 pci_dev->id.device_id, "ixgbe_mac_82599_vf");
@@ -1801,7 +1806,7 @@ static int eth_ixgbe_pci_remove(struct rte_pci_device *pci_dev)
 static struct rte_pci_driver rte_ixgbe_pmd = {
.id_table = pci_id_ixgbe_map,
.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
-RTE_PCI_DRV_IOVA_AS_VA,
+RTE_PCI_DRV_IOVA_AS_VA | RTE_PCI_DRV_INTR_RMV,
.probe = eth_ixgbe_pci_probe,
.remove = eth_ixgbe_pci_remove,
 };
-- 
2.7.4



[dpdk-dev] [PATCH v3 3/4] net/i40e: enable hotplug detect in i40e

2018-07-09 Thread Jeff Guo
This patch aims to enable hotplug detection in the i40e PMD. First it
sets the RTE_PCI_DRV_INTR_RMV flag in drv_flags to announce the hotplug
ability, and then it uses rte_dev_event_callback_register to register
the ethdev EAL device event callback. When the EAL detects a hotplug
event, it calls the callback to process it. If the event is a hotplug
removal, the callback triggers the RTE_ETH_EVENT_INTR_RMV event in the
ethdev callback so that the application can process the hotplug for the
ethdev.

This serves as an example for other drivers: any driver that supports the
hotplug feature could use this approach to enable hotplug detection.

Signed-off-by: Jeff Guo 
---
v3->v2:
remove the callback from driver to ethdev.
---
 drivers/net/i40e/i40e_ethdev.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 13c5d32..d79cac1 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -688,7 +688,7 @@ static int eth_i40e_pci_remove(struct rte_pci_device *pci_dev)
 static struct rte_pci_driver rte_i40e_pmd = {
.id_table = pci_id_i40e_map,
.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC |
-RTE_PCI_DRV_IOVA_AS_VA,
+RTE_PCI_DRV_IOVA_AS_VA | RTE_PCI_DRV_INTR_RMV,
.probe = eth_i40e_pci_probe,
.remove = eth_i40e_pci_remove,
 };
@@ -1442,6 +1442,11 @@ eth_i40e_dev_init(struct rte_eth_dev *dev, void *init_params __rte_unused)
rte_intr_callback_register(intr_handle,
   i40e_dev_interrupt_handler, dev);
 
+   /* register the device event callback */
+   rte_dev_event_callback_register(dev->device->name,
+   rte_eth_dev_event_callback,
+   (void *)dev);
+
/* configure and enable device interrupt */
i40e_pf_config_irq0(hw, TRUE);
i40e_pf_enable_irq0(hw);
-- 
2.7.4



[dpdk-dev] [PATCH v3 4/4] testpmd: remove the dev event callback register

2018-07-09 Thread Jeff Guo
Since the driver can now manage the EAL event for hotplug, there is no need
to register the dev event callback in the application anymore. This patch
removes the related code.

Signed-off-by: Jeff Guo 
Acked-by: Wenzhuo Lu 
---
v3->v2:
no change.
---
 app/test-pmd/testpmd.c | 76 --
 1 file changed, 76 deletions(-)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 24c1998..10ed660 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -400,12 +400,6 @@ static void check_all_ports_link_status(uint32_t port_mask);
 static int eth_event_callback(portid_t port_id,
  enum rte_eth_event_type type,
  void *param, void *ret_param);
-static void eth_dev_event_callback(char *device_name,
-   enum rte_dev_event_type type,
-   void *param);
-static int eth_dev_event_callback_register(void);
-static int eth_dev_event_callback_unregister(void);
-
 
 /*
  * Check if all the ports are started.
@@ -1915,39 +1909,6 @@ reset_port(portid_t pid)
printf("Done\n");
 }
 
-static int
-eth_dev_event_callback_register(void)
-{
-   int ret;
-
-   /* register the device event callback */
-   ret = rte_dev_event_callback_register(NULL,
-   eth_dev_event_callback, NULL);
-   if (ret) {
-   printf("Failed to register device event callback\n");
-   return -1;
-   }
-
-   return 0;
-}
-
-
-static int
-eth_dev_event_callback_unregister(void)
-{
-   int ret;
-
-   /* unregister the device event callback */
-   ret = rte_dev_event_callback_unregister(NULL,
-   eth_dev_event_callback, NULL);
-   if (ret < 0) {
-   printf("Failed to unregister device event callback\n");
-   return -1;
-   }
-
-   return 0;
-}
-
 void
 attach_port(char *identifier)
 {
@@ -2049,10 +2010,6 @@ pmd_test_exit(void)
RTE_LOG(ERR, EAL,
"fail to stop device event monitor.");
 
-   ret = eth_dev_event_callback_unregister();
-   if (ret)
-   RTE_LOG(ERR, EAL,
-   "fail to unregister all event callbacks.");
}
 
printf("\nBye...\n");
@@ -2191,37 +2148,6 @@ eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
return 0;
 }
 
-/* This function is used by the interrupt thread */
-static void
-eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
-__rte_unused void *arg)
-{
-   if (type >= RTE_DEV_EVENT_MAX) {
-   fprintf(stderr, "%s called upon invalid event %d\n",
-   __func__, type);
-   fflush(stderr);
-   }
-
-   switch (type) {
-   case RTE_DEV_EVENT_REMOVE:
-   RTE_LOG(ERR, EAL, "The device: %s has been removed!\n",
-   device_name);
-   /* TODO: After finish failure handle, begin to stop
-* packet forward, stop port, close port, detach port.
-*/
-   break;
-   case RTE_DEV_EVENT_ADD:
-   RTE_LOG(ERR, EAL, "The device: %s has been added!\n",
-   device_name);
-   /* TODO: After finish kernel driver binding,
-* begin to attach port.
-*/
-   break;
-   default:
-   break;
-   }
-}
-
 static int
 set_tx_queue_stats_mapping_registers(portid_t port_id, struct rte_port *port)
 {
@@ -2735,8 +2661,6 @@ main(int argc, char** argv)
rte_errno = EINVAL;
return -1;
}
-   eth_dev_event_callback_register();
-
}
 
if (start_port(RTE_PORT_ALL) != 0)
-- 
2.7.4



Re: [dpdk-dev] [PATCH v4 07/10] net/mlx5: probe all port representors

2018-07-09 Thread Shahaf Shuler
Hi Adrien,


Thursday, July 5, 2018 11:46 AM, Adrien Mazarguil:
> Subject: [PATCH v4 07/10] net/mlx5: probe all port representors
> 
> Probe existing port representors in addition to their master device and
> associate them automatically.
> 
> To avoid collision between Ethernet devices, they are named as follows:
> 
> - "{DBDF}" for master/switch devices.
> - "{DBDF}_representor_{rep}" with "rep" starting from 0 for port
>   representors.
> 
> (Patch based on prior work from Yuanhan Liu)
> 
> Signed-off-by: Adrien Mazarguil 
> Signed-off-by: Nelio Laranjeiro 
> Reviewed-by: Xueming Li 
> Cc: Xueming Li 
> Cc: Shahaf Shuler 
> --
> v4 changes:
> 
> - Fixed domain ID release once the last port using it is closed. Closed
>   devices are not necessarily detached, their presence is not a good
>   indicator. Code was modified to check if they still use their domain IDs
>   before deciding to release it.
> 
> v3 changes:
> 
> - Nelio introduced mlx5_dev_to_port_id() to prevent the master device from
>   releasing a domain ID while representors are still bound. It is now
>   released by the last device closed.
> - Reverted to original naming convention as requested by Xueming and
>   Shahaf; "net_" prefix and "_0" suffix were dropped.
> - mlx5_dev_spawn() (previously mlx5_dev_spawn_one()) now decides on its own
>   whether underlying device is a representor.
> - Devices can now be probed in any order and not necessarily all at once;
>   representors can exist without a master device.
> - mlx5_pci_probe() iterates on the list of devices directly instead of
>   relying on an intermediate function (previously mlx5_dev_spawn()).
> - mlx5_get_ifname() was rewritten to rely on mlx5_nl_ifindex() when faced
>   with a representor.
> - Since it is not necessarily present, master device is now dynamically
>   retrieved in mlx5_dev_infos_get().
> 
> v2 changes:
> 
> - Added representor information to dev_infos_get(). DPDK port ID of master
>   device is now stored in the private structure to retrieve it
>   conveniently.
> - Master device is assigned dummy representor ID value -1 to better
>   distinguish from the first actual representor reported by
>   dev_infos_get() as those are indexed from 0.
> - Added RTE_ETH_DEV_REPRESENTOR device flag.
> ---
>  drivers/net/mlx5/mlx5.c| 134 ---
> -
>  drivers/net/mlx5/mlx5.h|  12 +++-
>  drivers/net/mlx5/mlx5_ethdev.c | 133
> +++
>  drivers/net/mlx5/mlx5_mac.c|   2 +-
>  drivers/net/mlx5/mlx5_stats.c  |   6 +-
>  5 files changed, 238 insertions(+), 49 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
> index d06ba9886..c02afbb82 100644
> --- a/drivers/net/mlx5/mlx5.c
> +++ b/drivers/net/mlx5/mlx5.c
> @@ -307,7 +307,27 @@ mlx5_dev_close(struct rte_eth_dev *dev)
>   if (ret)
>   DRV_LOG(WARNING, "port %u some flows still remain",
>   dev->data->port_id);
> + if (priv->domain_id != RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID) {
> + unsigned int c = 0;
> + unsigned int i = mlx5_dev_to_port_id(dev->device, NULL, 0);
> + uint16_t port_id[i];
> +
> + i = RTE_MIN(mlx5_dev_to_port_id(dev->device, port_id, i), i);
> + while (i--) {
> + struct priv *opriv =
> + rte_eth_devices[port_id[i]].data->dev_private;
> +
> + if (!opriv ||
> + opriv->domain_id != priv->domain_id ||
> + &rte_eth_devices[port_id[i]] == dev)
> + continue;
> + ++c;
> + }
> + if (!c)
> + claim_zero(rte_eth_switch_domain_free(priv->domain_id));
> + }
>   memset(priv, 0, sizeof(*priv));
> + priv->domain_id = RTE_ETH_DEV_SWITCH_DOMAIN_ID_INVALID;
>  }
> 
>  const struct eth_dev_ops mlx5_dev_ops = { @@ -647,6 +667,8 @@
> mlx5_uar_init_secondary(struct rte_eth_dev *dev)
>   *   Verbs device.
>   * @param vf
>   *   If nonzero, enable VF-specific features.
> + * @param[in] switch_info
> + *   Switch properties of Ethernet device.
>   *
>   * @return
>   *   A valid Ethernet device object on success, NULL otherwise and rte_errno
> @@ -655,7 +677,8 @@ mlx5_uar_init_secondary(struct rte_eth_dev *dev)
> static struct rte_eth_dev *  mlx5_dev_spawn(struct rte_device *dpdk_dev,
>  struct ibv_device *ibv_dev,
> -int vf)
> +int vf,
> +const struct mlx5_switch_info *switch_info)
>  {
>   struct ibv_context *ctx;
>   struct ibv_device_attr_ex attr;
> @@ -697,6 +720,8 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
> #endif
>   struct ether_addr mac;
>   char name[RTE_ETH_NAME_MAX_LEN];
> + int own_domain_id = 0;
> + unsigned int i;
> 
>   /* Prepare shared data between primary and secondary process. */
>   mlx5_prepare_shared_d

Re: [dpdk-dev] [PATCH v4 09/10] net/mlx5: add parameter for port representors

2018-07-09 Thread Shahaf Shuler
Thursday, July 5, 2018 11:46 AM, Adrien Mazarguil:
> Subject: [PATCH v4 09/10] net/mlx5: add parameter for port representors
> 
> Prior to this patch, all port representors detected on a given device were
> probed and Ethernet devices instantiated for each of them.
> 
> This patch adds support for the standard "representor" parameter, which
> implies that port representors are not probed by default anymore, except
> for the list provided through device arguments.
> 
> (Patch based on prior work from Yuanhan Liu)
> 
> Signed-off-by: Adrien Mazarguil 
> Reviewed-by: Xueming Li 
> --
> v3 changes:
> 
> - Adapted representor detection to the reworked mlx5_dev_spawn().
> 
> v2 changes:
> 
> - Added error message for when rte_eth_devargs_parse() fails.
> ---
>  doc/guides/nics/mlx5.rst| 12 
>  doc/guides/prog_guide/poll_mode_drv.rst |  2 ++
>  drivers/net/mlx5/mlx5.c | 41 ++--
>  3 files changed, 52 insertions(+), 3 deletions(-)
> 
> diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
> index 7dd9c1c5e..0d0d21727 100644
> --- a/doc/guides/nics/mlx5.rst
> +++ b/doc/guides/nics/mlx5.rst
> @@ -392,6 +392,18 @@ Run-time configuration
> 
>Disabled by default.
> 
> +- ``representor`` parameter [list]
> +
> +  This parameter can be used to instantiate DPDK Ethernet devices from
> + existing port (or VF) representors configured on the device.
> +
> +  It is a standard parameter whose format is described in
> + :ref:`ethernet_device_standard_device_arguments`.
> +
> +  For instance, to probe port representors 0 through 2::
> +
> +representor=[0-2]
> +
>  Firmware configuration
>  ~~
> 
> diff --git a/doc/guides/prog_guide/poll_mode_drv.rst
> b/doc/guides/prog_guide/poll_mode_drv.rst
> index 4b69f6cbe..b2cf48354 100644
> --- a/doc/guides/prog_guide/poll_mode_drv.rst
> +++ b/doc/guides/prog_guide/poll_mode_drv.rst
> @@ -360,6 +360,8 @@ Ethernet Device API
> 
>  The Ethernet device API exported by the Ethernet PMDs is described in the
> *DPDK API Reference*.
> 
> +.. _ethernet_device_standard_device_arguments:
> +
>  Ethernet Device Standard Device Arguments
> ~
> 
> diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
> index 6592480bf..12a77afa8 100644
> --- a/drivers/net/mlx5/mlx5.c
> +++ b/drivers/net/mlx5/mlx5.c
> @@ -92,6 +92,9 @@
>  /* Activate Netlink support in VF mode. */
>  #define MLX5_VF_NL_EN "vf_nl_en"
> 
> +/* Select port representors to instantiate. */
> +#define MLX5_REPRESENTOR "representor"
> +
>  #ifndef HAVE_IBV_MLX5_MOD_MPW
>  #define MLX5DV_CONTEXT_FLAGS_MPW_ALLOWED (1 << 2)
>  #define MLX5DV_CONTEXT_FLAGS_ENHANCED_MPW (1 << 3)
> @@ -443,6 +446,9 @@ mlx5_args_check(const char *key, const char *val, void *opaque)
>   struct mlx5_dev_config *config = opaque;
>   unsigned long tmp;
> 
> + /* No-op, port representors are processed in mlx5_dev_spawn(). */
> + if (!strcmp(MLX5_REPRESENTOR, key))
> + return 0;
>   errno = 0;
>   tmp = strtoul(val, NULL, 0);
>   if (errno) {
> @@ -515,6 +521,7 @@ mlx5_args(struct mlx5_dev_config *config, struct
> rte_devargs *devargs)
>   MLX5_RX_VEC_EN,
>   MLX5_L3_VXLAN_EN,
>   MLX5_VF_NL_EN,
> + MLX5_REPRESENTOR,
>   NULL,
>   };
>   struct rte_kvargs *kvlist;
> @@ -672,7 +679,9 @@ mlx5_uar_init_secondary(struct rte_eth_dev *dev)
>   *
>   * @return
>   *   A valid Ethernet device object on success, NULL otherwise and rte_errno
> - *   is set.
> + *   is set. The following error is defined:
> + *
> + *   EBUSY: device is not supposed to be spawned.
>   */
>  static struct rte_eth_dev *
>  mlx5_dev_spawn(struct rte_device *dpdk_dev,
> @@ -723,6 +732,26 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
>   int own_domain_id = 0;
>   unsigned int i;
> 
> + /* Determine if this port representor is supposed to be spawned. */
> + if (switch_info->representor && dpdk_dev->devargs) {
> + struct rte_eth_devargs eth_da;
> +
> + err = rte_eth_devargs_parse(dpdk_dev->devargs->args, &eth_da);
> + if (err) {
> + rte_errno = -err;
> + DRV_LOG(ERR, "failed to process device arguments: %s",
> + strerror(rte_errno));
> + return NULL;
> + }
> + for (i = 0; i < eth_da.nb_representor_ports; ++i)
> + if (eth_da.representor_ports[i] ==
> + (uint16_t)switch_info->port_name)
> + break;
> + if (i == eth_da.nb_representor_ports) {
> + rte_errno = EBUSY;

Why is EBUSY the correct errno? Can another attempt to probe the device
succeed?

> + return NULL;
> + }
> + }
>   /* Prepare shared data between primary and secondary process

[dpdk-dev] [PATCH v7 0/7] hotplug failure handle mechanism

2018-07-09 Thread Jeff Guo
As we know, hot plug is an important feature, used either for fail-safe
handling of datacenter devices or for SRIOV Live Migration in SDN/NFV. It brings
higher flexibility and continuity to networking services in many industry use
cases. So let us see what DPDK, as an important networking framework, can do
to help users implement a hot plug solution.

We already have a general device event detection mechanism, the failsafe
driver, the bonding driver and a hot plug/unplug API in the framework. Apps can
use these to develop their own hot plug solution.

Consider the case of hot unplug. It can happen when a hardware device is
physically removed, or when software disables it. The app needs to call the
ethdev API to detach the device, which unplugs the device at the bus level and
makes access to the device invalid. The problem is that removing the device
from the software lists is not instantaneous. If the data (fast) path still
reads or writes the device during that window, it causes an MMIO error and the
app crashes.

We seem to have a fail-safe driver (or app) + RTE_ETH_EVENT_INTR_RMV +
kernel core driver solution to handle this, but we still lack a failsafe driver
(or app) + RTE_DEV_EVENT_REMOVE + PCIe PMD driver failure handling solution. So
there is a gap in the DPDK hot plug solution right now.

Also, the kernel only guarantees hot plug on the kernel side, not on the user
mode side. First, we can hardly have a gatekeeper for every MMIO access across
multiple PMD drivers. Second, no third-party tools such as udev/driverctl
specifically cover this hot plug failure processing. Third, it is still unclear
how feasible it is for an app to implement this for multiple user mode PMD
drivers. Therefore a general hot plug failure handling mechanism in the DPDK
framework is proposed here. It aims to guarantee that, when a hot unplug
occurs, the system does not crash and the app does not break, so that user
space can stop normally, release any relevant resources, and then cleanly
unplug the device at the bus level.

The mechanism works as follows:

First, the app enables the device event monitor and registers the hot plug
event’s callback before running the data path. Once a hot unplug occurs, the
mechanism detects the removal event and handles the failure accordingly. To do
that, the following functions are introduced.
 - Add a new bus op “handle_hot_unplug” to handle bus read/write errors. It is
   bus-specific, and each kind of bus can implement its own logic.
 - Implement the PCI bus specific op “pci_handle_hot_unplug”. Based on the
   failure address, it remaps memory for the corresponding unplugged device.
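
The remap step in the list above can be tried in isolation. The following is
only a sketch, not code from this series: struct region, region_create() and
region_remap_anonymous() are hypothetical names, and an ordinary anonymous
mapping stands in for a device BAR. It shows that mmap() with
MAP_FIXED | MAP_ANONYMOUS replaces what backs an address range while the
address itself stays valid.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS under strict -std modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical stand-in for one mapped device region (a PCI BAR in the patch). */
struct region {
	void *addr;	/* start of the existing mapping, page aligned */
	size_t len;	/* length of the mapping in bytes */
};

/* Create a private mapping to experiment with. Returns 0 or -1. */
static int
region_create(struct region *r, size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return -1;
	r->addr = p;
	r->len = len;
	return 0;
}

/*
 * Replace whatever is mapped at r->addr with zero-filled anonymous memory.
 * MAP_FIXED makes the kernel drop the old mapping and place the new one at
 * exactly the same virtual address, so pointers held elsewhere can still be
 * dereferenced. Returns 0 on success, -1 on failure.
 */
static int
region_remap_anonymous(const struct region *r)
{
	void *p;

	assert(r != NULL && r->addr != NULL && r->len != 0);
	p = mmap(r->addr, r->len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	return p == MAP_FAILED ? -1 : 0;
}
```

After the remap, reads return zeroes and writes go nowhere useful, so the
caller still has to stop using the device. The remap only keeps stray accesses
from faulting in the meantime.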

For the data path, or unexpected accesses from the control path, when a hot
unplug occurs:
 - Implement a new sigbus handler, registered when device event monitoring
   starts. The handler is per process. Because of how signals are delivered,
   any control path or data path thread may receive the sigbus error, but all
   of them go to the common sigbus handler. Once an MMIO sigbus error is
   raised, it triggers the above hot unplug operation. The handler checks
   whether the sigbus was caused by the hot unplug. If not, it reports the
   exception as the original sigbus handler would. If so, it remaps the memory.
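
The handler logic above can be sketched as a small user-space program. This is
an illustration under assumptions, not the code of this series: guard_install(),
on_sigbus() and make_vanishing_page() are hypothetical names, a truncated
file-backed page stands in for a vanished BAR, and for a foreign fault the
sketch restores the previous sigaction and lets the access fault again, which
is a different technique from calling the saved handler directly as the patch
does. Note also that mmap() is not on the POSIX list of async-signal-safe
functions; calling it from a handler is assumed to be acceptable here because
it is a plain system call on Linux.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS and mkstemp() */
#endif
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical bookkeeping for the single region the handler protects. */
static void *guard_addr;
static size_t guard_len;
static volatile sig_atomic_t guard_hits;
static struct sigaction old_action;

static void
on_sigbus(int signum, siginfo_t *info, void *ctx)
{
	uintptr_t fault = (uintptr_t)info->si_addr;
	uintptr_t start = (uintptr_t)guard_addr;

	(void)ctx;
	if (fault >= start && fault - start < guard_len) {
		/* Our region: put anonymous memory at the same address. */
		void *p = mmap(guard_addr, guard_len, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
		if (p != MAP_FAILED) {
			guard_hits++;
			return; /* the faulting access is retried */
		}
	}
	/* Not ours: restore the old action; the retried access reaches it. */
	sigaction(signum, &old_action, NULL);
}

/* Install the handler for [addr, addr + len). Returns 0 or -1. */
static int
guard_install(void *addr, size_t len)
{
	struct sigaction sa;

	guard_addr = addr;
	guard_len = len;
	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = on_sigbus;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGBUS, &sa, &old_action);
}

/*
 * Demo helper: map one page of a temporary file, then truncate the file to
 * zero so that touching the page raises SIGBUS. Returns NULL on failure.
 */
static void *
make_vanishing_page(size_t *len)
{
	char path[] = "/tmp/sigbus-demo-XXXXXX";
	size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
	int fd = mkstemp(path);
	void *p;

	if (fd < 0)
		return NULL;
	unlink(path);
	if (ftruncate(fd, (off_t)pagesz) != 0) {
		close(fd);
		return NULL;
	}
	p = mmap(NULL, pagesz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p != MAP_FAILED && ftruncate(fd, 0) != 0) {
		munmap(p, pagesz);
		p = MAP_FAILED;
	}
	close(fd);
	if (p == MAP_FAILED)
		return NULL;
	*len = pagesz;
	return p;
}
```

A caller would map the page with make_vanishing_page(), pass it to
guard_install(), and then read from it: the read faults once, the handler
remaps the page, and the retried read sees zero-filled memory.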

For the control path and the igb_uio release:
 - When a device is hot unplugged, the kernel releases the device resources on
   the kernel side: for example, the fd sys file disappears and the irq is
   released. If the igb_uio driver then still tries to release these resources,
   it causes a kernel crash.
   On the other hand, some things, such as disabling interrupts, are not done
   automatically on the kernel side. If they are not handled, the stale state
   affects the interrupt resources used by other devices.
   So the igb_uio driver has to check the hot plug status and handle it
   correspondingly.
   This patch proposes adding an rte_udev_state field to the rte_uio_pci_dev
   structure of the igb_uio kernel driver, recording the state of the uio
   device: probed/opened/released/removed/unplug. When an unexpected removal
   caused by a hot unplug is detected, the driver disables the interrupt
   resources accordingly, and skips the release steps the kernel has already
   performed, to avoid a double free or a NULL pointer kernel crash.

The mechanism can be used by the fail-safe driver and by apps that want a hot
plug solution. Take testpmd as an example:
 - Enable device event monitor->device unplug->failure handle->stop forwarding->
   stop port->close port->detach port.

This process does not break the running app/fail-safe driver, and does not
break other, unrelated devices. The app can then plug the device back in and
restart the data path as follows:
 - Device plug in->bind igb_uio driver->attach device->start port->
   start forwarding.

patchset history:
v7->v6:
delete some unuse

Re: [dpdk-dev] [PATCH v4 10/10] net/mlx5: support negative identifiers for port representors

2018-07-09 Thread Shahaf Shuler
Adrien, thanks for this patch.

Thursday, July 5, 2018 11:46 AM, Adrien Mazarguil:
> Subject: [PATCH v4 10/10] net/mlx5: support negative identifiers for port
> representors
> 
> This patch brings support for BlueField representors.
> 
> Signed-off-by: Adrien Mazarguil 
> Cc: Shahaf Shuler 
> --
> v3 changes:
> 
> - This patch was not present in prior revisions.
> ---
>  drivers/net/mlx5/mlx5.c | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
> index 12a77afa8..df7f39844 100644
> --- a/drivers/net/mlx5/mlx5.c
> +++ b/drivers/net/mlx5/mlx5.c
> @@ -1330,6 +1330,14 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
>   memset(&list[i].info, 0, sizeof(list[i].info));
>   continue;
>   }
> + /*
> +  * Port representors not associated with any VFs (e.g. on
> +  * BlueField devices) report -1 as a port identifier.
> +  * Quietly set it to zero since DPDK only supports positive
> +  * values.
> +  */

I am waiting for the final answer from the BlueField team about how they are
going to enumerate the BlueField representors.
If it turns out to be the same as on x86, I think we can drop this patch;
otherwise we use it. Agreed?

> + if (list[i].info.representor && list[i].info.port_name == -1)
> + list[i].info.port_name = 0;
>   }
>   if (nl_rdma >= 0)
>   close(nl_rdma);
> --
> 2.11.0


[dpdk-dev] [PATCH v7 2/7] bus/pci: implement hotplug failure handler ops

2018-07-09 Thread Jeff Guo
This patch implements the hotplug failure handler op for the PCI bus. It
remaps new dummy memory over the failed memory region to avoid MMIO
read/write errors.

Signed-off-by: Jeff Guo 
---
 drivers/bus/pci/pci_common.c | 28 
 drivers/bus/pci/pci_common_uio.c | 33 +
 drivers/bus/pci/private.h| 12 
 3 files changed, 73 insertions(+)

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index 94b0f41..d7abe6c 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -408,6 +408,33 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 }
 
 static int
+pci_hotplug_failure_handler(struct rte_device *dev)
+{
+   struct rte_pci_device *pdev = NULL;
+   int ret = 0;
+
+   pdev = RTE_DEV_TO_PCI(dev);
+   if (!pdev)
+   return -1;
+
+   switch (pdev->kdrv) {
+   case RTE_KDRV_IGB_UIO:
+   case RTE_KDRV_UIO_GENERIC:
+   case RTE_KDRV_NIC_UIO:
+   /* mmio resource is invalid, remap it to be safe. */
+   ret = pci_uio_remap_resource(pdev);
+   break;
+   default:
+   RTE_LOG(DEBUG, EAL,
+   "Not managed by a supported kernel driver, skipped\n");
+   ret = -1;
+   break;
+   }
+
+   return ret;
+}
+
+static int
 pci_plug(struct rte_device *dev)
 {
return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
@@ -437,6 +464,7 @@ struct rte_pci_bus rte_pci_bus = {
.unplug = pci_unplug,
.parse = pci_parse,
.get_iommu_class = rte_pci_get_iommu_class,
+   .hotplug_failure_handler = pci_hotplug_failure_handler,
},
.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
diff --git a/drivers/bus/pci/pci_common_uio.c b/drivers/bus/pci/pci_common_uio.c
index 54bc20b..7ea73db 100644
--- a/drivers/bus/pci/pci_common_uio.c
+++ b/drivers/bus/pci/pci_common_uio.c
@@ -146,6 +146,39 @@ pci_uio_unmap(struct mapped_pci_resource *uio_res)
}
 }
 
+/* remap the PCI resource of a PCI device in anonymous virtual memory */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev)
+{
+   int i;
+   void *map_address;
+
+   if (dev == NULL)
+   return -1;
+
+   /* Remap all BARs */
+   for (i = 0; i != PCI_MAX_RESOURCE; i++) {
+   /* skip empty BAR */
+   if (dev->mem_resource[i].phys_addr == 0)
+   continue;
+   map_address = mmap(dev->mem_resource[i].addr,
+   (size_t)dev->mem_resource[i].len,
+   PROT_READ | PROT_WRITE,
+   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
+   if (map_address == MAP_FAILED) {
+   RTE_LOG(ERR, EAL,
+   "Cannot remap resource for device %s\n",
+   dev->name);
+   return -1;
+   }
+   RTE_LOG(INFO, EAL,
+   "Successful remap resource for device %s\n",
+   dev->name);
+   }
+
+   return 0;
+}
+
 static struct mapped_pci_resource *
 pci_uio_find_resource(struct rte_pci_device *dev)
 {
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index 8ddd03e..6b312e5 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -123,6 +123,18 @@ void pci_uio_free_resource(struct rte_pci_device *dev,
struct mapped_pci_resource *uio_res);
 
 /**
+ * Remap the PCI resource of a PCI device in anonymous virtual memory.
+ *
+ * @param dev
+ *   Point to the struct rte pci device.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev);
+
+/**
  * Map device memory to uio resource
  *
  * This function is private to EAL.
-- 
2.7.4



[dpdk-dev] [PATCH v7 3/7] bus: add sigbus handler

2018-07-09 Thread Jeff Guo
When a device is hot unplugged while the data path still reads or writes it,
a sigbus error occurs and needs to be handled. So a handler is needed to
capture the signal and handle it accordingly.

This patch introduces a bus op to handle the sigbus error. It is bus-specific
behavior, so each kind of bus can implement its own logic.

Signed-off-by: Jeff Guo 
---
 lib/librte_eal/common/include/rte_bus.h | 17 +
 1 file changed, 17 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index e3a55a8..216ad1e 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -182,6 +182,21 @@ typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 typedef int (*rte_bus_hotplug_failure_handler_t)(struct rte_device *dev);
 
 /**
+ * Implementation of a bus-specific sigbus handler, which is responsible for
+ * handling a sigbus error that is either an ordinary memory error or a memory
+ * error caused by a hot unplug. When a sigbus error is captured, this
+ * function can be called to handle it.
+ * @param failure_addr
+ * Pointer to the fault address of the sigbus error.
+ *
+ * @return
+ * 0 if the sigbus was handled successfully.
+ * 1 if no bus handles the sigbus.
+ * -1 if handling the sigbus failed.
+ */
+typedef int (*rte_bus_sigbus_handler_t)(const void *failure_addr);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -227,6 +242,8 @@ struct rte_bus {
rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
rte_bus_hotplug_failure_handler_t hotplug_failure_handler;
/**< handle hotplug failure on bus */
+   rte_bus_sigbus_handler_t sigbus_handler; /**< handle sigbus error */
+
 };
 
 /**
-- 
2.7.4



[dpdk-dev] [PATCH v7 1/7] bus: add hotplug failure handler

2018-07-09 Thread Jeff Guo
When a device is hot unplugged and the app still accesses it through MMIO,
the access causes a memory failure and crashes the system.

This patch introduces a bus op to handle device hotplug failure. It is
bus-specific behavior, so each kind of bus can implement its own logic.

Signed-off-by: Jeff Guo 
---
 lib/librte_eal/common/include/rte_bus.h | 16 
 1 file changed, 16 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index eb9eded..e3a55a8 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -168,6 +168,20 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
 typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 
 /**
+ * Implementation of a bus-specific hotplug failure handler, which is
+ * responsible for handling the failure when a device is hot unplugged from
+ * the bus. When a hotplug removal event is detected, this function can be
+ * called to handle the failure and guarantee that the system does not crash.
+ * @param dev
+ * Pointer to the device structure.
+ *
+ * @return
+ * 0 on success.
+ * !0 on error.
+ */
+typedef int (*rte_bus_hotplug_failure_handler_t)(struct rte_device *dev);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -211,6 +225,8 @@ struct rte_bus {
rte_bus_parse_t parse;   /**< Parse a device name */
struct rte_bus_conf conf;/**< Bus configuration */
rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
+   rte_bus_hotplug_failure_handler_t hotplug_failure_handler;
+   /**< handle hotplug failure on bus */
 };
 
 /**
-- 
2.7.4



[dpdk-dev] [PATCH v7 4/7] bus/pci: implement sigbus handler operation

2018-07-09 Thread Jeff Guo
This patch implements the sigbus handler op for the PCI bus. It finds the
PCI device that has been hot unplugged, then calls the bus hotplug failure
handler op to handle the failure for that device.

Signed-off-by: Jeff Guo 
---
 drivers/bus/pci/pci_common.c | 49 
 1 file changed, 49 insertions(+)

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index d7abe6c..37ad266 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -407,6 +407,32 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
return NULL;
 }
 
+/* check the failure address belongs to which device. */
+static struct rte_pci_device *
+pci_find_device_by_addr(const void *failure_addr)
+{
+   struct rte_pci_device *pdev = NULL;
+   int i;
+
+   FOREACH_DEVICE_ON_PCIBUS(pdev) {
+   for (i = 0; i != RTE_DIM(pdev->mem_resource); i++) {
+   if ((uint64_t)(uintptr_t)failure_addr >=
+   (uint64_t)(uintptr_t)pdev->mem_resource[i].addr &&
+   (uint64_t)(uintptr_t)failure_addr <
+   (uint64_t)(uintptr_t)pdev->mem_resource[i].addr +
+   pdev->mem_resource[i].len) {
+   RTE_LOG(INFO, EAL, "Failure address "
+   "%16.16"PRIx64" belongs to "
+   "device %s!\n",
+   (uint64_t)(uintptr_t)failure_addr,
+   pdev->device.name);
+   return pdev;
+   }
+   }
+   }
+   return NULL;
+}
+
 static int
 pci_hotplug_failure_handler(struct rte_device *dev)
 {
@@ -435,6 +461,28 @@ pci_hotplug_failure_handler(struct rte_device *dev)
 }
 
 static int
+pci_sigbus_handler(const void *failure_addr)
+{
+   struct rte_pci_device *pdev = NULL;
+   int ret = 0;
+
+   pdev = pci_find_device_by_addr(failure_addr);
+   if (!pdev) {
+   /* It is a generic sigbus error, no bus would handle it. */
+   ret = 1;
+   } else {
+   /* The sigbus error is caused of hot removal. */
+   ret = pci_hotplug_failure_handler(&pdev->device);
+   if (ret) {
+   RTE_LOG(ERR, EAL, "Failed to handle hot plug for "
+   "device %s", pdev->name);
+   ret = -1;
+   }
+   }
+   return ret;
+}
+
+static int
 pci_plug(struct rte_device *dev)
 {
return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
@@ -465,6 +513,7 @@ struct rte_pci_bus rte_pci_bus = {
.parse = pci_parse,
.get_iommu_class = rte_pci_get_iommu_class,
.hotplug_failure_handler = pci_hotplug_failure_handler,
+   .sigbus_handler = pci_sigbus_handler,
},
.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
-- 
2.7.4
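
For readers following the lookup in pci_find_device_by_addr(), here is a
stand-alone sketch of the same range test. It is not the patch's code: struct
bar, struct dev and find_dev_by_addr() are hypothetical stand-ins for the PCI
bus structures, and the comparison is written as "offset < length" so that the
sum of start and length cannot wrap around.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BARS 6

/* Hypothetical stand-ins for a mapped BAR and the device that owns it. */
struct bar {
	void *addr;	/* virtual address the BAR is mapped at */
	uint64_t len;	/* mapping length in bytes, 0 if unused */
};

struct dev {
	const char *name;
	struct bar bars[MAX_BARS];
};

/*
 * Return the device whose mapped BAR contains the fault address, or NULL
 * if the address does not belong to any known device.
 */
static const struct dev *
find_dev_by_addr(const struct dev *devs, size_t ndevs, const void *fault)
{
	uintptr_t f = (uintptr_t)fault;
	size_t d;
	int i;

	assert(devs != NULL || ndevs == 0);
	for (d = 0; d < ndevs; d++) {
		for (i = 0; i < MAX_BARS; i++) {
			uintptr_t start = (uintptr_t)devs[d].bars[i].addr;

			if (devs[d].bars[i].len == 0)
				continue;
			/* f - start is the offset into the BAR when f >= start. */
			if (f >= start && f - start < devs[d].bars[i].len)
				return &devs[d];
		}
	}
	return NULL;
}
```

A NULL result corresponds to the "generic sigbus error" branch in
pci_sigbus_handler(), where no device claims the fault address.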



[dpdk-dev] [PATCH v7 5/7] bus: add helper to handle sigbus

2018-07-09 Thread Jeff Guo
This patch adds a helper that iterates over all buses to find the one that
can handle the sigbus error.

Signed-off-by: Jeff Guo 
---
 lib/librte_eal/common/eal_common_bus.c | 42 ++
 lib/librte_eal/common/eal_private.h| 12 ++
 2 files changed, 54 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
index 0943851..8856adc 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "eal_private.h"
 
@@ -242,3 +243,44 @@ rte_bus_get_iommu_class(void)
}
return mode;
 }
+
+static int
+bus_handle_sigbus(const struct rte_bus *bus,
+   const void *failure_addr)
+{
+   int ret;
+
+   if (!bus->sigbus_handler) {
+   RTE_LOG(ERR, EAL, "Function sigbus_handler not supported by "
+   "bus (%s)\n", bus->name);
+   return -1;
+   }
+
+   ret = bus->sigbus_handler(failure_addr);
+   rte_errno = ret;
+
+   return !(bus->sigbus_handler && ret <= 0);
+}
+
+int
+rte_bus_sigbus_handler(const void *failure_addr)
+{
+   struct rte_bus *bus;
+
+   int ret = 0;
+   int old_errno = rte_errno;
+
+   rte_errno = 0;
+
+   bus = rte_bus_find(NULL, bus_handle_sigbus, failure_addr);
+   /* failed to handle the sigbus, pass the new errno. */
+   if (!bus)
+   ret = 1;
+   else if (rte_errno == -1)
+   return -1;
+
+   /* otherwise restore the old errno. */
+   rte_errno = old_errno;
+
+   return ret;
+}
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index bdadc4d..2337e71 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -258,4 +258,16 @@ int rte_mp_channel_init(void);
  */
 void dev_callback_process(char *device_name, enum rte_dev_event_type event);
 
+/**
+ * Iterate all buses to find the corresponding bus, to handle the sigbus error.
+ * @param failure_addr
+ * Pointer of the fault address of the sigbus error.
+ *
+ * @return
+ *  0 success to handle the sigbus.
+ * -1 failed to handle the sigbus
+ *  1 no bus can handler the sigbus
+ */
+int rte_bus_sigbus_handler(const void *failure_addr);
+
 #endif /* _EAL_PRIVATE_H_ */
-- 
2.7.4



[dpdk-dev] [PATCH v7 7/7] igb_uio: fix uio release issue when hot unplug

2018-07-09 Thread Jeff Guo
When a device is hot unplugged, the kernel releases the device resources on
the kernel side: for example, the fd sys file disappears and the irq is
released. If the igb_uio driver then still tries to release these resources,
it causes a kernel crash. On the other hand, some things, such as disabling
interrupts, are not done automatically on the kernel side. If they are not
handled, the stale state affects the interrupt resources used by other
devices. So the igb_uio driver has to check the hotplug status and handle it
correspondingly.

This patch proposes adding an rte_udev_state field to the rte_uio_pci_dev
structure of the igb_uio kernel driver, recording the state of the uio device:
probed/opened/released/removed/unplug. When an unexpected removal caused by a
hot unplug is detected, the driver disables the interrupt resources
accordingly, and skips the release steps the kernel has already performed, to
avoid a double free or a NULL pointer kernel crash.

Signed-off-by: Jeff Guo 
---
 kernel/linux/igb_uio/igb_uio.c | 51 +++---
 1 file changed, 48 insertions(+), 3 deletions(-)

diff --git a/kernel/linux/igb_uio/igb_uio.c b/kernel/linux/igb_uio/igb_uio.c
index 3398eac..adc8cea 100644
--- a/kernel/linux/igb_uio/igb_uio.c
+++ b/kernel/linux/igb_uio/igb_uio.c
@@ -19,6 +19,15 @@
 
 #include "compat.h"
 
+/* uio pci device state */
+enum rte_udev_state {
+   RTE_UDEV_PROBED,
+   RTE_UDEV_OPENNED,
+   RTE_UDEV_RELEASED,
+   RTE_UDEV_REMOVED,
+   RTE_UDEV_UNPLUG
+};
+
 /**
  * A structure describing the private information for a uio device.
  */
@@ -28,6 +37,7 @@ struct rte_uio_pci_dev {
enum rte_intr_mode mode;
struct mutex lock;
int refcnt;
+   enum rte_udev_state state;
 };
 
 static int wc_activate;
@@ -195,12 +205,22 @@ igbuio_pci_irqhandler(int irq, void *dev_id)
 {
struct rte_uio_pci_dev *udev = (struct rte_uio_pci_dev *)dev_id;
struct uio_info *info = &udev->info;
+   struct pci_dev *pdev = udev->pdev;
 
/* Legacy mode need to mask in hardware */
if (udev->mode == RTE_INTR_MODE_LEGACY &&
!pci_check_and_mask_intx(udev->pdev))
return IRQ_NONE;
 
+   mutex_lock(&udev->lock);
+   /* check the uevent of the kobj */
+   if ((&pdev->dev.kobj)->state_remove_uevent_sent == 1) {
+   dev_notice(&pdev->dev, "device:%s, sent remove uevent!\n",
+  (&pdev->dev.kobj)->name);
+   udev->state = RTE_UDEV_UNPLUG;
+   }
+   mutex_unlock(&udev->lock);
+
uio_event_notify(info);
 
/* Message signal mode, no share IRQ and automasked */
@@ -309,7 +329,6 @@ igbuio_pci_disable_interrupts(struct rte_uio_pci_dev *udev)
 #endif
 }
 
-
 /**
  * This gets called while opening uio device file.
  */
@@ -331,20 +350,29 @@ igbuio_pci_open(struct uio_info *info, struct inode *inode)
 
/* enable interrupts */
err = igbuio_pci_enable_interrupts(udev);
-   mutex_unlock(&udev->lock);
if (err) {
dev_err(&dev->dev, "Enable interrupt fails\n");
+   pci_clear_master(dev);
+   mutex_unlock(&udev->lock);
return err;
}
+   udev->state = RTE_UDEV_OPENNED;
+   mutex_unlock(&udev->lock);
return 0;
 }
 
+/**
+ * This gets called while closing uio device file.
+ */
 static int
 igbuio_pci_release(struct uio_info *info, struct inode *inode)
 {
struct rte_uio_pci_dev *udev = info->priv;
struct pci_dev *dev = udev->pdev;
 
+   if (udev->state == RTE_UDEV_REMOVED)
+   return 0;
+
mutex_lock(&udev->lock);
if (--udev->refcnt > 0) {
mutex_unlock(&udev->lock);
@@ -356,7 +384,7 @@ igbuio_pci_release(struct uio_info *info, struct inode *inode)
 
/* stop the device from further DMA */
pci_clear_master(dev);
-
+   udev->state = RTE_UDEV_RELEASED;
mutex_unlock(&udev->lock);
return 0;
 }
@@ -562,6 +590,9 @@ igbuio_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 (unsigned long long)map_dma_addr, map_addr);
}
 
+   mutex_lock(&udev->lock);
+   udev->state = RTE_UDEV_PROBED;
+   mutex_unlock(&udev->lock);
return 0;
 
 fail_remove_group:
@@ -579,6 +610,20 @@ static void
 igbuio_pci_remove(struct pci_dev *dev)
 {
struct rte_uio_pci_dev *udev = pci_get_drvdata(dev);
+   int ret;
+
+   /* handler hot unplug */
+   if (udev->state == RTE_UDEV_OPENNED ||
+   udev->state == RTE_UDEV_UNPLUG) {
+   dev_notice(&dev->dev, "Unexpected removal!\n");
+   ret = igbuio_pci_release(&udev->info, NULL);
+   if (ret)
+   return;
+   mutex_lock(&udev->lock);
+   udev->state = RTE_UDEV_REMOVED;
+   mutex_unl

[dpdk-dev] [PATCH v7 6/7] eal: add failure handle mechanism for hotplug

2018-07-09 Thread Jeff Guo
This patch introduces a failure handling mechanism for the device hotplug
removal event.

First, it registers a sigbus handler when the device event monitor is enabled.
Once a sigbus error is captured, it checks the failure address and remaps the
invalid memory for the corresponding device. Based on this mechanism, the
application is guaranteed not to crash when a device is hot unplugged.

Signed-off-by: Jeff Guo 
---
v7->v6:
delete some unused part.
---
 lib/librte_eal/linuxapp/eal/eal_dev.c | 112 +-
 1 file changed, 111 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c
index 1cf6aeb..0de3fb7 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -4,6 +4,8 @@
 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 
@@ -14,6 +16,10 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
+#include 
 
 #include "eal_private.h"
 
@@ -23,6 +29,16 @@ static bool monitor_started;
 #define EAL_UEV_MSG_LEN 4096
 #define EAL_UEV_MSG_ELEM_LEN 128
 
+/*
+ * spinlock for device failure process, protect the bus and the device
+ * to avoid race condition.
+ */
+static rte_spinlock_t dev_failure_lock = RTE_SPINLOCK_INITIALIZER;
+
+static struct sigaction sigbus_action_old;
+
+static int sigbus_need_recover;
+
 static void dev_uev_handler(__rte_unused void *param);
 
 /* identify the system layer which reports this event. */
@@ -33,6 +49,49 @@ enum eal_dev_event_subsystem {
EAL_DEV_EVENT_SUBSYSTEM_MAX
 };
 
+static void
+sigbus_action_recover(void)
+{
+   if (sigbus_need_recover) {
+   sigaction(SIGBUS, &sigbus_action_old, NULL);
+   sigbus_need_recover = 0;
+   }
+}
+
+static void sigbus_handler(int signum, siginfo_t *info,
+   void *ctx __rte_unused)
+{
+   int ret;
+
+   RTE_LOG(INFO, EAL, "Thread[%d] catch SIGBUS, fault address:%p\n",
+   (int)pthread_self(), info->si_addr);
+
+   rte_spinlock_lock(&dev_failure_lock);
+   ret = rte_bus_sigbus_handler(info->si_addr);
+   rte_spinlock_unlock(&dev_failure_lock);
+   if (ret == -1) {
+   rte_exit(EXIT_FAILURE,
+"Failed to handle SIGBUS for hotplug, "
+"(rte_errno: %s)!", strerror(rte_errno));
+   } else if (ret == 1) {
+   if (sigbus_action_old.sa_handler)
+   (*(sigbus_action_old.sa_handler))(signum);
+   else
+   rte_exit(EXIT_FAILURE,
+"Failed to handle generic SIGBUS!");
+   }
+
+   RTE_LOG(INFO, EAL, "Success to handle SIGBUS for hotplug!\n");
+}
+
+static int cmp_dev_name(const struct rte_device *dev,
+   const void *_name)
+{
+   const char *name = _name;
+
+   return strcmp(dev->name, name);
+}
+
 static int
 dev_uev_socket_fd_create(void)
 {
@@ -147,6 +206,9 @@ dev_uev_handler(__rte_unused void *param)
struct rte_dev_event uevent;
int ret;
char buf[EAL_UEV_MSG_LEN];
+   struct rte_bus *bus;
+   struct rte_device *dev;
+   const char *busname = "";
 
memset(&uevent, 0, sizeof(struct rte_dev_event));
memset(buf, 0, EAL_UEV_MSG_LEN);
@@ -171,13 +233,50 @@ dev_uev_handler(__rte_unused void *param)
RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d, subsystem:%d)\n",
uevent.devname, uevent.type, uevent.subsystem);
 
-   if (uevent.devname)
+   switch (uevent.subsystem) {
+   case EAL_DEV_EVENT_SUBSYSTEM_PCI:
+   case EAL_DEV_EVENT_SUBSYSTEM_UIO:
+   busname = "pci";
+   break;
+   default:
+   break;
+   }
+
+   if (uevent.devname) {
+   if (uevent.type == RTE_DEV_EVENT_REMOVE) {
+   rte_spinlock_lock(&dev_failure_lock);
+   bus = rte_bus_find_by_name(busname);
+   if (bus == NULL) {
+   RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n",
+   busname);
+   return;
+   }
+
+   dev = bus->find_device(NULL, cmp_dev_name,
+  uevent.devname);
+   if (dev == NULL) {
+   RTE_LOG(ERR, EAL, "Cannot find device (%s) on "
+   "bus (%s)\n", uevent.devname, busname);
+   return;
+   }
+
+   ret = bus->hotplug_failure_handler(dev);
+   rte_spinlock_unlock(&dev_failure_lock);
+   if (ret) {
+   RTE_LOG(ERR, EAL, "Can not handle hotplug for "
+   "device (%s)\n", dev->name);
+

[dpdk-dev] [PATCH v7 2/7] bus/pci: implement hotplug failure handler ops

2018-07-09 Thread Jeff Guo
This patch implements the hotplug failure handler op for the PCI bus. It
remaps new dummy memory over the failed memory region to avoid MMIO
read/write errors.

Signed-off-by: Jeff Guo 
Acked-by: Shaopeng He 
---
v7->v6:
no change
---
 drivers/bus/pci/pci_common.c | 28 
 drivers/bus/pci/pci_common_uio.c | 33 +
 drivers/bus/pci/private.h| 12 
 3 files changed, 73 insertions(+)

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index 94b0f41..d7abe6c 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -408,6 +408,33 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
 }
 
 static int
+pci_hotplug_failure_handler(struct rte_device *dev)
+{
+   struct rte_pci_device *pdev = NULL;
+   int ret = 0;
+
+   pdev = RTE_DEV_TO_PCI(dev);
+   if (!pdev)
+   return -1;
+
+   switch (pdev->kdrv) {
+   case RTE_KDRV_IGB_UIO:
+   case RTE_KDRV_UIO_GENERIC:
+   case RTE_KDRV_NIC_UIO:
+   /* mmio resource is invalid, remap it to be safe. */
+   ret = pci_uio_remap_resource(pdev);
+   break;
+   default:
+   RTE_LOG(DEBUG, EAL,
+   "Not managed by a supported kernel driver, skipped\n");
+   ret = -1;
+   break;
+   }
+
+   return ret;
+}
+
+static int
 pci_plug(struct rte_device *dev)
 {
return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
@@ -437,6 +464,7 @@ struct rte_pci_bus rte_pci_bus = {
.unplug = pci_unplug,
.parse = pci_parse,
.get_iommu_class = rte_pci_get_iommu_class,
+   .hotplug_failure_handler = pci_hotplug_failure_handler,
},
.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
diff --git a/drivers/bus/pci/pci_common_uio.c b/drivers/bus/pci/pci_common_uio.c
index 54bc20b..7ea73db 100644
--- a/drivers/bus/pci/pci_common_uio.c
+++ b/drivers/bus/pci/pci_common_uio.c
@@ -146,6 +146,39 @@ pci_uio_unmap(struct mapped_pci_resource *uio_res)
}
 }
 
+/* remap the PCI resource of a PCI device in anonymous virtual memory */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev)
+{
+   int i;
+   void *map_address;
+
+   if (dev == NULL)
+   return -1;
+
+   /* Remap all BARs */
+   for (i = 0; i != PCI_MAX_RESOURCE; i++) {
+   /* skip empty BAR */
+   if (dev->mem_resource[i].phys_addr == 0)
+   continue;
+   map_address = mmap(dev->mem_resource[i].addr,
+   (size_t)dev->mem_resource[i].len,
+   PROT_READ | PROT_WRITE,
+   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
+   if (map_address == MAP_FAILED) {
+   RTE_LOG(ERR, EAL,
+   "Cannot remap resource for device %s\n",
+   dev->name);
+   return -1;
+   }
+   RTE_LOG(INFO, EAL,
+   "Successful remap resource for device %s\n",
+   dev->name);
+   }
+
+   return 0;
+}
+
 static struct mapped_pci_resource *
 pci_uio_find_resource(struct rte_pci_device *dev)
 {
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index 8ddd03e..6b312e5 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -123,6 +123,18 @@ void pci_uio_free_resource(struct rte_pci_device *dev,
struct mapped_pci_resource *uio_res);
 
 /**
+ * Remap the PCI resource of a PCI device in anonymous virtual memory.
+ *
+ * @param dev
+ *   Point to the struct rte pci device.
+ * @return
+ *   - On success, zero.
+ *   - On failure, a negative value.
+ */
+int
+pci_uio_remap_resource(struct rte_pci_device *dev);
+
+/**
  * Map device memory to uio resource
  *
  * This function is private to EAL.
-- 
2.7.4



[dpdk-dev] [PATCH v7 0/7] hotplug failure handle mechanism

2018-07-09 Thread Jeff Guo
As we know, hot plug is an important feature, used either for fail-safe
handling of datacenter devices or for SRIOV Live Migration in SDN/NFV. It brings
higher flexibility and continuity to networking services in many industry use
cases. So let us see what DPDK, as an important networking framework, can do
to help users implement a hot plug solution.

We already have a general device event detection mechanism, the failsafe
driver, the bonding driver and the hot plug/unplug API in the framework; apps
can use these to develop their own hot plug solution.

Consider the case of hot unplug. It can happen when a hardware device is
removed physically, or when the software disables it. The app needs to call
the ethdev API to detach the device, to unplug the device at the bus level and
make access to the device invalid. The problem is that the removal of the
device from the software lists is not instantaneous; if the data (fast) path
still reads from or writes to the device during this window, it causes an MMIO
error and the app crashes.

We already have a fail-safe driver (or app) + RTE_ETH_EVENT_INTR_RMV +
kernel core driver solution to handle this, but there is still no fail-safe
driver (or app) + RTE_DEV_EVENT_REMOVE + PCIe PMD failure handling solution.
So there is a gap in the DPDK hot plug solution right now.

Also, the kernel only guarantees hot plug on the kernel side, not on the
user mode side. Firstly, we can hardly have a gatekeeper for every MMIO access
across multiple PMDs. Secondly, no third-party tools such as udev/driverctl
specifically cover this hot plug failure processing. Thirdly, it is still
unclear how feasible it is for an app to implement this for multiple user mode
PMDs. Here, a general hot plug failure handling mechanism in the DPDK framework
is proposed. It aims to guarantee that, when a hot unplug occurs, the system
does not crash and the app does not break, and that user space can stop
normally, release any relevant resources, and then cleanly unplug the device
at the bus level.

The mechanism works as described below:

Firstly, the app enables the device event monitor and registers the hot plug
event callback before running the data path. Once a hot unplug occurs, the
mechanism detects the removal event and handles the failure accordingly. To do
that, the following functionality is introduced.
 - Add a new bus ops “handle_hot_unplug” to handle bus read/write errors. It is
   bus-specific and each kind of bus can implement its own logic.
 - Implement the PCI bus specific ops “pci_handle_hot_unplug”. Based on the
   failure address, it remaps memory for the corresponding unplugged device.

For the data path, or other unexpected access from the control path, when a
hot unplug occurs:
 - Implement a new sigbus handler, which is registered when device event
   monitoring starts. The handler is per process. Because of how signals are
   delivered, either a control path thread or a data path thread may receive
   the sigbus error, but both go to the common sigbus handler. Once the MMIO
   sigbus error is raised, it triggers the above hot unplug operation. The
   handler checks whether the sigbus was caused by the hot unplug. If not, it
   reports the exception through the original sigbus handler. If yes, it does
   the memory remapping.

For the control path and the igb_uio release:
 - When a device is hot unplugged, the kernel releases the device resources on
   the kernel side: for example, the fd sys file disappears and the irq is
   released. At this time, if the igb_uio driver still tries to release these
   resources, it causes a kernel crash.
   On the other hand, something like disabling interrupts is not done
   automatically on the kernel side. If it is not handled, this stale state
   affects the interrupt resources used by other devices.
   So the igb_uio driver has to check the hot plug status and take the
   corresponding action.
   This patch proposes adding an rte_udev_state field to rte_uio_pci_dev
   of the igb_uio kernel driver, which records the state of the uio device,
   such as probed/opened/released/removed/unplug. When it detects an unexpected
   removal caused by hot unplug, it disables the interrupt resources
   accordingly, while the parts of the release that the kernel has already
   handled are skipped to avoid a double free or NULL pointer kernel crash.

The mechanism can be used by the fail-safe driver and by apps that want a hot
plug solution. Taking testpmd as an example:
 - Enable device event monitor->device unplug->failure handle->stop forwarding->
   stop port->close port->detach port.

This process does not break the running app/fail-safe driver, and does not
affect other, unrelated devices. The app can then plug in the device and
restart the data path as follows:
 - Device plug in->bind igb_uio driver->attach device->start port->
   start forwarding.

patchset history:
v7->v6:
delete some unuse

[dpdk-dev] [PATCH v7 1/7] bus: add hotplug failure handler

2018-07-09 Thread Jeff Guo
When a device is hot unplugged, if the app still continues to access the device
by MMIO, it causes a memory failure and results in a system crash.

This patch introduces a bus ops to handle device hotplug failure. It is
bus-specific behavior, so each kind of bus can implement its own logic case
by case.

Signed-off-by: Jeff Guo 
Acked-by: Shaopeng He 
---
v7->v6:
no change
---
 lib/librte_eal/common/include/rte_bus.h | 16 
 1 file changed, 16 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_bus.h 
b/lib/librte_eal/common/include/rte_bus.h
index eb9eded..e3a55a8 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -168,6 +168,20 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
 typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 
 /**
+ * Implementation a specific hotplug failure handler, which is responsible
+ * for handle the failure when the device be hotplug out from the bus. When
+ * hotplug removal event be detected, it could call this function to handle
+ * failure and guaranty the system would not crash in the case.
+ * @param dev
+ * Pointer of the device structure.
+ *
+ * @return
+ * 0 on success.
+ * !0 on error.
+ */
+typedef int (*rte_bus_hotplug_failure_handler_t)(struct rte_device *dev);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -211,6 +225,8 @@ struct rte_bus {
rte_bus_parse_t parse;   /**< Parse a device name */
struct rte_bus_conf conf;/**< Bus configuration */
rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
+   rte_bus_hotplug_failure_handler_t hotplug_failure_handler;
+   /**< handle hotplug failure on bus */
 };
 
 /**
-- 
2.7.4



[dpdk-dev] [PATCH v7 3/7] bus: add sigbus handler

2018-07-09 Thread Jeff Guo
When a device is hot unplugged, if the data path still reads from or writes to
the device, a sigbus error occurs and needs to be handled. So a handler
is needed to capture the signal and handle it accordingly.

This patch introduces a bus ops to handle the sigbus error. It is
bus-specific behavior, so each kind of bus can implement its own logic
case by case.

Signed-off-by: Jeff Guo 
Acked-by: Shaopeng He 
---
v7->v6:
no change
---
 lib/librte_eal/common/include/rte_bus.h | 17 +
 1 file changed, 17 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_bus.h 
b/lib/librte_eal/common/include/rte_bus.h
index e3a55a8..216ad1e 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -182,6 +182,21 @@ typedef int (*rte_bus_parse_t)(const char *name, void 
*addr);
 typedef int (*rte_bus_hotplug_failure_handler_t)(struct rte_device *dev);
 
 /**
+ * Implementation a specific sigbus handler, which is responsible for handle
+ * the sigbus error which is either original memory error, or specific memory
+ * error that caused of hot unplug. When sigbus error be captured, it could
+ * call this function to handle sigbus error.
+ * @param failure_addr
+ * Pointer of the fault address of the sigbus error.
+ *
+ * @return
+ * 0 for success handle the sigbus.
+ * 1 for no bus handle the sigbus.
+ * -1 for failed to handle the sigbus
+ */
+typedef int (*rte_bus_sigbus_handler_t)(const void *failure_addr);
+
+/**
  * Bus scan policies
  */
 enum rte_bus_scan_mode {
@@ -227,6 +242,8 @@ struct rte_bus {
rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
rte_bus_hotplug_failure_handler_t hotplug_failure_handler;
/**< handle hotplug failure on bus */
+   rte_bus_sigbus_handler_t sigbus_handler; /**< handle sigbus error */
+
 };
 
 /**
-- 
2.7.4



[dpdk-dev] [PATCH v7 4/7] bus/pci: implement sigbus handler operation

2018-07-09 Thread Jeff Guo
This patch implements the sigbus handler ops for the PCI bus. It
finds the PCI device that has been hot unplugged, and then calls
the bus ops of the hotplug failure handler to handle
the failure for the device.
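
The address lookup can be illustrated outside of DPDK as below. This is only a
sketch: struct toy_pci_dev, struct toy_resource, TOY_MAX_RESOURCE and
toy_find_device_by_addr() are hypothetical names, and a plain array replaces
the bus device list. It shows the same half-open range test per BAR.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_RESOURCE 6	/* assumption: six BARs per device */

/* Hypothetical stand-ins for the device and its mapped BARs. */
struct toy_resource {
	void *addr;	/* virtual address of the mapped BAR, NULL if unused */
	uint64_t len;	/* length of the BAR mapping, 0 if unused */
};

struct toy_pci_dev {
	const char *name;
	struct toy_resource mem_resource[TOY_MAX_RESOURCE];
};

/* Return the device owning failure_addr, or NULL if no BAR contains it. */
static struct toy_pci_dev *
toy_find_device_by_addr(struct toy_pci_dev *devs, size_t n_devs,
			const void *failure_addr)
{
	uintptr_t fault = (uintptr_t)failure_addr;
	size_t d, i;

	for (d = 0; d < n_devs; d++) {
		for (i = 0; i < TOY_MAX_RESOURCE; i++) {
			const struct toy_resource *res =
				&devs[d].mem_resource[i];
			uintptr_t start = (uintptr_t)res->addr;

			if (res->len == 0)
				continue;	/* unused BAR */
			/* half-open range [start, start + len) */
			if (fault >= start && fault - start < res->len)
				return &devs[d];
		}
	}
	return NULL;
}
```

Comparing the offset (fault - start) against the length avoids computing
start + len, which could wrap for a mapping at the very top of the address
space.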

Signed-off-by: Jeff Guo 
Acked-by: Shaopeng He 
---
v7->v6:
no change
---
 drivers/bus/pci/pci_common.c | 49 
 1 file changed, 49 insertions(+)

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index d7abe6c..37ad266 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -407,6 +407,32 @@ pci_find_device(const struct rte_device *start, 
rte_dev_cmp_t cmp,
return NULL;
 }
 
+/* check the failure address belongs to which device. */
+static struct rte_pci_device *
+pci_find_device_by_addr(const void *failure_addr)
+{
+   struct rte_pci_device *pdev = NULL;
+   int i;
+
+   FOREACH_DEVICE_ON_PCIBUS(pdev) {
+   for (i = 0; i != RTE_DIM(pdev->mem_resource); i++) {
+   if ((uint64_t)(uintptr_t)failure_addr >=
+   (uint64_t)(uintptr_t)pdev->mem_resource[i].addr &&
+   (uint64_t)(uintptr_t)failure_addr <
+   (uint64_t)(uintptr_t)pdev->mem_resource[i].addr +
+   pdev->mem_resource[i].len) {
+   RTE_LOG(INFO, EAL, "Failure address "
+   "%16.16"PRIx64" belongs to "
+   "device %s!\n",
+   (uint64_t)(uintptr_t)failure_addr,
+   pdev->device.name);
+   return pdev;
+   }
+   }
+   }
+   return NULL;
+}
+
 static int
 pci_hotplug_failure_handler(struct rte_device *dev)
 {
@@ -435,6 +461,28 @@ pci_hotplug_failure_handler(struct rte_device *dev)
 }
 
 static int
+pci_sigbus_handler(const void *failure_addr)
+{
+   struct rte_pci_device *pdev = NULL;
+   int ret = 0;
+
+   pdev = pci_find_device_by_addr(failure_addr);
+   if (!pdev) {
+   /* It is a generic sigbus error, no bus would handle it. */
+   ret = 1;
+   } else {
+   /* The sigbus error is caused of hot removal. */
+   ret = pci_hotplug_failure_handler(&pdev->device);
+   if (ret) {
+   RTE_LOG(ERR, EAL, "Failed to handle hot plug for "
+   "device %s", pdev->name);
+   ret = -1;
+   }
+   }
+   return ret;
+}
+
+static int
 pci_plug(struct rte_device *dev)
 {
return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev));
@@ -465,6 +513,7 @@ struct rte_pci_bus rte_pci_bus = {
.parse = pci_parse,
.get_iommu_class = rte_pci_get_iommu_class,
.hotplug_failure_handler = pci_hotplug_failure_handler,
+   .sigbus_handler = pci_sigbus_handler,
},
.device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list),
.driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list),
-- 
2.7.4



[dpdk-dev] [PATCH v7 5/7] bus: add helper to handle sigbus

2018-07-09 Thread Jeff Guo
This patch adds a helper that iterates over all buses to find the
corresponding bus to handle the sigbus error.
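
The iteration and its return convention can be sketched as below. This is a
simplified illustration, not the helper in this patch: struct toy_bus,
toy_bus_sigbus_handler() and the three example handlers are hypothetical, a
plain array replaces rte_bus_find(), and rte_errno is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Same convention as the per-bus op: 0 handled, 1 not mine, -1 failed. */
typedef int (*toy_sigbus_handler_t)(const void *failure_addr);

/* Hypothetical stand-in for a bus with an optional sigbus op. */
struct toy_bus {
	const char *name;
	toy_sigbus_handler_t sigbus_handler;	/* may be NULL */
};

/* Ask each bus in turn; stop at the first one that claims the address. */
static int
toy_bus_sigbus_handler(const struct toy_bus *buses, size_t n,
		       const void *failure_addr)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int ret;

		if (buses[i].sigbus_handler == NULL)
			continue;	/* no handler on this bus, try the next */
		ret = buses[i].sigbus_handler(failure_addr);
		if (ret <= 0)
			return ret;	/* handled (0) or failed (-1) */
	}
	return 1;	/* no bus claimed the address: generic sigbus */
}

/* Example handlers, present only to exercise the helper. */
static unsigned char toy_bar[64];

static int
toy_claim_bar(const void *addr)
{
	uintptr_t a = (uintptr_t)addr;
	uintptr_t start = (uintptr_t)toy_bar;

	return (a >= start && a - start < sizeof(toy_bar)) ? 0 : 1;
}

static int
toy_decline(const void *addr)
{
	(void)addr;
	return 1;
}

static int
toy_fail(const void *addr)
{
	(void)addr;
	return -1;
}
```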

Signed-off-by: Jeff Guo 
Acked-by: Shaopeng He 
---
v7->v6:
no change
---
 lib/librte_eal/common/eal_common_bus.c | 42 ++
 lib/librte_eal/common/eal_private.h| 12 ++
 2 files changed, 54 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_bus.c 
b/lib/librte_eal/common/eal_common_bus.c
index 0943851..8856adc 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "eal_private.h"
 
@@ -242,3 +243,44 @@ rte_bus_get_iommu_class(void)
}
return mode;
 }
+
+static int
+bus_handle_sigbus(const struct rte_bus *bus,
+   const void *failure_addr)
+{
+   int ret;
+
+   if (!bus->sigbus_handler) {
+   RTE_LOG(ERR, EAL, "Function sigbus_handler not supported by "
+   "bus (%s)\n", bus->name);
+   return -1;
+   }
+
+   ret = bus->sigbus_handler(failure_addr);
+   rte_errno = ret;
+
+   return !(bus->sigbus_handler && ret <= 0);
+}
+
+int
+rte_bus_sigbus_handler(const void *failure_addr)
+{
+   struct rte_bus *bus;
+
+   int ret = 0;
+   int old_errno = rte_errno;
+
+   rte_errno = 0;
+
+   bus = rte_bus_find(NULL, bus_handle_sigbus, failure_addr);
+   /* failed to handle the sigbus, pass the new errno. */
+   if (!bus)
+   ret = 1;
+   else if (rte_errno == -1)
+   return -1;
+
+   /* otherwise restore the old errno. */
+   rte_errno = old_errno;
+
+   return ret;
+}
diff --git a/lib/librte_eal/common/eal_private.h 
b/lib/librte_eal/common/eal_private.h
index bdadc4d..2337e71 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -258,4 +258,16 @@ int rte_mp_channel_init(void);
  */
 void dev_callback_process(char *device_name, enum rte_dev_event_type event);
 
+/**
+ * Iterate all buses to find the corresponding bus, to handle the sigbus error.
+ * @param failure_addr
+ * Pointer of the fault address of the sigbus error.
+ *
+ * @return
+ *  0 success to handle the sigbus.
+ * -1 failed to handle the sigbus
+ *  1 no bus can handler the sigbus
+ */
+int rte_bus_sigbus_handler(const void *failure_addr);
+
 #endif /* _EAL_PRIVATE_H_ */
-- 
2.7.4



[dpdk-dev] [PATCH v7 6/7] eal: add failure handle mechanism for hotplug

2018-07-09 Thread Jeff Guo
This patch introduces a failure handling mechanism for the device
hotplug removal event.

First, it registers a sigbus handler when the device event monitor is enabled.
Once a sigbus error is captured, it checks the failure address and accordingly
remaps the invalid memory for the corresponding device. Based on this
mechanism, it can guarantee that the application does not crash when the
device is hot unplugged.

Signed-off-by: Jeff Guo 
Acked-by: Shaopeng He 
---
v7->v6:
delete some unused part.
---
 lib/librte_eal/linuxapp/eal/eal_dev.c | 112 +-
 1 file changed, 111 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c 
b/lib/librte_eal/linuxapp/eal/eal_dev.c
index 1cf6aeb..0de3fb7 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -4,6 +4,8 @@
 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 
@@ -14,6 +16,10 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
+#include 
 
 #include "eal_private.h"
 
@@ -23,6 +29,16 @@ static bool monitor_started;
 #define EAL_UEV_MSG_LEN 4096
 #define EAL_UEV_MSG_ELEM_LEN 128
 
+/*
+ * spinlock for device failure process, protect the bus and the device
+ * to avoid race condition.
+ */
+static rte_spinlock_t dev_failure_lock = RTE_SPINLOCK_INITIALIZER;
+
+static struct sigaction sigbus_action_old;
+
+static int sigbus_need_recover;
+
 static void dev_uev_handler(__rte_unused void *param);
 
 /* identify the system layer which reports this event. */
@@ -33,6 +49,49 @@ enum eal_dev_event_subsystem {
EAL_DEV_EVENT_SUBSYSTEM_MAX
 };
 
+static void
+sigbus_action_recover(void)
+{
+   if (sigbus_need_recover) {
+   sigaction(SIGBUS, &sigbus_action_old, NULL);
+   sigbus_need_recover = 0;
+   }
+}
+
+static void sigbus_handler(int signum, siginfo_t *info,
+   void *ctx __rte_unused)
+{
+   int ret;
+
+   RTE_LOG(INFO, EAL, "Thread[%d] catch SIGBUS, fault address:%p\n",
+   (int)pthread_self(), info->si_addr);
+
+   rte_spinlock_lock(&dev_failure_lock);
+   ret = rte_bus_sigbus_handler(info->si_addr);
+   rte_spinlock_unlock(&dev_failure_lock);
+   if (ret == -1) {
+   rte_exit(EXIT_FAILURE,
+"Failed to handle SIGBUS for hotplug, "
+"(rte_errno: %s)!", strerror(rte_errno));
+   } else if (ret == 1) {
+   if (sigbus_action_old.sa_handler)
+   (*(sigbus_action_old.sa_handler))(signum);
+   else
+   rte_exit(EXIT_FAILURE,
+"Failed to handle generic SIGBUS!");
+   }
+
+   RTE_LOG(INFO, EAL, "Success to handle SIGBUS for hotplug!\n");
+}
+
+static int cmp_dev_name(const struct rte_device *dev,
+   const void *_name)
+{
+   const char *name = _name;
+
+   return strcmp(dev->name, name);
+}
+
 static int
 dev_uev_socket_fd_create(void)
 {
@@ -147,6 +206,9 @@ dev_uev_handler(__rte_unused void *param)
struct rte_dev_event uevent;
int ret;
char buf[EAL_UEV_MSG_LEN];
+   struct rte_bus *bus;
+   struct rte_device *dev;
+   const char *busname = "";
 
memset(&uevent, 0, sizeof(struct rte_dev_event));
memset(buf, 0, EAL_UEV_MSG_LEN);
@@ -171,13 +233,50 @@ dev_uev_handler(__rte_unused void *param)
RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d, subsystem:%d)\n",
uevent.devname, uevent.type, uevent.subsystem);
 
-   if (uevent.devname)
+   switch (uevent.subsystem) {
+   case EAL_DEV_EVENT_SUBSYSTEM_PCI:
+   case EAL_DEV_EVENT_SUBSYSTEM_UIO:
+   busname = "pci";
+   break;
+   default:
+   break;
+   }
+
+   if (uevent.devname) {
+   if (uevent.type == RTE_DEV_EVENT_REMOVE) {
+   rte_spinlock_lock(&dev_failure_lock);
+   bus = rte_bus_find_by_name(busname);
+   if (bus == NULL) {
+   RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n",
+   busname);
+   return;
+   }
+
+   dev = bus->find_device(NULL, cmp_dev_name,
+  uevent.devname);
+   if (dev == NULL) {
+   RTE_LOG(ERR, EAL, "Cannot find device (%s) on "
+   "bus (%s)\n", uevent.devname, busname);
+   return;
+   }
+
+   ret = bus->hotplug_failure_handler(dev);
+   rte_spinlock_unlock(&dev_failure_lock);
+   if (ret) {
+   RTE_LOG(ERR, EAL, "Can not handle hotplug for "
+   "device (%s)\n", dev->name);
+ 

[dpdk-dev] [PATCH v7 7/7] igb_uio: fix uio release issue when hot unplug

2018-07-09 Thread Jeff Guo
When a device is hot unplugged, the kernel releases the device resources on the
kernel side: for example, the fd sys file disappears and the irq is
released. At this time, if the igb_uio driver still tries to release these
resources, it causes a kernel crash. On the other hand, something like
disabling interrupts is not done automatically on the kernel side. If it is
not handled, this stale state affects the interrupt resources used by other
devices. So the igb_uio driver has to check the hotplug status and take the
corresponding action.

This patch proposes adding an rte_udev_state field to rte_uio_pci_dev
of the igb_uio kernel driver, which records the state of the uio device, such
as probed/opened/released/removed/unplug. When it detects an unexpected
removal caused by hot unplug, it disables the interrupt resources
accordingly, while the parts of the release that the kernel has
already handled are skipped to avoid a double free or NULL pointer kernel
crash.
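
The state tracking can be modelled in user space as below. This is a toy model
under stated assumptions, not kernel code: the toy_* names are hypothetical,
irq_enabled stands in for the interrupt resources, and the final transition to
the removed state in toy_remove() is an assumption about the intended
behavior.

```c
#include <stddef.h>

/* Toy mirror of the uio device states named in the commit message. */
enum toy_udev_state {
	TOY_UDEV_PROBED,
	TOY_UDEV_OPENED,
	TOY_UDEV_RELEASED,
	TOY_UDEV_REMOVED,
	TOY_UDEV_UNPLUG
};

struct toy_udev {
	enum toy_udev_state state;
	int irq_enabled;	/* stands in for the interrupt resources */
	int release_calls;	/* counts how often teardown really ran */
};

static void
toy_open(struct toy_udev *u)
{
	u->irq_enabled = 1;
	u->state = TOY_UDEV_OPENED;
}

static int
toy_release(struct toy_udev *u)
{
	if (u->state == TOY_UDEV_REMOVED)
		return 0;	/* already torn down by the removal path */
	u->irq_enabled = 0;
	u->release_calls++;
	u->state = TOY_UDEV_RELEASED;
	return 0;
}

static void
toy_remove(struct toy_udev *u)
{
	/* The device vanished while user space still had it open. */
	if (u->state == TOY_UDEV_OPENED || u->state == TOY_UDEV_UNPLUG)
		toy_release(u);
	u->state = TOY_UDEV_REMOVED;
}
```

With this ordering, a late release from user space after the removal becomes
a no-op instead of a second teardown.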

Signed-off-by: Jeff Guo 
---
 kernel/linux/igb_uio/igb_uio.c | 51 +++---
 1 file changed, 48 insertions(+), 3 deletions(-)

diff --git a/kernel/linux/igb_uio/igb_uio.c b/kernel/linux/igb_uio/igb_uio.c
index 3398eac..adc8cea 100644
--- a/kernel/linux/igb_uio/igb_uio.c
+++ b/kernel/linux/igb_uio/igb_uio.c
@@ -19,6 +19,15 @@
 
 #include "compat.h"
 
+/* uio pci device state */
+enum rte_udev_state {
+   RTE_UDEV_PROBED,
+   RTE_UDEV_OPENNED,
+   RTE_UDEV_RELEASED,
+   RTE_UDEV_REMOVED,
+   RTE_UDEV_UNPLUG
+};
+
 /**
  * A structure describing the private information for a uio device.
  */
@@ -28,6 +37,7 @@ struct rte_uio_pci_dev {
enum rte_intr_mode mode;
struct mutex lock;
int refcnt;
+   enum rte_udev_state state;
 };
 
 static int wc_activate;
@@ -195,12 +205,22 @@ igbuio_pci_irqhandler(int irq, void *dev_id)
 {
struct rte_uio_pci_dev *udev = (struct rte_uio_pci_dev *)dev_id;
struct uio_info *info = &udev->info;
+   struct pci_dev *pdev = udev->pdev;
 
/* Legacy mode need to mask in hardware */
if (udev->mode == RTE_INTR_MODE_LEGACY &&
!pci_check_and_mask_intx(udev->pdev))
return IRQ_NONE;
 
+   mutex_lock(&udev->lock);
+   /* check the uevent of the kobj */
+   if ((&pdev->dev.kobj)->state_remove_uevent_sent == 1) {
+   dev_notice(&pdev->dev, "device:%s, sent remove uevent!\n",
+  (&pdev->dev.kobj)->name);
+   udev->state = RTE_UDEV_UNPLUG;
+   }
+   mutex_unlock(&udev->lock);
+
uio_event_notify(info);
 
/* Message signal mode, no share IRQ and automasked */
@@ -309,7 +329,6 @@ igbuio_pci_disable_interrupts(struct rte_uio_pci_dev *udev)
 #endif
 }
 
-
 /**
  * This gets called while opening uio device file.
  */
@@ -331,20 +350,29 @@ igbuio_pci_open(struct uio_info *info, struct inode 
*inode)
 
/* enable interrupts */
err = igbuio_pci_enable_interrupts(udev);
-   mutex_unlock(&udev->lock);
if (err) {
dev_err(&dev->dev, "Enable interrupt fails\n");
+   pci_clear_master(dev);
+   mutex_unlock(&udev->lock);
return err;
}
+   udev->state = RTE_UDEV_OPENNED;
+   mutex_unlock(&udev->lock);
return 0;
 }
 
+/**
+ * This gets called while closing uio device file.
+ */
 static int
 igbuio_pci_release(struct uio_info *info, struct inode *inode)
 {
struct rte_uio_pci_dev *udev = info->priv;
struct pci_dev *dev = udev->pdev;
 
+   if (udev->state == RTE_UDEV_REMOVED)
+   return 0;
+
mutex_lock(&udev->lock);
if (--udev->refcnt > 0) {
mutex_unlock(&udev->lock);
@@ -356,7 +384,7 @@ igbuio_pci_release(struct uio_info *info, struct inode 
*inode)
 
/* stop the device from further DMA */
pci_clear_master(dev);
-
+   udev->state = RTE_UDEV_RELEASED;
mutex_unlock(&udev->lock);
return 0;
 }
@@ -562,6 +590,9 @@ igbuio_pci_probe(struct pci_dev *dev, const struct 
pci_device_id *id)
 (unsigned long long)map_dma_addr, map_addr);
}
 
+   mutex_lock(&udev->lock);
+   udev->state = RTE_UDEV_PROBED;
+   mutex_unlock(&udev->lock);
return 0;
 
 fail_remove_group:
@@ -579,6 +610,20 @@ static void
 igbuio_pci_remove(struct pci_dev *dev)
 {
struct rte_uio_pci_dev *udev = pci_get_drvdata(dev);
+   int ret;
+
+   /* handler hot unplug */
+   if (udev->state == RTE_UDEV_OPENNED ||
+   udev->state == RTE_UDEV_UNPLUG) {
+   dev_notice(&dev->dev, "Unexpected removal!\n");
+   ret = igbuio_pci_release(&udev->info, NULL);
+   if (ret)
+   return;
+   mutex_lock(&udev->lock);
+   udev->state = RTE_UDEV_REMOVED;
+   mutex_unl

Re: [dpdk-dev] [PATCH v2] librte_lpm: Improve performance of the delete and add functions

2018-07-09 Thread Alex Kiselev
>> + int ret = rte_hash_lookup_data(lpm->rules_tbl, (void *) &rule_key,
>> + (void **) &rule);
>> + if (ret >= 0) {
>> + /* delete the rule */
>> + rte_hash_del_key(lpm->rules_tbl, (void *) &rule_key);
>> + lpm->used_rules--;
>> + rte_mempool_put(lpm->rules_pool, rule);
>> + }

> Rather than doing a lookup and then delete, why not just try the delete
> straight off. If you want to check for the key not being present, it can be
> detected from the output of the delete call. From rte_hash.h:

>  * @return
>  *   - -EINVAL if the parameters are invalid.
>  *   - -ENOENT if the key is not found.

A deleted rule has to be returned back to the mempool.
And I don't see any delete function in the rte_hash that can
return a deleted item back to a caller. 

>> +
>> + return ret;
>>  }
>>  
>>  /*
>> - * Deletes a rule
>> + * Deletes a group of rules

> Include a comment that this bulk function will rebuild the lpm table,
> rather than doing incremental updates like the regular delete function.
ok


>> + * Convert a depth to a one byte long mask
>> + */
>> +static uint8_t __attribute__((pure))
>> +depth_to_mask_1b(uint8_t depth)
>> +{
>> + /* To calculate a mask start with a 1 on the left hand side and right
>> +  * shift while populating the left hand side with 1's
>>*/
>> - if ((lpm == NULL) || (ips == NULL) || (depths == NULL)) {
>> - return -EINVAL;
>> + return (signed char)0x80 >> (depth - 1);

> I'd make the comment on the function a little clearer e.g. using an
> example: "4 => 0xF0", which should remove the need to have the second comment
> above the return statement.

> An alternative that might be a little clearer for the calculation would be:
> "(uint8_t)(~(0xFF >> depth))".

I've just copied this function from rte_lpm.c and converted it to a 1-byte
version. I'll add an example: 4 => 0xF0.
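
For reference, the unsigned alternative suggested above can be written out as
a small standalone function. This is only a sketch: toy_depth_to_mask_1b() is
a hypothetical name, and depth is assumed to be in the range 0..8.

```c
#include <stdint.h>

/*
 * Convert a depth (number of leading one bits, 0..8) to a one byte mask,
 * e.g. 4 => 0xF0. Only unsigned shifts are used, so the result does not
 * depend on how the compiler shifts negative values.
 */
static uint8_t
toy_depth_to_mask_1b(uint8_t depth)
{
	return (uint8_t)(~(0xFFu >> depth));
}
```

For depth 4, 0xFF >> 4 is 0x0F, and inverting and truncating to one byte gives
0xF0. Depth 0 gives 0x00 and depth 8 gives 0xFF.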

>> +}
>> +
>> +/*
>> + * Find a less specific rule
>> + */
>> +static struct rte_lpm6_rule*
>> +rule_find_less_specific(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth)
>> +{
>> + if (depth == 1)
>> + return NULL;
>> +
>> + struct rte_lpm6_rule *rule;
>> + struct rte_lpm6_rule_key rule_key;
>> + rule_key_init(&rule_key, ip, depth);
>> + uint8_t mask;
>> +
>> + while (depth > 1) {
>> + depth--;
>> +
>> + /* each iteration zero one more bit of the key */
>> + mask = depth & 7; /* depth % 8 */
>> + if (mask > 0)
>> + mask = depth_to_mask_1b(mask);
>> +
>> + rule_key.depth = depth;
>> + rule_key.ip[depth >> 3] &= mask;
>> +

> It seems strange that when you adjust the depth, you also need to mask out
> bits of the key which should be ignored. Can you make the masking part of
> the hash calculation, which would simplify the logic here a lot, and if so,
> does it affect performance much?

The first version of rule_find_less_specific() was doing exactly what you are
proposing, masking the whole IPv6 address every time. But then I just couldn't
stop myself from using this shortcut, since it's a performance optimization
patch.

So, yes, it could be part of the hash calculation, but why? It's definitely not
the most difficult part of the algorithm (even without this optimization),
so it would not make life easier :)
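
To make the shortcut concrete, here is a standalone sketch of shortening an
already masked key by one bit. The names (struct toy_rule_key, toy_mask_1b(),
toy_key_shorten()) are hypothetical and the key layout is an assumption; only
the byte that holds the dropped bit is touched.

```c
#include <stdint.h>

#define TOY_IPV6_BYTES 16

/* Hypothetical key layout: masked address plus prefix length. */
struct toy_rule_key {
	uint8_t ip[TOY_IPV6_BYTES];
	uint8_t depth;
};

/* One byte mask with 'bits' leading ones (0..8), e.g. 3 => 0xE0. */
static uint8_t
toy_mask_1b(uint8_t bits)
{
	return (uint8_t)(~(0xFFu >> bits));
}

/*
 * Shorten the key by one bit. Precondition: key->depth >= 1 and all bits
 * past key->depth are already zero, so clearing a single bit in one byte
 * is enough to keep the key fully masked.
 */
static void
toy_key_shorten(struct toy_rule_key *key)
{
	uint8_t depth = (uint8_t)(key->depth - 1);

	key->depth = depth;
	/* depth & 7 is the number of prefix bits left in byte depth >> 3 */
	key->ip[depth >> 3] &= toy_mask_1b(depth & 7);
}
```

Going from /12 to /11, for example, only changes ip[1] from 0xF0 to 0xE0. The
other 15 bytes are not read or written.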
  

>>  }
>> -- 
> Rest of the patch looks fine to me, though I can't say I've followed all
> the logic paths in full detail.

> Main concern I have about the patch is the size. Is there any way this
> patch could be split up into a few smaller ones with more gradual changes?
I could try to split it in two parts. The first part will introduce the new rule
subsystem using a hashtable instead of a flat array. The second one will
include the rest.

> Regards,
> /Bruce



-- 
Alex



Re: [dpdk-dev] [PATCH] net/i40e: fix PPPoL2TP packet type parser issue

2018-07-09 Thread Zhang, Qi Z



> -Original Message-
> From: Lin, Xueqin
> Sent: Thursday, July 5, 2018 2:33 PM
> To: Xing, Beilei ; Zhang, Qi Z 
> Cc: dev@dpdk.org; sta...@dpdk.org
> Subject: FW: [dpdk-dev] [PATCH] net/i40e: fix PPPoL2TP packet type parser
> issue
> 
> 
> 
>  > -Original Message-
>  > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Beilei Xing
>  > Sent: Thursday, July 5, 2018 9:36 AM
>  > To: Zhang, Qi Z
>  > Cc: dev@dpdk.org; sta...@dpdk.org
>  > Subject: [dpdk-dev] [PATCH] net/i40e: fix PPPoL2TP packet type parser
>  > issue
>  >
>  > Since PPPoL2TP profile is updated, PPPoL2TP packet type parser will be
>  > false with the current parser function.
>  > This patch fixes the issue.
>  >
>  > Fixes: 11556c915a08 ("net/i40e: improve packet type parser")
>  > Cc: sta...@dpdk.org
>  >

> Signed-off-by: Beilei Xing 
> Tested-by: Xueqin Lin 

Acked-by: Qi Zhang 

Applied to dpdk-next-net-intel.

Thanks!
Qi


> 



Re: [dpdk-dev] [PATCH] net/i40e: fix fail to set TPID with AQ command

2018-07-09 Thread Zhang, Qi Z



> -Original Message-
> From: Xing, Beilei
> Sent: Thursday, July 5, 2018 3:03 PM
> To: Zhang, Qi Z 
> Cc: dev@dpdk.org; Wu, Jingjing ; Zheng, James
> ; sta...@dpdk.org
> Subject: [PATCH] net/i40e: fix fail to set TPID with AQ command
> 
> TPID can be set by set_switch_config AdminQ command on new FW release.
> But find fail to set 0x88A8 on some NICs.
> According to the datasheet, Switch Tag value should not be identical to either
> the First Tag or Second Tag values.
> So set something other than common Ethertype for internal switching.
> 
> Fixes: 73cd7d6dc8e1 ("net/i40e: use set switch AQ instead of register 
> setting")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Beilei Xing 

Acked-by: Qi Zhang 

Applied to dpdk-next-net-intel.

Thanks!
Qi


Re: [dpdk-dev] [PATCH v5] net/mlx4: support hardware TSO

2018-07-09 Thread Matan Azrad



Hi Moti 

Please see some comments below.

From: Mordechay Haimovsky
> Implement support for hardware TSO.
> 
> Signed-off-by: Moti Haimovsky 
> ---
> v5:
> * Modification to the code according to review inputs from Matan
>   Azrad.
> * Code optimization to the TSO header copy routine.
> * Rearranged the TSO data-segments creation routine.
> in reply to
> 1530715998-15703-1-git-send-email-mo...@mellanox.com
> 
> v4:
> * Bug fixes in filling TSO data segments.
> * Modifications according to review inputs from Adrien Mazarguil
>   and Matan Azrad.
> in reply to
> 1530190137-17848-1-git-send-email-mo...@mellanox.com
> 
> v3:
> * Fixed compilation errors in compilers without GNU C extensions
>   caused by a declaration of zero-length array in the code.
> in reply to
> 1530187032-6489-1-git-send-email-mo...@mellanox.com
> 
> v2:
> * Fixed coding style warning.
> in reply to
> 1530184583-30166-1-git-send-email-mo...@mellanox.com
> 
> v1:
> * Fixed coding style warnings.
> in reply to
> 1530181779-19716-1-git-send-email-mo...@mellanox.com
> ---
>  doc/guides/nics/features/mlx4.ini |   1 +
>  doc/guides/nics/mlx4.rst  |   3 +
>  drivers/net/mlx4/Makefile |   5 +
>  drivers/net/mlx4/mlx4.c   |   9 +
>  drivers/net/mlx4/mlx4.h   |   5 +
>  drivers/net/mlx4/mlx4_prm.h   |  15 ++
>  drivers/net/mlx4/mlx4_rxtx.c  | 372
> +-
>  drivers/net/mlx4/mlx4_rxtx.h  |   2 +-
>  drivers/net/mlx4/mlx4_txq.c   |   8 +-
>  9 files changed, 416 insertions(+), 4 deletions(-)
> 
> diff --git a/doc/guides/nics/features/mlx4.ini
> b/doc/guides/nics/features/mlx4.ini
> index f6efd21..98a3f61 100644
> --- a/doc/guides/nics/features/mlx4.ini
> +++ b/doc/guides/nics/features/mlx4.ini
> @@ -13,6 +13,7 @@ Queue start/stop = Y
>  MTU update   = Y
>  Jumbo frame  = Y
>  Scattered Rx = Y
> +TSO  = Y
>  Promiscuous mode = Y
>  Allmulticast mode= Y
>  Unicast MAC filter   = Y
> diff --git a/doc/guides/nics/mlx4.rst b/doc/guides/nics/mlx4.rst index
> 491106a..12adaeb 100644
> --- a/doc/guides/nics/mlx4.rst
> +++ b/doc/guides/nics/mlx4.rst
> @@ -142,6 +142,9 @@ Limitations
>The ability to enable/disable CRC stripping requires OFED version
>4.3-1.5.0.0 and above  or rdma-core version v18 and above.
> 
> +- TSO (Transmit Segmentation Offload) is supported in OFED version
> +  4.4 and above or in rdma-core version v18 and above.
> +
>  Prerequisites
>  -
> 
> diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile index
> 73f9d40..63bc003 100644
> --- a/drivers/net/mlx4/Makefile
> +++ b/drivers/net/mlx4/Makefile
> @@ -85,6 +85,11 @@ mlx4_autoconf.h.new: FORCE
>  mlx4_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
>   $Q $(RM) -f -- '$@'
>   $Q : > '$@'
> + $Q sh -- '$<' '$@' \
> + HAVE_IBV_MLX4_WQE_LSO_SEG \
> + infiniband/mlx4dv.h \
> + type 'struct mlx4_wqe_lso_seg' \
> + $(AUTOCONF_OUTPUT)
> 
>  # Create mlx4_autoconf.h or update it in case it differs from the new one.
> 
> diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c index
> d151a90..5d8c76d 100644
> --- a/drivers/net/mlx4/mlx4.c
> +++ b/drivers/net/mlx4/mlx4.c
> @@ -677,6 +677,15 @@ struct mlx4_conf {
> 
>   IBV_RAW_PACKET_CAP_SCATTER_FCS);
>   DEBUG("FCS stripping toggling is %ssupported",
> priv->hw_fcs_strip ? "" : "not ");
> + priv->tso =
> + ((device_attr_ex.tso_caps.max_tso > 0) &&
> +  (device_attr_ex.tso_caps.supported_qpts &
> +   (1 << IBV_QPT_RAW_PACKET)));
> + if (priv->tso)
> + priv->tso_max_payload_sz =
> + device_attr_ex.tso_caps.max_tso;
> + DEBUG("TSO is %ssupported",
> +   priv->tso ? "" : "not ");
>   /* Configure the first MAC address by default. */
>   err = mlx4_get_mac(priv, &mac.addr_bytes);
>   if (err) {
> diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h index
> 300cb4d..89d8c38 100644
> --- a/drivers/net/mlx4/mlx4.h
> +++ b/drivers/net/mlx4/mlx4.h
> @@ -47,6 +47,9 @@
>  /** Interrupt alarm timeout value in microseconds. */  #define
> MLX4_INTR_ALARM_TIMEOUT 10
> 
> +/* Maximum packet headers size (L2+L3+L4) for TSO. */ #define
> +MLX4_MAX_TSO_HEADER 192
> +
>  /** Port parameter. */
>  #define MLX4_PMD_PORT_KVARG "port"
> 
> @@ -90,6 +93,8 @@ struct priv {
>   uint32_t hw_csum:1; /**< Checksum offload is supported. */
>   uint32_t hw_csum_l2tun:1; /**< Checksum support for L2 tunnels.
> */
>   uint32_t hw_fcs_strip:1; /**< FCS stripping toggling is supported. */
> + uint32_t tso:1; /**< Transmit segmentation offload is supported. */
> + uint32_t tso_max_payload_sz; /**< Max supported TSO payload
> size. */
>   uint64_t hw_rss_sup; /**< Supporte

Re: [dpdk-dev] [PATCH v2 13/20] net/mlx5: add RSS flow action

2018-07-09 Thread Nélio Laranjeiro
On Fri, Jul 06, 2018 at 05:35:22PM +, Yongseok Koh wrote:
> 
> > On Jul 6, 2018, at 8:59 AM, Nélio Laranjeiro  
> > wrote:
> > 
> > Hi Yongseok,
> > 
> > I am only addressing your questions concerns here, almost all other
> > points I also agree with them.
> > 
> > On Thu, Jul 05, 2018 at 07:16:35PM -0700, Yongseok Koh wrote:
> >> On Wed, Jun 27, 2018 at 05:07:45PM +0200, Nelio Laranjeiro wrote:
> >>> Signed-off-by: Nelio Laranjeiro 
> >>> ---
> >> [...]
> >> 
> >>> + */
> >>> +static void
> >>> +mlx5_flow_layers_update(struct rte_flow *flow, uint32_t layers)
> >>> +{
> >>> + if (flow->expand) {
> >>> + if (flow->cur_verbs)
> >>> + flow->cur_verbs->layers |= layers;
> >> 
> >> If flow->cur_verbs is null, does that mean it is a testing call? Then, is 
> >> it
> >> unnecessary to update layers for the testing call? Confusing..
> > 
> > No it may also happen if the buffer was too small, in any case the code
> > continues its validation.
> 
> Okay, understand. Thanks.
> But another question was, if it is a testing call (flow->cur_verbs is null) with
> flow->expand being set, then 'layers' isn't updated in this code. Is it okay?

yes it was ok, after I've fixed the issue in the layers themselves,
again no layer position was done when the expanded was enabled.

Thanks,

-- 
Nélio Laranjeiro
6WIND


Re: [dpdk-dev] [PATCH v3 1/4] ethdev: Add eal device event callback

2018-07-09 Thread Andrew Rybchenko

On 09.07.2018 14:46, Jeff Guo wrote:

Implement a eal device event callback "rte_eth_dev_event_callback"
in ethdev, it could let pmd driver have chance to manage the eal
device event, such as process hotplug event.

Signed-off-by: Jeff Guo 
---
v3->v2:
add new callback in ethdev
---
  doc/guides/rel_notes/release_18_08.rst |  8 
  lib/librte_ethdev/rte_ethdev.c | 37 ++
  lib/librte_ethdev/rte_ethdev_driver.h  | 20 ++
  3 files changed, 65 insertions(+)

diff --git a/doc/guides/rel_notes/release_18_08.rst 
b/doc/guides/rel_notes/release_18_08.rst
index bc01242..2326058 100644
--- a/doc/guides/rel_notes/release_18_08.rst
+++ b/doc/guides/rel_notes/release_18_08.rst
@@ -46,6 +46,14 @@ New Features
Flow API support has been added to CXGBE Poll Mode Driver to offload
flows to Chelsio T5/T6 NICs.
  
+* **Added eal device event callback in ethdev for hotplug.**

+
+  Implement a eal device event callback in ethdev, it could let pmd driver


"pmd driver" sounds strange since PMD stands for poll-mode driver.


+  have chance to manage the eal device event, such as process hotplug event.
+
+  * ``rte_eth_dev_event_callback`` for driver use to register it and process
+eal device event.
+
  
  API Changes

  ---
diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index a9977df..36f218a 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -4518,6 +4518,43 @@ rte_eth_devargs_parse(const char *dargs, struct 
rte_eth_devargs *eth_da)
return result;
  }
  
+void __rte_experimental

+rte_eth_dev_event_callback(char *device_name, enum rte_dev_event_type type,
+void *arg)
+{
+   struct rte_eth_dev *eth_dev = (struct rte_eth_dev *)arg;
+
+   if (type >= RTE_DEV_EVENT_MAX) {
+   fprintf(stderr, "%s called upon invalid event %d\n",
+   __func__, type);
+   fflush(stderr);


I'd like to understand why fprintf() is used here for logging instead of the
rte_log mechanisms.
Also, if we really want the log, maybe it makes sense to move the if to the
default case below.
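
To illustrate the suggestion above, here is a minimal standalone sketch in which the invalid-event check sits in the default case and all messages go through one logging hook. Everything named demo_* is hypothetical and only stands in for the EAL event enum and the rte_log machinery; it is not the ethdev code from the patch.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-ins for the EAL event types (not the DPDK enum). */
enum demo_event_type {
	DEMO_EVENT_ADD,
	DEMO_EVENT_REMOVE,
	DEMO_EVENT_MAX
};

/* Hypothetical logging hook standing in for the rte_log machinery. */
static void demo_log(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
}

/* Returns 0 when the event is recognised, -1 otherwise. */
static int demo_event_callback(const char *device_name, int type)
{
	if (device_name == NULL)
		return -1;

	switch (type) {
	case DEMO_EVENT_REMOVE:
		demo_log("device %s has been removed\n", device_name);
		return 0;
	case DEMO_EVENT_ADD:
		demo_log("device %s has been added\n", device_name);
		return 0;
	default:
		/* The invalid-event check lives here, not before the switch. */
		demo_log("%s called upon invalid event %d\n", __func__, type);
		return -1;
	}
}
```

With this layout an out-of-range type is reported exactly once, and no separate range check is needed before the switch.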


+   }
+
+   switch (type) {
+   case RTE_DEV_EVENT_REMOVE:
+   ethdev_log(INFO, "The device: %s has been removed!\n",
+   device_name);
+
+   if (!device_name || !eth_dev)
+   return;
+
+   if (!(eth_dev->data->dev_flags & RTE_ETH_EVENT_INTR_RMV))
+   return;
+
+   if (!strcmp(device_name, eth_dev->device->name))


Do we really need to check it? The callback is registered for devices
with such a name, so it should always be true. Maybe it is OK to double-check;
I just want to be sure that I understand it properly.


+   _rte_eth_dev_callback_process(eth_dev,
+ RTE_ETH_EVENT_INTR_RMV,
+ NULL);
+   break;
+   case RTE_DEV_EVENT_ADD:
+   ethdev_log(INFO, "The device: %s has been added!\n",
+   device_name);
+   break;
+   default:
+   break;
+   }
+}
+
  RTE_INIT(ethdev_init_log);
  static void
  ethdev_init_log(void)
diff --git a/lib/librte_ethdev/rte_ethdev_driver.h 
b/lib/librte_ethdev/rte_ethdev_driver.h
index c9c825e..fed5afa 100644
--- a/lib/librte_ethdev/rte_ethdev_driver.h
+++ b/lib/librte_ethdev/rte_ethdev_driver.h
@@ -82,6 +82,26 @@ int rte_eth_dev_release_port(struct rte_eth_dev *eth_dev);
  void _rte_eth_dev_reset(struct rte_eth_dev *dev);
  
  /**

+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Implement a rte eth eal device event callbacks for the specific device.
+ *
+ * @param device_name
+ *  Pointer to the name of the rte device.


Is it the name of the device which generates the event? If so, it should be
highlighted.



+ * @param event
+ *  Eal device event type.
+ * @param ret_param
+ *  To pass data back to user application.
+ *
+ * @return
+ *  void
+ */
+void __rte_experimental
+rte_eth_dev_event_callback(char *device_name,
+   enum rte_dev_event_type event, void *cb_arg);
+
+/**
   * @internal Executes all the user application registered callbacks for
   * the specific device. It is for DPDK internal user only. User
   * application should not call it directly.




Re: [dpdk-dev] [PATCH v2] librte_lpm: Improve performance of the delete and add functions

2018-07-09 Thread Bruce Richardson
On Mon, Jul 09, 2018 at 03:33:44PM +0300, Alex Kiselev wrote:
> >> + int ret = rte_hash_lookup_data(lpm->rules_tbl, (void *) &rule_key,
> >> + (void **) &rule);
> >> + if (ret >= 0) {
> >> + /* delete the rule */
> >> + rte_hash_del_key(lpm->rules_tbl, (void *) &rule_key);
> >> + lpm->used_rules--;
> >> + rte_mempool_put(lpm->rules_pool, rule);
> >> + }
> 
> > Rather than doing a lookup and then delete, why not just try the delete
> > straight off. If you want to check for the key not being present, it can be
> > detected from the output of the delete call. From rte_hash.h:
> 
> >  * @return
> >  *   - -EINVAL if the parameters are invalid.
> >  *   - -ENOENT if the key is not found.
> 
> A deleted rule has to be returned back to the mempool.
> And I don't see any delete function in the rte_hash that can
> return a deleted item back to a caller. 
> 
Good point, never mind my comment, so.

> >> +
> >> + return ret;
> >>  }
> >>  
> >>  /*
> >> - * Deletes a rule
> >> + * Deletes a group of rules
> 
> > Include a comment that this bulk function will rebuild the lpm table,
> > rather than doing incremental updates like the regular delete function.
> ok
> 
> 
> >> + * Convert a depth to a one byte long mask
> >> + */
> >> +static uint8_t __attribute__((pure))
> >> +depth_to_mask_1b(uint8_t depth)
> >> +{
> >> + /* To calculate a mask start with a 1 on the left hand side and right
> >> +  * shift while populating the left hand side with 1's
> >>*/
> >> - if ((lpm == NULL) || (ips == NULL) || (depths == NULL)) {
> >> - return -EINVAL;
> >> + return (signed char)0x80 >> (depth - 1);
> 
> > I'd make the comment on the function a little clearer e.g. using an
> > example: "4 => 0xF0", which should remove the need to have the second comment
> > above the return statement.
> 
> > An alternative that might be a little clearer for the calculation would be:
> > "(uint8_t)(~(0xFF >> depth))".
> 
> I've just copied this function from rte_lpm.c and converted it to a 1-byte
> version.
> I'll add an example 4 => 0xF0.
> 
Ok. Keeping the code as-is is fine.
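
For reference, a small standalone sketch of the two mask calculations discussed above. The function names are made up for illustration. The shift variant assumes the compiler performs an arithmetic right shift on negative values, which C leaves implementation-defined (GCC and Clang do so).

```c
#include <stdint.h>

/*
 * Variant quoted from the patch: start with a 1 in the top bit and shift
 * right, relying on sign extension to fill in 1s from the left.
 * Valid for depth in 1..8, e.g. 4 => 0xF0.
 */
static uint8_t depth_to_mask_shift(uint8_t depth)
{
	return (uint8_t)((signed char)0x80 >> (depth - 1));
}

/*
 * Alternative from the review: clear the low (8 - depth) bits of 0xFF.
 * Uses only non-negative operands for the shift. Valid for depth in 1..8.
 */
static uint8_t depth_to_mask_not(uint8_t depth)
{
	return (uint8_t)(~(0xFF >> depth));
}
```

Both give the same result for every depth from 1 to 8 on compilers with arithmetic right shift, so the choice is mostly about readability.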

> >> +}
> >> +
> >> +/*
> >> + * Find a less specific rule
> >> + */
> >> +static struct rte_lpm6_rule*
> >> +rule_find_less_specific(struct rte_lpm6 *lpm, uint8_t *ip, uint8_t depth)
> >> +{
> >> + if (depth == 1)
> >> + return NULL;
> >> +
> >> + struct rte_lpm6_rule *rule;
> >> + struct rte_lpm6_rule_key rule_key;
> >> + rule_key_init(&rule_key, ip, depth);
> >> + uint8_t mask;
> >> +
> >> + while (depth > 1) {
> >> + depth--;
> >> +
> >> + /* each iteration zero one more bit of the key */
> >> + mask = depth & 7; /* depth % 8 */
> >> + if (mask > 0)
> >> + mask = depth_to_mask_1b(mask);
> >> +
> >> + rule_key.depth = depth;
> >> + rule_key.ip[depth >> 3] &= mask;
> >> +
> 
> > It seems strange that when you adjust the depth, you also need to mask out
> > bits of the key which should be ignored. Can you make the masking part of
> > the hash calculation, which would simplify the logic here a lot, and if so,
> > does it affect performance much?
> 
> The first version of rule_find_less_specific() was doing exactly what you are
> proposing, masking the whole ipv6 address every time. But then I just couldn't
> stop myself from using this shortcut since it's a performance optimization patch.
> 
> So, yes, it could be a part of the hash calculation, but why? It's definitely
> not the most difficult part of the algorithm (even without this optimization),
> so it would not make life easier :)
>   

Ok, makes sense.
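
A rough sketch of why the shortcut works, under the assumption that the key is already masked to its current depth before the loop starts. Going from depth d+1 to depth d then only requires touching the single byte ip[d >> 3]. The helper names below are hypothetical and the code is not taken from the patch.

```c
#include <stdint.h>

#define DEMO_IPV6_BYTES 16

/* Keep the top `bits` bits of a byte, bits in 1..8 (e.g. 4 => 0xF0). */
static uint8_t demo_top_bits_mask(uint8_t bits)
{
	return (uint8_t)(~(0xFF >> bits));
}

/* Reference version: mask the whole address down to `depth` leading bits. */
static void demo_mask_full(uint8_t ip[DEMO_IPV6_BYTES], uint8_t depth)
{
	int i;

	for (i = 0; i < DEMO_IPV6_BYTES; i++) {
		int keep = depth - 8 * i; /* bits of byte i that stay */

		if (keep >= 8)
			continue;
		ip[i] &= keep > 0 ? demo_top_bits_mask((uint8_t)keep) : 0;
	}
}

/*
 * Shortcut version: `ip` is assumed to be masked to depth + 1 already,
 * so only the byte holding bit number `depth` needs to change.
 */
static void demo_mask_step(uint8_t ip[DEMO_IPV6_BYTES], uint8_t depth)
{
	uint8_t mask = depth & 7; /* depth % 8 */

	if (mask > 0)
		mask = demo_top_bits_mask(mask);
	ip[depth >> 3] &= mask;
}
```

Walking the depth down from 127 to 1 with demo_mask_step() yields the same bytes as calling demo_mask_full() from scratch at every depth, which is the property the loop in the patch depends on.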

> >>  }
> >> -- 
> > Rest of the patch looks fine to me, though I can't say I've followed all
> > the logic paths in full detail.
> 
> > Main concern I have about the patch is the size. Is there any way this
> > patch could be split up into a few smaller ones with more gradual changes?
> I could try to split it in two parts. The first part will introduce the new
> rule subsystem using a hashtable instead of a flat array. And the second one
> will include the rest.
> 
Please attempt to do so, if possible, for the next version.

Thanks,
/Bruce


Re: [dpdk-dev] [PATCH v7 5/7] bus: add helper to handle sigbus

2018-07-09 Thread Andrew Rybchenko

On 09.07.2018 15:01, Jeff Guo wrote:

This patch aim to add a helper to iterate all buses to find the
corresponding bus to handle the sigbus error.

Signed-off-by: Jeff Guo 
Acked-by: Shaopeng He 
---
v7->v6:
no change
---
  lib/librte_eal/common/eal_common_bus.c | 42 ++
  lib/librte_eal/common/eal_private.h| 12 ++
  2 files changed, 54 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_bus.c 
b/lib/librte_eal/common/eal_common_bus.c
index 0943851..8856adc 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -37,6 +37,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #include "eal_private.h"
  
@@ -242,3 +243,44 @@ rte_bus_get_iommu_class(void)

}
return mode;
  }
+
+static int
+bus_handle_sigbus(const struct rte_bus *bus,
+   const void *failure_addr)
+{
+   int ret;
+
+   if (!bus->sigbus_handler) {
+   RTE_LOG(ERR, EAL, "Function sigbus_handler not supported by "
+   "bus (%s)\n", bus->name);


It is not an error. It is OK that some buses cannot handle SIGBUS.


+   return -1;
+   }
+
+   ret = bus->sigbus_handler(failure_addr);
+   rte_errno = ret;
+
+   return !(bus->sigbus_handler && ret <= 0);


There is no point in checking bus->sigbus_handler here. It is already
checked above.

So, it should be just:
   return ret > 0;
I.e. we should continue the search if the address is not handled by any device
on the bus (we should stop if it is handled (ret == 0) or failed to be handled
(ret < 0)).
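
The return convention described above can be sketched in isolation as follows. This is only an illustration under assumptions: struct demo_bus, demo_bus_find() and the sample handlers are hypothetical miniatures, not the real struct rte_bus or rte_bus_find(). The sketch also treats a missing handler as "keep searching" rather than as an error, and passes the handler result through a plain variable instead of a negative rte_errno.

```c
#include <stddef.h>

/* Hypothetical miniature of a bus; not the real struct rte_bus. */
struct demo_bus {
	const char *name;
	/* < 0: failed, 0: handled, > 0: address not on this bus. */
	int (*sigbus_handler)(const void *failure_addr);
};

/* Result of the last handler that ran. */
static int demo_last_result;

/* Sample handlers used to exercise the dispatch logic. */
static int demo_handler_miss(const void *addr) { (void)addr; return 1; }
static int demo_handler_ok(const void *addr) { (void)addr; return 0; }
static int demo_handler_fail(const void *addr) { (void)addr; return -5; }

/* Find-style comparator: returning 0 stops the search on this bus. */
static int demo_bus_handle_sigbus(const struct demo_bus *bus, const void *addr)
{
	int ret;

	if (bus->sigbus_handler == NULL)
		return 1; /* not an error: this bus simply cannot handle it */

	ret = bus->sigbus_handler(addr);
	demo_last_result = ret;

	return ret > 0; /* continue only if the address was not claimed */
}

static const struct demo_bus *
demo_bus_find(const struct demo_bus *buses, size_t n, const void *addr)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (demo_bus_handle_sigbus(&buses[i], addr) == 0)
			return &buses[i];
	return NULL;
}

/* 0: handled, -1: a bus claimed the address but failed, 1: no bus claimed it. */
static int
demo_sigbus_dispatch(const struct demo_bus *buses, size_t n, const void *addr)
{
	const struct demo_bus *bus;

	demo_last_result = 0;
	bus = demo_bus_find(buses, n, addr);
	if (bus == NULL)
		return 1;
	return demo_last_result < 0 ? -1 : 0;
}
```

The three return values of demo_sigbus_dispatch() mirror the ones documented for rte_bus_sigbus_handler() in the patch (0, -1 and 1).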


+}
+
+int
+rte_bus_sigbus_handler(const void *failure_addr)
+{
+   struct rte_bus *bus;
+
+   int ret = 0;
+   int old_errno = rte_errno;
+
+   rte_errno = 0;
+
+   bus = rte_bus_find(NULL, bus_handle_sigbus, failure_addr);
+   /* failed to handle the sigbus, pass the new errno. */
+   if (!bus)
+   ret = 1;
+   else if (rte_errno == -1)


I still think it is bad to keep a negative value in rte_errno here.


+   return -1;
+
+   /* otherwise restore the old errno. */
+   rte_errno = old_errno;
+
+   return ret;
+}
diff --git a/lib/librte_eal/common/eal_private.h 
b/lib/librte_eal/common/eal_private.h
index bdadc4d..2337e71 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -258,4 +258,16 @@ int rte_mp_channel_init(void);
   */
  void dev_callback_process(char *device_name, enum rte_dev_event_type event);
  
+/**

+ * Iterate all buses to find the corresponding bus, to handle the sigbus error.
+ * @param failure_addr
+ * Pointer of the fault address of the sigbus error.
+ *
+ * @return
+ *  0 success to handle the sigbus.
+ * -1 failed to handle the sigbus
+ *  1 no bus can handler the sigbus
+ */
+int rte_bus_sigbus_handler(const void *failure_addr);
+
  #endif /* _EAL_PRIVATE_H_ */




Re: [dpdk-dev] [PATCH v7 6/7] eal: add failure handle mechanism for hotplug

2018-07-09 Thread Andrew Rybchenko

On 09.07.2018 15:01, Jeff Guo wrote:

This patch introduces a failure handle mechanism to handle device
hotplug removal event.

First it can register sigbus handler when enable device event monitor. Once
sigbus error be captured, it will check the failure address and accordingly
remap the invalid memory for the corresponding device. Besed on this
mechanism, it could guaranty the application not crash when the device be
hotplug out.

Signed-off-by: Jeff Guo 
Acked-by: Shaopeng He 
---
v7->v6:
delete some unused part.
---
  lib/librte_eal/linuxapp/eal/eal_dev.c | 112 +-
  1 file changed, 111 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c 
b/lib/librte_eal/linuxapp/eal/eal_dev.c
index 1cf6aeb..0de3fb7 100644
--- a/lib/librte_eal/linuxapp/eal/eal_dev.c
+++ b/lib/librte_eal/linuxapp/eal/eal_dev.c
@@ -4,6 +4,8 @@
  
  #include 

  #include 
+#include 
+#include 
  #include 
  #include 
  
@@ -14,6 +16,10 @@

  #include 
  #include 
  #include 
+#include 
+#include 
+#include 
+#include 
  
  #include "eal_private.h"
  
@@ -23,6 +29,16 @@ static bool monitor_started;

  #define EAL_UEV_MSG_LEN 4096
  #define EAL_UEV_MSG_ELEM_LEN 128
  
+/*

+ * spinlock for device failure process, protect the bus and the device
+ * to avoid race condition.
+ */
+static rte_spinlock_t dev_failure_lock = RTE_SPINLOCK_INITIALIZER;


Sorry, it is still too vague why the lock is required. These are just generic
words. Please add details and describe the circumstances in which it is
required.


Re: [dpdk-dev] [PATCH v2 18/20] net/mlx5: add flow GRE item

2018-07-09 Thread Nélio Laranjeiro
Hi Yongseok,

I am only discussing the open questions here; the other comments are addressed,
and as I don't have any objection I'll make the modifications for them.

On Fri, Jul 06, 2018 at 04:46:11PM -0700, Yongseok Koh wrote:
>[...] 
> > +
> >  /** Handles information leading to a drop fate. */
> >  struct mlx5_flow_verbs {
> > LIST_ENTRY(mlx5_flow_verbs) next;
> > @@ -1005,12 +1010,23 @@ mlx5_flow_item_ipv6(const struct rte_flow_item 
> > *item, struct rte_flow *flow,
> >   item,
> >   "L3 cannot follow an L4"
> >   " layer");
> > +   /*
> > +* IPv6 is not recognised by the NIC inside a GRE tunnel.
> > +* Such support has to be disabled as the rule will be
> > +* accepted.  Tested with Mellanox OFED 4.3-3.0.2.1
> > +*/
> 
> This comment doesn't look appropriate. Do you think it is a bug of OFED/FW,
> which can be fixed? Or, is it a HW erratum? Let's talk offline.

At the time, Mellanox OFED 4.3-3.0.2.1 was the latest GA release; this is
no longer the case today, as it cannot be downloaded anymore. A
verification is still necessary.  If the issue is no longer present,
I'll remove the comment along with the test.

>[...] 
> > +{
> > +   unsigned int i;
> > +   const enum ibv_flow_spec_type search = IBV_FLOW_SPEC_IPV6;
> > +   struct ibv_spec_header *hdr = (struct ibv_spec_header *)
> > +   ((uint8_t *)attr + sizeof(struct ibv_flow_attr));
> > +
> > +   if (!attr)
> > +   return;
> > +   for (i = 0; i != attr->num_of_specs; ++i) {
> > +   if (hdr->type == search) {
> > +   struct ibv_flow_spec_ipv6 *ip =
> > +   (struct ibv_flow_spec_ipv6 *)hdr;
> > +
> > +   if (!ip->val.next_hdr) {
> 
> What if protocol in IP header does have wrong value other than 47 
> (IPPROTO_GRE)?
> Shouldn't we have a validation check for it in mlx5_flow_item_gre()?
>[...]

Already added, the same issue occurs also with UDP/TCP.  If the user
uses some protocol it must match the following layer, otherwise its
request won't be respected which is a bug.

Thanks,

-- 
Nélio Laranjeiro
6WIND


Re: [dpdk-dev] [PATCH v3 2/8] hash: fix a multi-writer bug

2018-07-09 Thread De Lara Guarch, Pablo



> -Original Message-
> From: Wang, Yipeng1
> Sent: Friday, July 6, 2018 8:47 PM
> To: De Lara Guarch, Pablo 
> Cc: dev@dpdk.org; Wang, Yipeng1 ; Richardson,
> Bruce ; honnappa.nagaraha...@arm.com;
> vgu...@caviumnetworks.com; brijesh.s.si...@gmail.com
> Subject: [PATCH v3 2/8] hash: fix a multi-writer bug

Just a minor comment on the title. It should be a summary of the commit
(try to summarize the bug, instead of using the word "bug").
Keep my ack for the next version (like in other patches).

> 
> Current multi-writer implementation uses Intel TSX to protect the cuckoo path
> moving but not the cuckoo path searching. After searching, we need to verify
> again if the same empty slot still exists at the beginning of the TSX region.
> Otherwise another writer could occupy the empty slot before the TSX region.
> Current code does not verify.
> 
> Fixes: be856325cba3 ("hash: add scalable multi-writer insertion with Intel 
> TSX")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Yipeng Wang 

Acked-by: Pablo de Lara 



Re: [dpdk-dev] [PATCH v3 3/8] hash: fix to have more accurate key slot size

2018-07-09 Thread De Lara Guarch, Pablo



> -Original Message-
> From: Wang, Yipeng1
> Sent: Friday, July 6, 2018 8:47 PM
> To: De Lara Guarch, Pablo 
> Cc: dev@dpdk.org; Wang, Yipeng1 ; Richardson,
> Bruce ; honnappa.nagaraha...@arm.com;
> vgu...@caviumnetworks.com; brijesh.s.si...@gmail.com
> Subject: [PATCH v3 3/8] hash: fix to have more accurate key slot size

Titles always start with a verb whereas in your case, you are starting with a noun.
Better to change it to "fix key slot size accuracy"?

Apart from that:

Acked-by: Pablo de Lara 

> 
> This commit calculates the needed key slot size more accurately. The previous
> local cache fix requires the free slot ring to be larger than actually needed.
> The calculation of the value is inaccurate.
> 
> Fixes: 5915699153d7 ("hash: fix scaling by reducing contention")
> Cc: sta...@dpdk.org



Re: [dpdk-dev] [PATCH v3 4/8] hash: make duplicated code into functions

2018-07-09 Thread De Lara Guarch, Pablo



> -Original Message-
> From: Wang, Yipeng1
> Sent: Friday, July 6, 2018 8:47 PM
> To: De Lara Guarch, Pablo 
> Cc: dev@dpdk.org; Wang, Yipeng1 ; Richardson,
> Bruce ; honnappa.nagaraha...@arm.com;
> vgu...@caviumnetworks.com; brijesh.s.si...@gmail.com
> Subject: [PATCH v3 4/8] hash: make duplicated code into functions
> 
> This commit refactors the hash table lookup/add/del code to remove some code
> duplication. Processing on primary bucket can also apply to secondary bucket
> with same code.
> 
> Signed-off-by: Yipeng Wang 
> ---
>  lib/librte_hash/rte_cuckoo_hash.c | 186 
> +++---

...

> @@ -838,41 +830,45 @@ __rte_hash_del_key_with_hash(const struct rte_hash
> *h, const void *key,
>   if (rte_hash_cmp_eq(key, k->key, h) == 0) {
>   remove_entry(h, bkt, i);
> 
> + ret = bkt->key_idx[i] - 1;
> + bkt->key_idx[i] = EMPTY_SLOT;
>   /*
>* Return index where key is stored,
>* subtracting the first dummy index
>*/
> - ret = bkt->key_idx[i] - 1;
> - bkt->key_idx[i] = EMPTY_SLOT;

Actually, this change doesn't look needed, right?
It looks like you are just moving the two lines before the comment.

Apart from this,

Acked-by: Pablo de Lara 



Re: [dpdk-dev] [PATCH v3 5/8] hash: add read and write concurrency support

2018-07-09 Thread De Lara Guarch, Pablo



> -Original Message-
> From: Wang, Yipeng1
> Sent: Friday, July 6, 2018 8:47 PM
> To: De Lara Guarch, Pablo 
> Cc: dev@dpdk.org; Wang, Yipeng1 ; Richardson,
> Bruce ; honnappa.nagaraha...@arm.com;
> vgu...@caviumnetworks.com; brijesh.s.si...@gmail.com
> Subject: [PATCH v3 5/8] hash: add read and write concurrency support
> 
> The existing implementation of librte_hash does not support read-write
> concurrency. This commit implements read-write safety using rte_rwlock and
> rte_rwlock TM version if hardware transactional memory is available.
> 
> Both multi-writer and read-write concurrency is protected by rte_rwlock now.
> The x86 specific header file is removed since the x86 specific RTM function 
> is not
> called directly by rte hash now.
> 
> Signed-off-by: Yipeng Wang 

Acked-by: Pablo de Lara 


Re: [dpdk-dev] [PATCH v10 03/19] bus/pci: enable vfio unmap resource for secondary

2018-07-09 Thread Burakov, Anatoly

On 09-Jul-18 4:36 AM, Qi Zhang wrote:

Subroutine to unmap VFIO resource is shared by secondary and
primary, and it does not work on the secondary process.
The patch adds a dedicate function to handle the situation
when a device is unmapped on a secondary process.

Signed-off-by: Qi Zhang 
---


Hi Qi,

Please correct me if I'm wrong here, but it seems like the unmapping 
code is shared between primary and secondary, and the difference comes 
from interrupts, bus mastering, and removing the device from tailq. Can 
we separate out the common code somehow?


--
Thanks,
Anatoly


Re: [dpdk-dev] [PATCH v2 17/20] net/mlx5: add flow VXLAN-GPE item

2018-07-09 Thread Nélio Laranjeiro
On Fri, Jul 06, 2018 at 04:23:26PM -0700, Yongseok Koh wrote:
> On Wed, Jun 27, 2018 at 05:07:49PM +0200, Nelio Laranjeiro wrote:
> > Signed-off-by: Nelio Laranjeiro 
> > ---
> >  drivers/net/mlx5/mlx5_flow.c | 123 ++-
> >  1 file changed, 120 insertions(+), 3 deletions(-)
> > 
>[...]  
> > +/**
> > + * Validate VXLAN-GPE layer and possibly create the Verbs specification.
> > + *
> > + * @param dev
> > + *   Pointer to Ethernet device.
> > + * @param item[in]
> > + *   Item specification.
> > + * @param flow[in, out]
> > + *   Pointer to flow structure.
> > + * @param flow_size[in]
> > + *   Size in bytes of the available space for to store the flow 
> > information.
> > + * @param error
> > + *   Pointer to error structure.
> > + *
> > + * @return
> > + *   size in bytes necessary for the conversion, a negative errno value
> > + *   otherwise and rte_errno is set.
> > + */
> > +static int
> > +mlx5_flow_item_vxlan_gpe(struct rte_eth_dev *dev,
> > +const struct rte_flow_item *item,
> > +struct rte_flow *flow, const size_t flow_size,
> > +struct rte_flow_error *error)
> 
> It is almost the same as mlx5_flow_item_vxlan() except for checking
> priv->config.l3_vxlan_en. One more difference I noticed is that it doesn't check
> flow->expand on validation. Why is that? If that's a mistake, isn't it better to
> make the common code shareable?
>[...]

The GPE version needs:

 - l3_vxlan_en
 - set its own tunnel bit (as in this case the following layer may
   directly be an L3)

Indeed there is some common code (as for TCP/UDP), but sharing it
would make it more difficult to read and to fix in case of bugs.

In addition, if this RFC [1] is fully dropped, it will be easier to remove
the dedicated code when the ITEM is also removed from the API. The removal may
not come from the Mellanox PMD team but from anyone proposing the drop, and the
chances that they break something are high if the code is shared among several
items.  It is better to have a single function per item/action, directly
matching the API.

Thanks,

[1] https://datatracker.ietf.org/doc/draft-quinn-vxlan-gpe/

-- 
Nélio Laranjeiro
6WIND


Re: [dpdk-dev] [PATCH v4 1/4] lib/cryptodev: add asymmetric algos in cryptodev

2018-07-09 Thread Doherty, Declan

On 06/07/2018 3:28 PM, Verma, Shally wrote:

Hi Declan


-Original Message-
From: Doherty, Declan [mailto:declan.dohe...@intel.com]
Sent: 05 July 2018 20:24
To: Verma, Shally ; pablo.de.lara.gua...@intel.com
Cc: dev@dpdk.org; Athreya, Narayana Prasad ; 
Murthy, Nidadavolu
; Sahu, Sunila ; Gupta, Ashish 
; Kartha,
Umesh 
Subject: Re: [dpdk-dev] [PATCH v4 1/4] lib/cryptodev: add asymmetric algos in 
cryptodev

Hey Shally,

just a few things inline below mainly concerned with the need to be able
to support session-less operations in future PMDs. I think with a few
minor changes to the API now it should allow session-less to be
supported later without the need for a major rework of the APIs, I don't
think this should cause any major rework to your PMD just the adoption
of some new more explicit op types.

Thanks
Declan

On 03/07/2018 4:24 PM, Shally Verma wrote:

Add rte_crypto_asym.h with supported xfrms
and associated op structures and APIs

API currently supports:
- RSA Encrypt, Decrypt, Sign and Verify
- Modular Exponentiation and Inversion
- DSA Sign and Verify
- Diffie-Hellman private key exchange
- Diffie-Hellman public key exchange
- Diffie-Hellman shared secret compute
- Diffie-Hellman public/private key pair generation
using xform chain

Signed-off-by: Shally Verma 
Signed-off-by: Sunila Sahu 
Signed-off-by: Ashish Gupta 
Signed-off-by: Umesh Kartha 
---
   lib/librte_cryptodev/Makefile  |   1 +
   lib/librte_cryptodev/meson.build   |   3 +-
   lib/librte_cryptodev/rte_crypto_asym.h | 496 
+
   3 files changed, 499 insertions(+), 1 deletion(-)

diff --git a/lib/librte_cryptodev/Makefile b/lib/librte_cr


...


+typedef struct rte_crypto_param_t {
+ uint8_t *data;
+ /**< pointer to buffer holding data */
+ rte_iova_t iova;
+ /**< IO address of data buffer */
+ size_t length;
+ /**< length of data in bytes */
+} rte_crypto_param;


What is the intended way for this memory to be allocated,


[Shally] It should be pointer to flat buffers and added only to input/output 
data to/from
asymmetric crypto engine.


it seems like
there might be a more general requirement in DPDK for IO addressable
memory (compression? other hardware accelerators implemented on FPGAs)
than just asymmetric crypto, will we end up needing to support features
like scatter gather lists in this structure?


[Shally] I don’t anticipate that we would need to support scatter-gather data 
buffers as far as it is used for asymmetric.
And I'm not aware if we have requirement to support it for asymmetric 
processing since data size is usually small for
such operations. Thus, app is expected to send linear buffers for input/output.

Does that answer your question? Or did I miss anything?



Sure I understand the rationale.




btw I think this is
probably fine for the moment as it will be experimental but I think it
will need to be addressed before the removal of the experimental tag.



...


+ RTE_CRYPTO_ASYM_XFORM_MODINV,


Would prefer if this was _MOD_INV :)


+ /**< Modular Inverse
+  * Perform Modulus inverse b^(-1) mod n
+  */
+ RTE_CRYPTO_ASYM_XFORM_MODEX,


and this was _MOD_EX :)


[Shally] fine will do name change.




+ /**< Modular Exponentiation
+  * Perform Modular Exponentiation b^e mod n
+  */
+ RTE_CRYPTO_ASYM_XFORM_TYPE_LIST_END
+ /**< End of list */
+};
+
+/**
+ * Asymmetric crypto operation type variants
+ */
+enum rte_crypto_asym_op_type {
+ RTE_CRYPTO_ASYM_OP_ENCRYPT,
+ /**< Asymmetric Encrypt operation */
+ RTE_CRYPTO_ASYM_OP_DECRYPT,
+ /**< Asymmetric Decrypt operation */
+ RTE_CRYPTO_ASYM_OP_SIGN,
+ /**< Signature Generation operation */
+ RTE_CRYPTO_ASYM_OP_VERIFY,
+ /**< Signature Verification operation */
+ RTE_CRYPTO_ASYM_OP_PRIVATE_KEY_GENERATE,
+ /**< DH Private Key generation operation */
+ RTE_CRYPTO_ASYM_OP_PUBLIC_KEY_GENERATE,
+ /**< DH Public Key generation operation */
+ RTE_CRYPTO_ASYM_OP_SHARED_SECRET_COMPUTE,
+ /**< DH Shared Secret compute operation */
+ RTE_CRYPTO_ASYM_OP_LIST_END
+};
+


I think that having generic operation types which may or may not apply
to all of the defined xforms is confusing from a user perspective and in
the longer term will make it impossible to support session-less
asymmetric operations. If we instead do something like

RTE_CRYPTO_ASYM_OP_RSA_ENCRYPT,
RTE_CRYPTO_ASYM_OP_RSA_DECRYPT,
RTE_CRYPTO_ASYM_OP_RSA_SIGN,
RTE_CRYPTO_ASYM_OP_RSA_VERIFY,
RTE_CRYPTO_ASYM_OP_DH_KEY_GENERATE,
RTE_CRYPTO_ASYM_OP_DH_SHARED_SECRET_COMPUTE,
etc...

Then the op type becomes very explicit and will allow session-less
operations to be supported by PMDs. This shouldn't have any impact on
your current implementation other than updating the op type.



[Shally] Ok, so you suggest to merge xform and op_type (including keeping 
op_type in one place)  for si

Re: [dpdk-dev] [PATCH v2 19/20] net/mlx5: add flow MPLS item

2018-07-09 Thread Nélio Laranjeiro
On Fri, Jul 06, 2018 at 05:11:31PM -0700, Yongseok Koh wrote:
> On Wed, Jun 27, 2018 at 05:07:51PM +0200, Nelio Laranjeiro wrote:
> > Signed-off-by: Nelio Laranjeiro 
> > ---
>[...]
> > +   if (spec) {
> > +   memcpy(&mpls.val.label, spec, sizeof(mpls.val.label));
> > +   memcpy(&mpls.mask.label, mask, sizeof(mpls.mask.label));
> > +   /* Remove unwanted bits from values.  */
> > +   mpls.val.label &= mpls.mask.label;
> > +   }
> > +   if (size <= flow_size)
> 
> Is it guaranteed flow->cur_verbs isn't null if size fits? Could be obvious but
> just want to make sure.

Yes it is.

> > +   mlx5_flow_spec_verbs_add(flow, &mpls, size);
> > +   mlx5_flow_layers_update(flow, MLX5_FLOW_LAYER_MPLS);
> > +   if (layers & MLX5_FLOW_LAYER_OUTER_L4_UDP)
> > +   flow->ptype = RTE_PTYPE_TUNNEL_MPLS_IN_GRE | RTE_PTYPE_L4_UDP;
> > +   else
> > +   flow->ptype = RTE_PTYPE_TUNNEL_MPLS_IN_GRE;
> > +   return size;
> > +#endif /* !HAVE_IBV_DEVICE_MPLS_SUPPORT */
> > +   return rte_flow_error_set(error, ENOTSUP,
> > + RTE_FLOW_ERROR_TYPE_ITEM,
> > + item,
> > + "MPLS is not supported by Verbs, please"
> > + " update.");
> > +}
> > +
> >  /**
> >   * Validate items provided by the user.
> >   *
> > @@ -1650,6 +1722,9 @@ mlx5_flow_items(struct rte_eth_dev *dev,
> > case RTE_FLOW_ITEM_TYPE_GRE:
> > ret = mlx5_flow_item_gre(items, flow, remain, error);
> > break;
> 
> #ifdef HAVE_IBV_DEVICE_MPLS_SUPPORT
> 
> > +   case RTE_FLOW_ITEM_TYPE_MPLS:
> > +   ret = mlx5_flow_item_mpls(items, flow, remain, error);
> > +   break;
> 
> #endif /* !HAVE_IBV_DEVICE_MPLS_SUPPORT */
> 
> How about this?
>[...]

It adds another pair of #ifdef/#endif and the final output won't help
the user much; having an error "MPLS is not supported by Verbs, please
update" will help more than "item not supported".
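
As a rough illustration of this argument, the sketch below keeps the #ifdef inside the item function only, so the dispatcher always routes MPLS to it and the user gets the specific message. All names (DEMO_HAVE_MPLS_SUPPORT, demo_item_mpls, struct demo_error) are hypothetical stand-ins, not the mlx5 code.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical error holder standing in for struct rte_flow_error. */
struct demo_error {
	int code;
	const char *message;
};

enum demo_item_type {
	DEMO_ITEM_GRE,
	DEMO_ITEM_MPLS,
	DEMO_ITEM_UNKNOWN
};

/* The only place that knows whether MPLS support was compiled in. */
static int demo_item_mpls(struct demo_error *error)
{
#ifdef DEMO_HAVE_MPLS_SUPPORT
	(void)error;
	return 0; /* real conversion would happen here */
#else
	error->code = ENOTSUP;
	error->message = "MPLS is not supported by Verbs, please update.";
	return -ENOTSUP;
#endif
}

/* Dispatcher without any #ifdef: MPLS always reaches its own function. */
static int demo_item_dispatch(enum demo_item_type type, struct demo_error *error)
{
	switch (type) {
	case DEMO_ITEM_GRE:
		return 0;
	case DEMO_ITEM_MPLS:
		return demo_item_mpls(error);
	default:
		error->code = ENOTSUP;
		error->message = "item not supported";
		return -ENOTSUP;
	}
}
```

Built without DEMO_HAVE_MPLS_SUPPORT, an MPLS item yields the MPLS-specific message rather than the generic one from the default case.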

Regards,

-- 
Nélio Laranjeiro
6WIND


Re: [dpdk-dev] [PATCH v10 04/19] vfio: remove uneccessary IPC for group fd clear

2018-07-09 Thread Burakov, Anatoly

On 09-Jul-18 4:36 AM, Qi Zhang wrote:

Clear vfio_group_fd is not necessary to involve any IPC.
Also, current IPC implementation for SOCKET_CLR_GROUP is not
correct. rte_vfio_clear_group on secondary will always fail,
that prevent device be detached correctly on a secondary process.
The patch simply removes all IPC related stuff in
rte_vfio_clear_group.

Signed-off-by: Qi Zhang 
---


Acked-by: Anatoly Burakov 

--
Thanks,
Anatoly


Re: [dpdk-dev] [PATCH v3 6/8] test: add tests in hash table perf test

2018-07-09 Thread De Lara Guarch, Pablo



> -Original Message-
> From: Wang, Yipeng1
> Sent: Friday, July 6, 2018 8:47 PM
> To: De Lara Guarch, Pablo 
> Cc: dev@dpdk.org; Wang, Yipeng1 ; Richardson,
> Bruce ; honnappa.nagaraha...@arm.com;
> vgu...@caviumnetworks.com; brijesh.s.si...@gmail.com
> Subject: [PATCH v3 6/8] test: add tests in hash table perf test
> 
> New code is added to support read-write concurrency for rte_hash. Due to the
> newly added code in the critical path, the perf test is modified to show any
> performance impact.
> It is still a single-thread test.
> 
> Signed-off-by: Yipeng Wang 

Acked-by: Pablo de Lara 


[dpdk-dev] [PATCH] net/cxgbe: fix init failure due to new flash parts

2018-07-09 Thread Rahul Lakkireddy
Add decode logic for new flash parts shipped with new Chelsio NICs
to fix initialization failure on these NICs.

Cc: sta...@dpdk.org

Signed-off-by: Rahul Lakkireddy 
---
 drivers/net/cxgbe/base/t4_hw.c | 97 --
 1 file changed, 84 insertions(+), 13 deletions(-)

diff --git a/drivers/net/cxgbe/base/t4_hw.c b/drivers/net/cxgbe/base/t4_hw.c
index 628b280ef..31762c9c5 100644
--- a/drivers/net/cxgbe/base/t4_hw.c
+++ b/drivers/net/cxgbe/base/t4_hw.c
@@ -4681,9 +4681,8 @@ struct flash_desc {
 int t4_get_flash_params(struct adapter *adapter)
 {
/*
-* Table for non-Numonix supported flash parts.  Numonix parts are left
-* to the preexisting well-tested code.  All flash parts have 64KB
-* sectors.
+* Table for non-standard supported Flash parts.  Note, all Flash
+* parts must have 64KB sectors.
 */
static struct flash_desc supported_flash[] = {
{ 0x00150201, 4 << 20 },   /* Spansion 4MB S25FL032P */
@@ -4692,7 +4691,7 @@ int t4_get_flash_params(struct adapter *adapter)
int ret;
u32 flashid = 0;
unsigned int part, manufacturer;
-   unsigned int density, size;
+   unsigned int density, size = 0;
 
/**
 * Issue a Read ID Command to the Flash part.  We decode supported
@@ -4707,6 +4706,9 @@ int t4_get_flash_params(struct adapter *adapter)
if (ret < 0)
return ret;
 
+   /**
+* Check to see if it's one of our non-standard supported Flash parts.
+*/
for (part = 0; part < ARRAY_SIZE(supported_flash); part++) {
if (supported_flash[part].vendor_and_model_id == flashid) {
adapter->params.sf_size =
@@ -4717,6 +4719,15 @@ int t4_get_flash_params(struct adapter *adapter)
}
}
 
+   /**
+    * Decode Flash part size.  The code below looks repetitive with
+    * common encodings, but that's not guaranteed in the JEDEC
+    * specification for the Read JEDEC ID command.  The only thing that
+    * we're guaranteed by the JEDEC specification is where the
+* Manufacturer ID is in the returned result.  After that each
+* Manufacturer ~could~ encode things completely differently.
+* Note, all Flash parts must have 64KB sectors.
+*/
manufacturer = flashid & 0xff;
switch (manufacturer) {
case 0x20: { /* Micron/Numonix */
@@ -4753,20 +4764,80 @@ int t4_get_flash_params(struct adapter *adapter)
case 0x22:
size = 1 << 28; /* 256MB */
break;
-   default:
-   dev_err(adapter, "Micron Flash Part has bad size, ID = 
%#x, Density code = %#x\n",
-   flashid, density);
-   return -EINVAL;
}
+   break;
+   }
 
-   adapter->params.sf_size = size;
-   adapter->params.sf_nsec = size / SF_SEC_SIZE;
+   case 0x9d: { /* ISSI -- Integrated Silicon Solution, Inc. */
+   /**
+* This Density -> Size decoding table is taken from ISSI
+* Data Sheets.
+*/
+   density = (flashid >> 16) & 0xff;
+   switch (density) {
+   case 0x16:
+   size = 1 << 25; /* 32MB */
+   break;
+   case 0x17:
+   size = 1 << 26; /* 64MB */
+   break;
+   }
break;
}
-   default:
-   dev_err(adapter, "Unsupported Flash Part, ID = %#x\n", flashid);
-   return -EINVAL;
+
+   case 0xc2: { /* Macronix */
+   /**
+* This Density -> Size decoding table is taken from Macronix
+* Data Sheets.
+*/
+   density = (flashid >> 16) & 0xff;
+   switch (density) {
+   case 0x17:
+   size = 1 << 23; /* 8MB */
+   break;
+   case 0x18:
+   size = 1 << 24; /* 16MB */
+   break;
+   }
+   break;
+   }
+
+   case 0xef: { /* Winbond */
+   /**
+* This Density -> Size decoding table is taken from Winbond
+* Data Sheets.
+*/
+   density = (flashid >> 16) & 0xff;
+   switch (density) {
+   case 0x17:
+   size = 1 << 23; /* 8MB */
+   break;
+   case 0x18:
+   size = 1 << 24; /* 16MB */
+   break;
+   }
+   break;
}
+   }
+
+   /* If we didn't recognize the FLASH part, that's no real issue: the
+* Hardware/Software contract says that Hardware will _*ALWAYS*_
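
The density decoding added by this patch can be summarized in a small
standalone sketch. It only covers the ISSI, Macronix and Winbond cases
visible in the diff above; the helper name toy_flash_size is hypothetical
and is not part of the cxgbe driver, and returning 0 for an unrecognized
part is an assumption made for the illustration.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not driver code): decode a Read ID result into a
 * flash size in bytes, or 0 when the part is not recognized here.
 * The manufacturer ID is the low byte, the density code is bits 23:16.
 */
static unsigned int toy_flash_size(uint32_t flashid)
{
	unsigned int manufacturer = flashid & 0xff;
	unsigned int density = (flashid >> 16) & 0xff;

	switch (manufacturer) {
	case 0x9d: /* ISSI */
		if (density == 0x16)
			return 1u << 25; /* 32MB */
		if (density == 0x17)
			return 1u << 26; /* 64MB */
		break;
	case 0xc2: /* Macronix */
	case 0xef: /* Winbond: same density table as Macronix in the patch */
		if (density == 0x17)
			return 1u << 23; /* 8MB */
		if (density == 0x18)
			return 1u << 24; /* 16MB */
		break;
	}
	return 0; /* unknown manufacturer or density code */
}
```

The same density code can mean different sizes for different
manufacturers (0x17 is 64MB for ISSI but 8MB for Macronix and Winbond),
which is why the decode is kept per manufacturer.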

Re: [dpdk-dev] [PATCH] eal: add request to map reserved physical memory

2018-07-09 Thread Burakov, Anatoly

On 07-Jun-18 1:15 PM, Burakov, Anatoly wrote:

On 06-Jun-18 1:18 AM, Scott Branden wrote:

Hi Anatoly,


On 18-04-27 09:49 AM, Burakov, Anatoly wrote:

On 27-Apr-18 5:30 PM, Scott Branden wrote:

Hi Anatoly,

We'd appreciate your input so we can come to a solution for
supporting the necessary memory allocations.




Hi Scott,

I'm currently starting to work on a prototype that will be at least 
RFC'd (if not v1'd) during 18.08 timeframe. Basically, the idea is to 
create/destroy named malloc heaps dynamically, and allow user to 
request memory from them. You may then mmap() whatever you want and 
create a malloc heap out of it.


Does that sound reasonable?
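
To make the idea of "mmap() something and build a heap on top of it" more
concrete, here is a minimal sketch, assuming a POSIX system. It is a toy
bump allocator over an anonymous mapping; the names toy_heap,
toy_heap_create, toy_heap_alloc and toy_heap_destroy are hypothetical and
are not the DPDK API that the RFC proposes.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical named heap backed by user-provided (here: mmap'ed) memory. */
struct toy_heap {
	const char *name; /* heap name, kept only for identification */
	uint8_t *base;    /* start of the mapped region */
	size_t len;       /* total size of the region in bytes */
	size_t used;      /* bytes handed out so far */
};

/* Map len bytes of anonymous memory and turn it into an empty heap. */
static int toy_heap_create(struct toy_heap *h, const char *name, size_t len)
{
	void *mem;

	assert(h != NULL && len > 0);
	mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (mem == MAP_FAILED)
		return -1;
	h->name = name;
	h->base = mem;
	h->len = len;
	h->used = 0;
	return 0;
}

/* Hand out size bytes, rounded up to a 64-byte multiple; NULL when full. */
static void *toy_heap_alloc(struct toy_heap *h, size_t size)
{
	size_t aligned = (size + 63) & ~(size_t)63;
	void *p;

	if (aligned < size || aligned > h->len - h->used)
		return NULL;
	p = h->base + h->used;
	h->used += aligned;
	return p;
}

/* Unmap the backing memory; every pointer from this heap becomes invalid. */
static void toy_heap_destroy(struct toy_heap *h)
{
	munmap(h->base, h->len);
	h->base = NULL;
	h->len = 0;
	h->used = 0;
}
```

A real implementation would also need freeing, IOVA information for the
memory, and multi-process awareness, none of which is modeled here.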


Is the plan still to have a patch for 18.08?

Thanks,
  Scott


Hi Scott,

The plan is still to submit an RFC during 18.08 timeframe, but since it 
will be an ABI break, it will only be integrated in the next (18.11) 
release.



Hi Scott,

You're welcome to offer feedback on the proposal :)

http://patches.dpdk.org/project/dpdk/list/?series=453

--
Thanks,
Anatoly


Re: [dpdk-dev] [PATCH v3 8/8] hash: add new API function to query the key count

2018-07-09 Thread De Lara Guarch, Pablo



> -Original Message-
> From: Wang, Yipeng1
> Sent: Friday, July 6, 2018 8:47 PM
> To: De Lara Guarch, Pablo 
> Cc: dev@dpdk.org; Wang, Yipeng1 ; Richardson,
> Bruce ; honnappa.nagaraha...@arm.com;
> vgu...@caviumnetworks.com; brijesh.s.si...@gmail.com
> Subject: [PATCH v3 8/8] hash: add new API function to query the key count
> 
> Add a new function, rte_hash_count, to return the number of keys that are
> currently stored in the hash table. Corresponding test functions are added
> into hash_test and hash_multiwriter test.
> 
> Signed-off-by: Yipeng Wang 

Acked-by: Pablo de Lara 



Re: [dpdk-dev] [PATCH v5] net/mlx4: support hardware TSO

2018-07-09 Thread Mordechay Haimovsky
inline

> -Original Message-
> From: Matan Azrad
> Sent: Monday, July 9, 2018 4:07 PM
> To: Mordechay Haimovsky ; Adrien Mazarguil
> 
> Cc: dev@dpdk.org
> Subject: RE: [PATCH v5] net/mlx4: support hardware TSO
> 
> 
> 
> Hi Moti
> 
> Please see some comments below.
> 
> From: Mordechay Haimovsky
> > Implement support for hardware TSO.
> >
> > Signed-off-by: Moti Haimovsky 
> > ---
> > v5:
> > * Modification to the code according to review inputs from Matan
> >   Azrad.
> > * Code optimization to the TSO header copy routine.
> > * Rearranged the TSO data-segments creation routine.
> > in reply to
> > 1530715998-15703-1-git-send-email-mo...@mellanox.com
> >
> > v4:
> > * Bug fixes in filling TSO data segments.
> > * Modifications according to review inputs from Adrien Mazarguil
> >   and Matan Azrad.
> > in reply to
> > 1530190137-17848-1-git-send-email-mo...@mellanox.com
> >
> > v3:
> > * Fixed compilation errors in compilers without GNU C extensions
> >   caused by a declaration of zero-length array in the code.
> > in reply to
> > 1530187032-6489-1-git-send-email-mo...@mellanox.com
> >
> > v2:
> > * Fixed coding style warning.
> > in reply to
> > 1530184583-30166-1-git-send-email-mo...@mellanox.com
> >
> > v1:
> > * Fixed coding style warnings.
> > in reply to
> > 1530181779-19716-1-git-send-email-mo...@mellanox.com
> > ---
> >  doc/guides/nics/features/mlx4.ini |   1 +
> >  doc/guides/nics/mlx4.rst  |   3 +
> >  drivers/net/mlx4/Makefile |   5 +
> >  drivers/net/mlx4/mlx4.c   |   9 +
> >  drivers/net/mlx4/mlx4.h   |   5 +
> >  drivers/net/mlx4/mlx4_prm.h   |  15 ++
> >  drivers/net/mlx4/mlx4_rxtx.c  | 372
> > +-
> >  drivers/net/mlx4/mlx4_rxtx.h  |   2 +-
> >  drivers/net/mlx4/mlx4_txq.c   |   8 +-
> >  9 files changed, 416 insertions(+), 4 deletions(-)
> >
> > diff --git a/doc/guides/nics/features/mlx4.ini
> > b/doc/guides/nics/features/mlx4.ini
> > index f6efd21..98a3f61 100644
> > --- a/doc/guides/nics/features/mlx4.ini
> > +++ b/doc/guides/nics/features/mlx4.ini
> > @@ -13,6 +13,7 @@ Queue start/stop = Y
> >  MTU update   = Y
> >  Jumbo frame  = Y
> >  Scattered Rx = Y
> > +TSO  = Y
> >  Promiscuous mode = Y
> >  Allmulticast mode= Y
> >  Unicast MAC filter   = Y
> > diff --git a/doc/guides/nics/mlx4.rst b/doc/guides/nics/mlx4.rst index
> > 491106a..12adaeb 100644
> > --- a/doc/guides/nics/mlx4.rst
> > +++ b/doc/guides/nics/mlx4.rst
> > @@ -142,6 +142,9 @@ Limitations
> >The ability to enable/disable CRC stripping requires OFED version
> >4.3-1.5.0.0 and above  or rdma-core version v18 and above.
> >
> > +- TSO (Transmit Segmentation Offload) is supported in OFED version
> > +  4.4 and above or in rdma-core version v18 and above.
> > +
> >  Prerequisites
> >  -
> >
> > diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile
> > index
> > 73f9d40..63bc003 100644
> > --- a/drivers/net/mlx4/Makefile
> > +++ b/drivers/net/mlx4/Makefile
> > @@ -85,6 +85,11 @@ mlx4_autoconf.h.new: FORCE
> >  mlx4_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
> > $Q $(RM) -f -- '$@'
> > $Q : > '$@'
> > +   $Q sh -- '$<' '$@' \
> > +   HAVE_IBV_MLX4_WQE_LSO_SEG \
> > +   infiniband/mlx4dv.h \
> > +   type 'struct mlx4_wqe_lso_seg' \
> > +   $(AUTOCONF_OUTPUT)
> >
> >  # Create mlx4_autoconf.h or update it in case it differs from the new one.
> >
> > diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c index
> > d151a90..5d8c76d 100644
> > --- a/drivers/net/mlx4/mlx4.c
> > +++ b/drivers/net/mlx4/mlx4.c
> > @@ -677,6 +677,15 @@ struct mlx4_conf {
> >
> > IBV_RAW_PACKET_CAP_SCATTER_FCS);
> > DEBUG("FCS stripping toggling is %ssupported",
> >   priv->hw_fcs_strip ? "" : "not ");
> > +   priv->tso =
> > +   ((device_attr_ex.tso_caps.max_tso > 0) &&
> > +(device_attr_ex.tso_caps.supported_qpts &
> > + (1 << IBV_QPT_RAW_PACKET)));
> > +   if (priv->tso)
> > +   priv->tso_max_payload_sz =
> > +   device_attr_ex.tso_caps.max_tso;
> > +   DEBUG("TSO is %ssupported",
> > + priv->tso ? "" : "not ");
> > /* Configure the first MAC address by default. */
> > err = mlx4_get_mac(priv, &mac.addr_bytes);
> > if (err) {
> > diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h index
> > 300cb4d..89d8c38 100644
> > --- a/drivers/net/mlx4/mlx4.h
> > +++ b/drivers/net/mlx4/mlx4.h
> > @@ -47,6 +47,9 @@
> >  /** Interrupt alarm timeout value in microseconds. */  #define
> > MLX4_INTR_ALARM_TIMEOUT 10
> >
> > +/* Maximum packet headers size (L2+L3+L4) for TSO. */ #define
> > +MLX4_MAX_TSO_HEADER 192
> > +
> >  /** Port parameter. */
> >  #define MLX4_PMD_PORT_KVARG "port"
> >
> > @@ -90,6 +93,8 @@ s

Re: [dpdk-dev] [PATCH v3 7/8] test: add test case for read write concurrency

2018-07-09 Thread De Lara Guarch, Pablo



> -Original Message-
> From: Wang, Yipeng1
> Sent: Friday, July 6, 2018 8:47 PM
> To: De Lara Guarch, Pablo 
> Cc: dev@dpdk.org; Wang, Yipeng1 ; Richardson,
> Bruce ; honnappa.nagaraha...@arm.com;
> vgu...@caviumnetworks.com; brijesh.s.si...@gmail.com
> Subject: [PATCH v3 7/8] test: add test case for read write concurrency
> 
> This commit adds a new test case for testing read/write concurrency.
> 
> Signed-off-by: Yipeng Wang 

There are still some double blank lines that I think are not necessary.
Once fixed, keep my ack:

Acked-by: Pablo de Lara 


[dpdk-dev] [PATCH v6] net/mlx4: support hardware TSO

2018-07-09 Thread Moti Haimovsky
Implement support for hardware TSO.

Signed-off-by: Moti Haimovsky 
---
v6:
* Minor bug fixes from previous commit.
* More optimizations on TSO data-segments creation routine.
in reply to
1531132986-5054-1-git-send-email-mo...@mellanox.com

v5:
* Modification to the code according to review inputs from Matan
  Azrad.
* Code optimization to the TSO header copy routine.
* Rearranged the TSO data-segments creation routine.
in reply to
1530715998-15703-1-git-send-email-mo...@mellanox.com

v4:
* Bug fixes in filling TSO data segments.
* Modifications according to review inputs from Adrien Mazarguil
  and Matan Azrad.
in reply to
1530190137-17848-1-git-send-email-mo...@mellanox.com

v3:
* Fixed compilation errors in compilers without GNU C extensions
  caused by a declaration of zero-length array in the code.
in reply to
1530187032-6489-1-git-send-email-mo...@mellanox.com

v2:
* Fixed coding style warning.
in reply to
1530184583-30166-1-git-send-email-mo...@mellanox.com

v1:
* Fixed coding style warnings.
in reply to
1530181779-19716-1-git-send-email-mo...@mellanox.com
---

 doc/guides/nics/features/mlx4.ini |   1 +
 doc/guides/nics/mlx4.rst  |   3 +
 drivers/net/mlx4/Makefile |   5 +
 drivers/net/mlx4/mlx4.c   |   9 +
 drivers/net/mlx4/mlx4.h   |   5 +
 drivers/net/mlx4/mlx4_prm.h   |  15 ++
 drivers/net/mlx4/mlx4_rxtx.c  | 378 +-
 drivers/net/mlx4/mlx4_rxtx.h  |   2 +-
 drivers/net/mlx4/mlx4_txq.c   |   8 +-
 9 files changed, 422 insertions(+), 4 deletions(-)

diff --git a/doc/guides/nics/features/mlx4.ini 
b/doc/guides/nics/features/mlx4.ini
index f6efd21..98a3f61 100644
--- a/doc/guides/nics/features/mlx4.ini
+++ b/doc/guides/nics/features/mlx4.ini
@@ -13,6 +13,7 @@ Queue start/stop = Y
 MTU update   = Y
 Jumbo frame  = Y
 Scattered Rx = Y
+TSO  = Y
 Promiscuous mode = Y
 Allmulticast mode= Y
 Unicast MAC filter   = Y
diff --git a/doc/guides/nics/mlx4.rst b/doc/guides/nics/mlx4.rst
index 491106a..12adaeb 100644
--- a/doc/guides/nics/mlx4.rst
+++ b/doc/guides/nics/mlx4.rst
@@ -142,6 +142,9 @@ Limitations
   The ability to enable/disable CRC stripping requires OFED version
   4.3-1.5.0.0 and above  or rdma-core version v18 and above.
 
+- TSO (Transmit Segmentation Offload) is supported in OFED version
+  4.4 and above or in rdma-core version v18 and above.
+
 Prerequisites
 -
 
diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile
index 73f9d40..63bc003 100644
--- a/drivers/net/mlx4/Makefile
+++ b/drivers/net/mlx4/Makefile
@@ -85,6 +85,11 @@ mlx4_autoconf.h.new: FORCE
 mlx4_autoconf.h.new: $(RTE_SDK)/buildtools/auto-config-h.sh
$Q $(RM) -f -- '$@'
$Q : > '$@'
+   $Q sh -- '$<' '$@' \
+   HAVE_IBV_MLX4_WQE_LSO_SEG \
+   infiniband/mlx4dv.h \
+   type 'struct mlx4_wqe_lso_seg' \
+   $(AUTOCONF_OUTPUT)
 
 # Create mlx4_autoconf.h or update it in case it differs from the new one.
 
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index d151a90..5d8c76d 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -677,6 +677,15 @@ struct mlx4_conf {
IBV_RAW_PACKET_CAP_SCATTER_FCS);
DEBUG("FCS stripping toggling is %ssupported",
  priv->hw_fcs_strip ? "" : "not ");
+   priv->tso =
+   ((device_attr_ex.tso_caps.max_tso > 0) &&
+(device_attr_ex.tso_caps.supported_qpts &
+ (1 << IBV_QPT_RAW_PACKET)));
+   if (priv->tso)
+   priv->tso_max_payload_sz =
+   device_attr_ex.tso_caps.max_tso;
+   DEBUG("TSO is %ssupported",
+ priv->tso ? "" : "not ");
/* Configure the first MAC address by default. */
err = mlx4_get_mac(priv, &mac.addr_bytes);
if (err) {
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 300cb4d..89d8c38 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -47,6 +47,9 @@
 /** Interrupt alarm timeout value in microseconds. */
 #define MLX4_INTR_ALARM_TIMEOUT 10
 
+/* Maximum packet headers size (L2+L3+L4) for TSO. */
+#define MLX4_MAX_TSO_HEADER 192
+
 /** Port parameter. */
 #define MLX4_PMD_PORT_KVARG "port"
 
@@ -90,6 +93,8 @@ struct priv {
uint32_t hw_csum:1; /**< Checksum offload is supported. */
uint32_t hw_csum_l2tun:1; /**< Checksum support for L2 tunnels. */
uint32_t hw_fcs_strip:1; /**< FCS stripping toggling is supported. */
+   uint32_t tso:1; /**< Transmit segmentation offload is supported. */
+   uint32_t tso_max_payload_sz; /**< Max supported TSO payload size. */
uint64_t hw_rss_sup; /**< Supported RSS hash fields (Verbs format). */
struct rte_intr_handle intr_han

Re: [dpdk-dev] [PATCH v4 1/4] lib/cryptodev: add asymmetric algos in cryptodev

2018-07-09 Thread Trahe, Fiona
Hi Shally,  Declan, Pablo,

I'm concerned about rushing in significant last-minute changes, but would like 
to see this API in 18.08.
So I suggest the patchset is applied with the caveat that it is experimental 
and will continue to be so
for the next release, in which the remaining open issues should be addressed. 
The main areas of concern are:
 - the structures for xforms and ops and rework needed to cater for sessionless
 - capabilities


Regards,
Fiona


> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Doherty, Declan
> Sent: Monday, July 9, 2018 3:55 PM
> To: Verma, Shally ; De Lara Guarch, Pablo
> 
> Cc: dev@dpdk.org; Athreya, Narayana Prasad 
> ; Murthy,
> Nidadavolu ; Sahu, Sunila 
> ; Gupta,
> Ashish ; Kartha, Umesh 
> Subject: Re: [dpdk-dev] [PATCH v4 1/4] lib/cryptodev: add asymmetric algos in 
> cryptodev
> 
> On 06/07/2018 3:28 PM, Verma, Shally wrote:
> > Hi Declan
> >
> >> -Original Message-
> >> From: Doherty, Declan [mailto:declan.dohe...@intel.com]
> >> Sent: 05 July 2018 20:24
> >> To: Verma, Shally ; pablo.de.lara.gua...@intel.com
> >> Cc: dev@dpdk.org; Athreya, Narayana Prasad 
> >> ; Murthy,
> Nidadavolu
> >> ; Sahu, Sunila ; 
> >> Gupta, Ashish
> ; Kartha,
> >> Umesh 
> >> Subject: Re: [dpdk-dev] [PATCH v4 1/4] lib/cryptodev: add asymmetric algos 
> >> in cryptodev
> >>
> >> External Email
> >>
> >> Hey Shally,
> >>
> >> just a few things inline below mainly concerned with the need to be able
> >> to support session-less operations in future PMDs. I think with a few
> >> minor changes to the API now it should allow session-less to be
> >> supported later without the need for a major rework of the APIs, I don't
> >> think this should cause any major rework to your PMD just the adoption
> >> of some new more explicit op types.
> >>
> >> Thanks
> >> Declan
> >>
> >> On 03/07/2018 4:24 PM, Shally Verma wrote:
> >>> Add rte_crypto_asym.h with supported xfrms
> >>> and associated op structures and APIs
> >>>
> >>> API currently supports:
> >>> - RSA Encrypt, Decrypt, Sign and Verify
> >>> - Modular Exponentiation and Inversion
> >>> - DSA Sign and Verify
> >>> - Diffie-Hellman private key exchange
> >>> - Diffie-Hellman public key exchange
> >>> - Diffie-Hellman shared secret compute
> >>> - Diffie-Hellman public/private key pair generation
> >>> using xform chain
> >>>
> >>> Signed-off-by: Shally Verma 
> >>> Signed-off-by: Sunila Sahu 
> >>> Signed-off-by: Ashish Gupta 
> >>> Signed-off-by: Umesh Kartha 
> >>> ---
> >>>lib/librte_cryptodev/Makefile  |   1 +
> >>>lib/librte_cryptodev/meson.build   |   3 +-
> >>>lib/librte_cryptodev/rte_crypto_asym.h | 496 
> >>> +
> >>>3 files changed, 499 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/lib/librte_cryptodev/Makefile b/lib/librte_cr
> >>
> >> ...
> >>
> >>> +typedef struct rte_crypto_param_t {
> >>> + uint8_t *data;
> >>> + /**< pointer to buffer holding data */
> >>> + rte_iova_t iova;
> >>> + /**< IO address of data buffer */
> >>> + size_t length;
> >>> + /**< length of data in bytes */
> >>> +} rte_crypto_param;
> >>
> >> What is the intended way for this memory to be allocated,
> >
> > [Shally] It should be pointer to flat buffers and added only to 
> > input/output data to/from
> > asymmetric crypto engine.
> >
> >> it seems like
> >> there might be a more general requirement in DPDK for IO addressable
> >> memory (compression? other hardware accelerators implemented on FPGAs)
> >> than just asymmetric crypto, will we end up needing to support features
> >> like scatter gather lists in this structure?
> >
> > [Shally] I don’t anticipate that we would need to support scatter-gather
> > data buffers as far as it is used for asymmetric.
> > And I'm not aware if we have requirement to support it for asymmetric
> > processing since data size is usually small for such operations. Thus,
> > app is expected to send linear buffers for input/output.
> >
> > Does that answer your question? Or did I miss anything?
> >
> 
> Sure I understand the rationale.
> 
> >
> >> btw I think this is
> >> probably fine for the moment as it will be experimental but I think it
> >> will need to be addressed before the removal of the experimental tag.
> >>
> >
> > ...
> >
> >>> + RTE_CRYPTO_ASYM_XFORM_MODINV,
> >>
> >> Would prefer if this was _MOD_INV :)
> >>
> >>> + /**< Modular Inverse
> >>> +  * Perform Modulus inverse b^(-1) mod n
> >>> +  */
> >>> + RTE_CRYPTO_ASYM_XFORM_MODEX,
> >>
> >> and this was _MOD_EX :)
> >
> > [Shally] fine will do name change.
> >
> >>
> >>> + /**< Modular Exponentiation
> >>> +  * Perform Modular Exponentiation b^e mod n
> >>> +  */
> >>> + RTE_CRYPTO_ASYM_XFORM_TYPE_LIST_END
> >>> + /**< End of list */
> >>> +};
> >>> +
> >>> +/**
> >>> + * Asymmetric crypto operation type variants
> >>> + */
> >>> +enum rte_crypto_asym_op_type {
> >>> + RTE_C
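
For reference, the two modular operations named in the quoted xform list
(b^e mod n and b^(-1) mod n) can be worked through with a small sketch.
It uses 32-bit moduli only so that plain 64-bit arithmetic cannot
overflow; real asymmetric crypto works on multi-precision integers. The
function names toy_mod_exp and toy_mod_inv are hypothetical and unrelated
to the proposed cryptodev API.

```c
#include <assert.h>
#include <stdint.h>

/* b^e mod n by square-and-multiply; n must be nonzero. */
static uint32_t toy_mod_exp(uint32_t b, uint64_t e, uint32_t n)
{
	uint64_t result;
	uint64_t base;

	assert(n != 0);
	result = 1 % n; /* handles n == 1, where every result is 0 */
	base = b % n;
	while (e > 0) {
		if (e & 1)
			result = (result * base) % n;
		base = (base * base) % n;
		e >>= 1;
	}
	return (uint32_t)result;
}

/*
 * b^(-1) mod n by the extended Euclidean algorithm; n must be nonzero.
 * Returns 0 when gcd(b, n) != 1, i.e. when no inverse exists.
 */
static uint32_t toy_mod_inv(uint32_t b, uint32_t n)
{
	int64_t r0, r1, t0 = 0, t1 = 1;

	assert(n != 0);
	r0 = n;
	r1 = b % n;
	while (r1 != 0) {
		int64_t q = r0 / r1;
		int64_t tmp;

		tmp = r0 - q * r1;
		r0 = r1;
		r1 = tmp;
		tmp = t0 - q * t1;
		t0 = t1;
		t1 = tmp;
	}
	if (r0 != 1)
		return 0;
	return (uint32_t)(t0 < 0 ? t0 + n : t0);
}
```

For example, toy_mod_exp(4, 13, 497) evaluates 4^13 mod 497, and
toy_mod_inv(3, 11) yields 4 because 3 * 4 = 12 = 1 mod 11.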

Re: [dpdk-dev] [PATCH v4 1/4] lib/cryptodev: add asymmetric algos in cryptodev

2018-07-09 Thread Verma, Shally


>-Original Message-
>From: Trahe, Fiona [mailto:fiona.tr...@intel.com]
>Sent: 09 July 2018 22:15
>To: Doherty, Declan ; Verma, Shally 
>; De Lara Guarch, Pablo
>
>Cc: dev@dpdk.org; Athreya, Narayana Prasad 
>; Murthy, Nidadavolu
>; Sahu, Sunila ; Gupta, 
>Ashish ; Kartha,
>Umesh ; Trahe, Fiona 
>Subject: RE: [dpdk-dev] [PATCH v4 1/4] lib/cryptodev: add asymmetric algos in 
>cryptodev
>
>External Email
>
>Hi Shally,  Declan, Pablo,
>
>I'm concerned about rushing in significant last-minute changes, but would like 
>to see this API in 18.08.
>So I suggest the patchset is applied with the caveat that it is experimental 
>and will continue to be so
>for the next release, in which the remaining open issues should be addressed.
>The main areas of concern are:
> - the structures for xforms and ops and rework needed to cater for sessionless
> - capabilities
>
Sounds good to me. If that’s fine with everyone, I will send the current
OpenSSL PMD patch. Please confirm.

Thanks
Shally
>
>Regards,
>Fiona
>
>
>> -Original Message-
>> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Doherty, Declan
>> Sent: Monday, July 9, 2018 3:55 PM
>> To: Verma, Shally ; De Lara Guarch, Pablo
>> 
>> Cc: dev@dpdk.org; Athreya, Narayana Prasad 
>> ; Murthy,
>> Nidadavolu ; Sahu, Sunila 
>> ; Gupta,
>> Ashish ; Kartha, Umesh 
>> Subject: Re: [dpdk-dev] [PATCH v4 1/4] lib/cryptodev: add asymmetric algos 
>> in cryptodev
>>
>> On 06/07/2018 3:28 PM, Verma, Shally wrote:
>> > Hi Declan
>> >
>> >> -Original Message-
>> >> From: Doherty, Declan [mailto:declan.dohe...@intel.com]
>> >> Sent: 05 July 2018 20:24
>> >> To: Verma, Shally ; 
>> >> pablo.de.lara.gua...@intel.com
>> >> Cc: dev@dpdk.org; Athreya, Narayana Prasad 
>> >> ; Murthy,
>> Nidadavolu
>> >> ; Sahu, Sunila ; 
>> >> Gupta, Ashish
>> ; Kartha,
>> >> Umesh 
>> >> Subject: Re: [dpdk-dev] [PATCH v4 1/4] lib/cryptodev: add asymmetric 
>> >> algos in cryptodev
>> >>
>> >> External Email
>> >>
>> >> Hey Shally,
>> >>
>> >> just a few things inline below mainly concerned with the need to be able
>> >> to support session-less operations in future PMDs. I think with a few
>> >> minor changes to the API now it should allow session-less to be
>> >> supported later without the need for a major rework of the APIs, I don't
>> >> think this should cause any major rework to your PMD just the adoption
>> >> of some new more explicit op types.
>> >>
>> >> Thanks
>> >> Declan
>> >>
>> >> On 03/07/2018 4:24 PM, Shally Verma wrote:
>> >>> Add rte_crypto_asym.h with supported xfrms
>> >>> and associated op structures and APIs
>> >>>
>> >>> API currently supports:
>> >>> - RSA Encrypt, Decrypt, Sign and Verify
>> >>> - Modular Exponentiation and Inversion
>> >>> - DSA Sign and Verify
>> >>> - Diffie-Hellman private key exchange
>> >>> - Diffie-Hellman public key exchange
>> >>> - Diffie-Hellman shared secret compute
>> >>> - Diffie-Hellman public/private key pair generation
>> >>> using xform chain
>> >>>
>> >>> Signed-off-by: Shally Verma 
>> >>> Signed-off-by: Sunila Sahu 
>> >>> Signed-off-by: Ashish Gupta 
>> >>> Signed-off-by: Umesh Kartha 
>> >>> ---
>> >>>lib/librte_cryptodev/Makefile  |   1 +
>> >>>lib/librte_cryptodev/meson.build   |   3 +-
>> >>>lib/librte_cryptodev/rte_crypto_asym.h | 496 
>> >>> +
>> >>>3 files changed, 499 insertions(+), 1 deletion(-)
>> >>>
>> >>> diff --git a/lib/librte_cryptodev/Makefile b/lib/librte_cr
>> >>
>> >> ...
>> >>
>> >>> +typedef struct rte_crypto_param_t {
>> >>> + uint8_t *data;
>> >>> + /**< pointer to buffer holding data */
>> >>> + rte_iova_t iova;
>> >>> + /**< IO address of data buffer */
>> >>> + size_t length;
>> >>> + /**< length of data in bytes */
>> >>> +} rte_crypto_param;
>> >>
>> >> What is the intended way for this memory to be allocated,
>> >
>> > [Shally] It should be pointer to flat buffers and added only to 
>> > input/output data to/from
>> > asymmetric crypto engine.
>> >
>> >> it seems like
>> >> there might be a more general requirement in DPDK for IO addressable
>> >> memory (compression? other hardware accelerators implemented on FPGAs)
>> >> than just asymmetric crypto, will we end up needing to support features
>> >> like scatter gather lists in this structure?
>> >
>> > [Shally] I don’t anticipate that we would need to support scatter-gather
>> > data buffers as far as it is used for asymmetric.
>> > And I'm not aware if we have requirement to support it for asymmetric
>> > processing since data size is usually small for such operations. Thus,
>> > app is expected to send linear buffers for input/output.
>> >
>> > Does that answer your question? Or did I miss anything?
>> >
>>
>> Sure I understand the rationale.
>>
>> >
>> >> btw I think this is
>> >> probably fine for the moment as it will be experimental but I think it
>> >> will need to be addressed before the removal of the experimental tag.
>> >>
>> >
>> 

[dpdk-dev] [PATCH 0/2] support MAC changes when no live changes allowed

2018-07-09 Thread Alejandro Lucero
This is a patch to fix functionality that came with the first public
release: changing/setting the MAC address.

The original patch assumes all NICs can safely change or set the MAC
in any case. However, this is not always true. NFP depends on the firmware
capabilities and this is not always supported. There are other NICs with
this same limitation, although, as far as I know, not in DPDK. The Linux
kernel has an IFF_LIVE_ADDR_CHANGE flag and two NICs check this flag to
allow or disallow live MAC changes.

The flag proposed in this patch is just the opposite: it advertises that
live change is not supported, and live change is assumed to be supported
otherwise.

Although most NICs support rte_eth_dev_default_mac_addr_set and this
function returns an error when live change is not supported, note that
this function is invoked during port start but the returned value is not
checked. This is likely good enough for most cases, but bonding relies
on this sequence of start followed by MAC set/change, and a PMD port is
not properly configured for use as a slave port in some bonding
modes.
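
The ordering this series proposes can be illustrated with a minimal
sketch. Only the flag name and its value come from patch 1/2; struct
toy_port and the toy_* functions are hypothetical stand-ins and do not
reproduce the real rte_eth_dev structures or rte_eth_dev_start().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag value as proposed in patch 1/2 of this series. */
#define RTE_ETH_DEV_NOLIVE_MAC_ADDR 0x0020

/* Hypothetical toy model of a port, for illustration only. */
struct toy_port {
	uint32_t dev_flags;  /* flags advertised by the PMD */
	int started;         /* 1 once the port has been started */
	int mac_set_started; /* port state when the MAC was programmed, -1 if never */
};

/* Record in which port state the MAC address was (re)programmed. */
static void toy_mac_restore(struct toy_port *p)
{
	p->mac_set_started = p->started;
}

static int toy_port_start(struct toy_port *p)
{
	assert(p != NULL);

	/* No live change: program the MAC while the port is still stopped. */
	if (p->dev_flags & RTE_ETH_DEV_NOLIVE_MAC_ADDR)
		toy_mac_restore(p);

	p->started = 1;

	/* Live change supported: keep the existing order, MAC after start. */
	if (!(p->dev_flags & RTE_ETH_DEV_NOLIVE_MAC_ADDR))
		toy_mac_restore(p);

	return 0;
}
```

With the flag set, the MAC is programmed while the port is stopped;
without it, the behaviour stays as before and the MAC is replayed after
the device has started.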



[dpdk-dev] [PATCH 2/2] net/nfp: fix live MAC changes not supported

2018-07-09 Thread Alejandro Lucero
Some NFP firmwares support live changes to the MAC address, but
this is not always true and the firmware advertises it accordingly.

This patch checks if firmware does not support live changes and
sets RTE_ETH_DEV_NOLIVE_MAC_ADDR in that case.

Fixes: af75078fece3 ("first public release")
Cc: sta...@dpdk.org

Signed-off-by: Alejandro Lucero 
---
 drivers/net/nfp/nfp_net.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/nfp/nfp_net.c b/drivers/net/nfp/nfp_net.c
index 3658696..fbe74fc 100644
--- a/drivers/net/nfp/nfp_net.c
+++ b/drivers/net/nfp/nfp_net.c
@@ -2883,6 +2883,9 @@ uint32_t nfp_net_txq_full(struct nfp_net_txq *txq)
ether_addr_copy((struct ether_addr *)hw->mac_addr,
&eth_dev->data->mac_addrs[0]);
 
+   if (!(hw->cap & NFP_NET_CFG_CTRL_LIVE_ADDR))
+   eth_dev->data->dev_flags |= RTE_ETH_DEV_NOLIVE_MAC_ADDR;
+
PMD_INIT_LOG(INFO, "port %d VendorID=0x%x DeviceID=0x%x "
 "mac=%02x:%02x:%02x:%02x:%02x:%02x",
 eth_dev->data->port_id, pci_dev->id.vendor_id,
-- 
1.9.1



[dpdk-dev] [PATCH 1/2] ethdev: fix MAC changes when live change not supported

2018-07-09 Thread Alejandro Lucero
The current code assumes a MAC change can occur once the port has been
started. In fact, some NICs require this port state for the change to
succeed, but other NICs do not always support a MAC change in that
case.

This patch adds a new device flag with which a device can advertise this
limitation; if the flag is set, the MAC is changed before the
port starts.

Fixes: af75078fece3 ("first public release")
Cc: sta...@dpdk.org

Signed-off-by: Alejandro Lucero 
---
 lib/librte_ethdev/rte_ethdev.c | 28 +++-
 lib/librte_ethdev/rte_ethdev.h |  2 ++
 2 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index a9977df..8dbc031 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -1254,19 +1254,14 @@ struct rte_eth_dev *
 }
 
 static void
-rte_eth_dev_config_restore(uint16_t port_id)
+rte_eth_dev_mac_restore(struct rte_eth_dev *dev,
+   struct rte_eth_dev_info *dev_info)
 {
-   struct rte_eth_dev *dev;
-   struct rte_eth_dev_info dev_info;
struct ether_addr *addr;
uint16_t i;
uint32_t pool = 0;
uint64_t pool_mask;
 
-   dev = &rte_eth_devices[port_id];
-
-   rte_eth_dev_info_get(port_id, &dev_info);
-
/* replay MAC address configuration including default MAC */
addr = &dev->data->mac_addrs[0];
if (*dev->dev_ops->mac_addr_set != NULL)
@@ -1275,7 +1270,7 @@ struct rte_eth_dev *
(*dev->dev_ops->mac_addr_add)(dev, addr, 0, pool);
 
if (*dev->dev_ops->mac_addr_add != NULL) {
-   for (i = 1; i < dev_info.max_mac_addrs; i++) {
+   for (i = 1; i < dev_info->max_mac_addrs; i++) {
addr = &dev->data->mac_addrs[i];
 
/* skip zero address */
@@ -1294,6 +1289,14 @@ struct rte_eth_dev *
} while (pool_mask);
}
}
+}
+
+static void
+rte_eth_dev_config_restore(struct rte_eth_dev *dev,
+  struct rte_eth_dev_info *dev_info, uint16_t port_id)
+{
+   if (!(*dev_info->dev_flags & RTE_ETH_DEV_NOLIVE_MAC_ADDR))
+   rte_eth_dev_mac_restore(dev, dev_info);
 
/* replay promiscuous configuration */
if (rte_eth_promiscuous_get(port_id) == 1)
@@ -1312,6 +1315,7 @@ struct rte_eth_dev *
 rte_eth_dev_start(uint16_t port_id)
 {
struct rte_eth_dev *dev;
+   struct rte_eth_dev_info dev_info;
int diag;
 
RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
@@ -1327,13 +1331,19 @@ struct rte_eth_dev *
return 0;
}
 
+   rte_eth_dev_info_get(port_id, &dev_info);
+
+   /* Lets restore MAC now if device does not support live change */
+   if (*dev_info.dev_flags & RTE_ETH_DEV_NOLIVE_MAC_ADDR)
+   rte_eth_dev_mac_restore(dev, &dev_info);
+
diag = (*dev->dev_ops->dev_start)(dev);
if (diag == 0)
dev->data->dev_started = 1;
else
return eth_err(port_id, diag);
 
-   rte_eth_dev_config_restore(port_id);
+   rte_eth_dev_config_restore(dev, &dev_info, port_id);
 
if (dev->data->dev_conf.intr_conf.lsc == 0) {
RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->link_update, -ENOTSUP);
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index 36e3984..85f6908 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -1309,6 +1309,8 @@ struct rte_eth_dev_owner {
 #define RTE_ETH_DEV_INTR_RMV 0x0008
 /** Device is port representor */
 #define RTE_ETH_DEV_REPRESENTOR  0x0010
+/** Device does not support MAC change after started */
+#define RTE_ETH_DEV_NOLIVE_MAC_ADDR  0x0020
 
 /**
  * Iterates over valid ethdev ports owned by a specific owner.
-- 
1.9.1



Re: [dpdk-dev] [PATCH v4 1/4] lib/cryptodev: add asymmetric algos in cryptodev

2018-07-09 Thread De Lara Guarch, Pablo
Hi Shally,

> -Original Message-
> From: Verma, Shally [mailto:shally.ve...@cavium.com]
> Sent: Monday, July 9, 2018 6:12 PM
> To: Trahe, Fiona ; Doherty, Declan
> ; De Lara Guarch, Pablo
> 
> Cc: dev@dpdk.org; Athreya, Narayana Prasad
> ; Murthy, Nidadavolu
> ; Sahu, Sunila ;
> Gupta, Ashish ; Kartha, Umesh
> 
> Subject: RE: [dpdk-dev] [PATCH v4 1/4] lib/cryptodev: add asymmetric algos in
> cryptodev
> 
> 
> 
> >-Original Message-
> >From: Trahe, Fiona [mailto:fiona.tr...@intel.com]
> >Sent: 09 July 2018 22:15
> >To: Doherty, Declan ; Verma, Shally
> >; De Lara Guarch, Pablo
> >
> >Cc: dev@dpdk.org; Athreya, Narayana Prasad
> >; Murthy, Nidadavolu
> >; Sahu, Sunila ;
> >Gupta, Ashish ; Kartha, Umesh
> >; Trahe, Fiona 
> >Subject: RE: [dpdk-dev] [PATCH v4 1/4] lib/cryptodev: add asymmetric
> >algos in cryptodev
> >
> >External Email
> >
> >Hi Shally,  Declan, Pablo,
> >
> >I'm concerned about rushing in significant last-minute changes, but would
> >like to see this API in 18.08.
> >So I suggest the patchset is applied with the caveat that it is
> >experimental and will continue to be so for the next release, in which the
> >remaining open issues should be addressed.
> >The main areas of concern are:
> > - the structures for xforms and ops and rework needed to cater for
> >   sessionless
> > - capabilities
> >
> Sounds good to me. If that’s fine with everyone, I will send current openssl
> PMD patch. Please confirm.
> 

I agree with Fiona. This was postponed one release, so it is fair that it
makes it into 18.08, knowing that there will be substantial changes in the
next release.

About OpenSSL, please make sure that it works on 1.1.0.

Thanks,
Pablo

> Thanks
> Shally
> >
> >Regards,
> >Fiona
> >
> >



Re: [dpdk-dev] [PATCH v4 1/4] lib/cryptodev: add asymmetric algos in cryptodev

2018-07-09 Thread Verma, Shally


>-Original Message-
>From: De Lara Guarch, Pablo [mailto:pablo.de.lara.gua...@intel.com]
>Sent: 09 July 2018 22:46
>To: Verma, Shally ; Trahe, Fiona 
>; Doherty, Declan 
>Cc: dev@dpdk.org; Athreya, Narayana Prasad 
>; Murthy, Nidadavolu
>; Sahu, Sunila ; Gupta, 
>Ashish ; Kartha,
>Umesh 
>Subject: RE: [dpdk-dev] [PATCH v4 1/4] lib/cryptodev: add asymmetric algos in 
>cryptodev
>
>Hi Shally,
>
>> -Original Message-
>> From: Verma, Shally [mailto:shally.ve...@cavium.com]
>> Sent: Monday, July 9, 2018 6:12 PM
>> To: Trahe, Fiona ; Doherty, Declan
>> ; De Lara Guarch, Pablo
>> 
>> Cc: dev@dpdk.org; Athreya, Narayana Prasad
>> ; Murthy, Nidadavolu
>> ; Sahu, Sunila ;
>> Gupta, Ashish ; Kartha, Umesh
>> 
>> Subject: RE: [dpdk-dev] [PATCH v4 1/4] lib/cryptodev: add asymmetric algos in
>> cryptodev
>>
>>
>>
>> >-Original Message-
>> >From: Trahe, Fiona [mailto:fiona.tr...@intel.com]
>> >Sent: 09 July 2018 22:15
>> >To: Doherty, Declan ; Verma, Shally
>> >; De Lara Guarch, Pablo
>> >
>> >Cc: dev@dpdk.org; Athreya, Narayana Prasad
>> >; Murthy, Nidadavolu
>> >; Sahu, Sunila ;
>> >Gupta, Ashish ; Kartha, Umesh
>> >; Trahe, Fiona 
>> >Subject: RE: [dpdk-dev] [PATCH v4 1/4] lib/cryptodev: add asymmetric
>> >algos in cryptodev
>> >
>> >External Email
>> >
>> >Hi Shally,  Declan, Pablo,
>> >
>> >I'm concerned about rushing in significant last-minute changes, but would
>> >like to see this API in 18.08.
>> >So I suggest the patchset is applied with the caveat that it is
>> >experimental and will continue to be so for the next release, in which the
>> >remaining open issues should be addressed.
>> >The main areas of concern are:
>> > - the structures for xforms and ops and rework needed to cater for
>> >   sessionless
>> > - capabilities
>> >
>> Sounds good to me. If that’s fine with everyone, I will send current openssl
>> PMD patch. Please confirm.
>>
>
>I agree with Fiona. This was postponed one release, so it is fair that it
>makes it into 18.08, knowing that there will be substantial changes in the
>next release.
>
We will continue to discuss the open items to determine the level of change.

>About OpenSSL, please make sure that it works on 1.1.0.
Yup. It will support 1.1.0 with compatibility to 1.0.2

Thanks
Shally

>
>Thanks,
>Pablo
>
>> Thanks
>> Shally
>> >
>> >Regards,
>> >Fiona
>> >
>> >



[dpdk-dev] [PATCH v4 0/8] Add read-write concurrency to rte_hash library

2018-07-09 Thread Yipeng Wang
This patch set adds read-write concurrency support to rte_hash.
A new flag value is added so that the user can request read-write
concurrency at creation time. Test cases are implemented for functional
and performance testing.

The new concurrency model is based on rte_rwlock. When Intel TSX is
available and the user requests it, the TM version of rte_rwlock is
used. Both multi-writer and read-write concurrency are protected by
rte_rwlock instead of the x86-specific RTM instructions, so the
x86-specific header rte_cuckoo_hash_x86.h is removed and its code is
merged into the main .c file.

A new rte_hash_count API is proposed to count how many keys are inserted
into the hash table.

v3->v4:
1. Change commit message titles as Pablo suggested. (Pablo)
2. hash: remove unnecessary changes in commit 4. (Pablo)
3. test: remove unnecessary double blank lines. (Pablo)
4. Add Pablo's ack in commit message.

v2->v3:
1. hash: Concurrency bug fix: after the cuckoo path move begins,
the last empty slot needs to be verified again in case other writers
raced into this slot and occupied it. A new commit is added for this
bug fix since it applies to master head as well.
2. hash: Concurrency bug fix: if cuckoo path is detected to be invalid,
the current slot needs to be emptied since it is duplicated to its
target bucket.
3. hash: "const" is used for types in multiple locations. (Pablo)
4. hash: rte_malloc used for readwriter lock used wrong align
argument. Similar fix applies to master head so a new commit is
created. (Pablo)
5. hash: ring size calculation fix is moved to front. (Pablo)
6. hash: search-and-remove function is refactored to be more aligned
with other search function. (Pablo)
7. test: use jhash in the functional test for read-write concurrency,
because jhash with sequential keys incurs more cuckoo paths.
8. Multiple coding style, typo, commit message fixes. (Pablo)

v1->v2:
1. Split each commit into two commits for easier review (Pablo).
2. Add more comments in various places (Pablo).
3. hash: In key insertion function, move duplicated key checking to
earlier location and protect it using locks. Checking duplicated key
should happen first and data updates should be protected.
4. hash: In lookup bulk function, put signature comparison in lock,
since writers could happen between signature match on two buckets.
5. hash: Add write locks to reset function as well to protect resets.
6. test: Fix 32-bit compilation error in read-write test (Pablo).
7. test: Check total physical core count in read-write test. Don't
test with a thread count that is larger than the physical core count.
8. Other minor fixes such as typos (Pablo).


Yipeng Wang (8):
  hash: fix multiwriter lock memory allocation
  hash: fix a multi-writer race condition
  hash: fix key slot size accuracy
  hash: make duplicated code into functions
  hash: add read and write concurrency support
  test: add tests in hash table perf test
  test: add test case for read write concurrency
  hash: add new API function to query the key count

 lib/librte_hash/meson.build   |   1 -
 lib/librte_hash/rte_cuckoo_hash.c | 701 +-
 lib/librte_hash/rte_cuckoo_hash.h |  18 +-
 lib/librte_hash/rte_cuckoo_hash_x86.h | 164 
 lib/librte_hash/rte_hash.h|  14 +
 lib/librte_hash/rte_hash_version.map  |   8 +
 test/test/Makefile|   1 +
 test/test/test_hash.c |  12 +
 test/test/test_hash_multiwriter.c |   9 +
 test/test/test_hash_perf.c|  36 +-
 test/test/test_hash_readwrite.c   | 637 ++
 11 files changed, 1156 insertions(+), 445 deletions(-)
 delete mode 100644 lib/librte_hash/rte_cuckoo_hash_x86.h
 create mode 100644 test/test/test_hash_readwrite.c

-- 
2.7.4
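
As a rough illustration of the locking model described in the cover
letter above, here is a minimal sketch of a counter-based reader-writer
lock in plain C11. It is not the rte_rwlock implementation and calls no
DPDK function; every sketch_* name is hypothetical, the hardware
transactional memory variant is not modelled, and the try-lock form is
used only so that the behaviour can be checked from a single thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock state: 0 = free, > 0 = number of readers, -1 = writer. */
typedef struct {
	atomic_int cnt;
} sketch_rwlock_t;

static void sketch_rwlock_init(sketch_rwlock_t *l)
{
	atomic_init(&l->cnt, 0);
}

/* Readers may share the lock as long as no writer holds it. */
static bool sketch_read_trylock(sketch_rwlock_t *l)
{
	int x = atomic_load_explicit(&l->cnt, memory_order_relaxed);

	while (x >= 0) {
		/* On failure x is reloaded with the current value. */
		if (atomic_compare_exchange_weak_explicit(&l->cnt, &x, x + 1,
				memory_order_acquire, memory_order_relaxed))
			return true;
	}
	return false;
}

static void sketch_read_unlock(sketch_rwlock_t *l)
{
	int prev = atomic_fetch_sub_explicit(&l->cnt, 1, memory_order_release);

	assert(prev > 0);
	(void)prev;
}

/* A writer needs exclusive access: no readers and no other writer. */
static bool sketch_write_trylock(sketch_rwlock_t *l)
{
	int expected = 0;

	return atomic_compare_exchange_strong_explicit(&l->cnt, &expected, -1,
			memory_order_acquire, memory_order_relaxed);
}

static void sketch_write_unlock(sketch_rwlock_t *l)
{
	atomic_store_explicit(&l->cnt, 0, memory_order_release);
}
```

In this model lookups would take the read side and add/delete/reset
would take the write side, which is the split the series describes.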



[dpdk-dev] [PATCH v4 1/8] hash: fix multiwriter lock memory allocation

2018-07-09 Thread Yipeng Wang
When allocating memory for multiwriter_lock, the alignment should be
RTE_CACHE_LINE_SIZE rather than LCORE_CACHE_SIZE.

Also, there should be a check to verify that rte_malloc
succeeded.

Fixes: be856325cba3 ("hash: add scalable multi-writer insertion with Intel TSX")
Cc: sta...@dpdk.org

Signed-off-by: Yipeng Wang 
Acked-by: Pablo de Lara 
---
 lib/librte_hash/rte_cuckoo_hash.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.c 
b/lib/librte_hash/rte_cuckoo_hash.c
index a07543a..80dcf41 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -281,7 +281,10 @@ rte_hash_create(const struct rte_hash_parameters *params)
h->add_key = ADD_KEY_MULTIWRITER;
h->multiwriter_lock = rte_malloc(NULL,
sizeof(rte_spinlock_t),
-   LCORE_CACHE_SIZE);
+   RTE_CACHE_LINE_SIZE);
+   if (h->multiwriter_lock == NULL)
+   goto err_unlock;
+
rte_spinlock_init(h->multiwriter_lock);
}
} else
-- 
2.7.4



[dpdk-dev] [PATCH v4 3/8] hash: fix key slot size accuracy

2018-07-09 Thread Yipeng Wang
This commit calculates the needed key slot size more
accurately. The previous local cache fix requires
the free slot ring to be larger than actually needed.
The calculation of the value is inaccurate.

Fixes: 5915699153d7 ("hash: fix scaling by reducing contention")
Cc: sta...@dpdk.org

Signed-off-by: Yipeng Wang 
Acked-by: Pablo de Lara 
---
 lib/librte_hash/rte_cuckoo_hash.c | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)
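
The slot arithmetic in this patch can be sketched in standalone C as
follows. This is only an illustration of the formula visible in the
diff below, not the library code: the sketch_* helpers are
hypothetical, and max_lcore and lcore_cache_size are passed as
parameters standing in for RTE_MAX_LCORE and LCORE_CACHE_SIZE.

```c
#include <assert.h>
#include <stdint.h>

/* Round v up to the next power of two; v must be in 1..2^31. */
static uint32_t sketch_align32pow2(uint32_t v)
{
	assert(v != 0 && v <= 0x80000000u);
	v--;
	v |= v >> 1;
	v |= v >> 2;
	v |= v >> 4;
	v |= v >> 8;
	v |= v >> 16;
	return v + 1;
}

/*
 * Number of key slots, including the dummy slot at index zero.
 * With per-lcore caches, every cache except the first can hold up to
 * (lcore_cache_size - 1) extra free indices.
 */
static uint32_t sketch_num_key_slots(uint32_t entries, int use_lcore_caches,
		uint32_t max_lcore, uint32_t lcore_cache_size)
{
	if (use_lcore_caches)
		return entries + (max_lcore - 1) * (lcore_cache_size - 1) + 1;
	return entries + 1;
}
```

With 1024 entries, 4 lcores and a cache size of 8, the sketch gives
1024 + 3 * 7 + 1 = 1046 slots, and a ring sized by
sketch_align32pow2(1046) has 2048 entries.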

diff --git a/lib/librte_hash/rte_cuckoo_hash.c 
b/lib/librte_hash/rte_cuckoo_hash.c
index 80dcf41..11602af 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -126,13 +126,13 @@ rte_hash_create(const struct rte_hash_parameters *params)
 * except for the first cache
 */
num_key_slots = params->entries + (RTE_MAX_LCORE - 1) *
-   LCORE_CACHE_SIZE + 1;
+   (LCORE_CACHE_SIZE - 1) + 1;
else
num_key_slots = params->entries + 1;
 
snprintf(ring_name, sizeof(ring_name), "HT_%s", params->name);
/* Create ring (Dummy slot index is not enqueued) */
-   r = rte_ring_create(ring_name, rte_align32pow2(num_key_slots - 1),
+   r = rte_ring_create(ring_name, rte_align32pow2(num_key_slots),
params->socket_id, 0);
if (r == NULL) {
RTE_LOG(ERR, HASH, "memory allocation failed\n");
@@ -291,7 +291,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
h->add_key = ADD_KEY_SINGLEWRITER;
 
/* Populate free slots ring. Entry zero is reserved for key misses. */
-   for (i = 1; i < params->entries + 1; i++)
+   for (i = 1; i < num_key_slots; i++)
rte_ring_sp_enqueue(r, (void *)((uintptr_t) i));
 
te->data = (void *) h;
@@ -373,7 +373,7 @@ void
 rte_hash_reset(struct rte_hash *h)
 {
void *ptr;
-   unsigned i;
+   uint32_t tot_ring_cnt, i;
 
if (h == NULL)
return;
@@ -386,7 +386,13 @@ rte_hash_reset(struct rte_hash *h)
rte_pause();
 
/* Repopulate the free slots ring. Entry zero is reserved for key 
misses */
-   for (i = 1; i < h->entries + 1; i++)
+   if (h->hw_trans_mem_support)
+   tot_ring_cnt = h->entries + (RTE_MAX_LCORE - 1) *
+   (LCORE_CACHE_SIZE - 1);
+   else
+   tot_ring_cnt = h->entries;
+
+   for (i = 1; i < tot_ring_cnt + 1; i++)
rte_ring_sp_enqueue(h->free_slots, (void *)((uintptr_t) i));
 
if (h->hw_trans_mem_support) {
-- 
2.7.4



[dpdk-dev] [PATCH v4 6/8] test: add tests in hash table perf test

2018-07-09 Thread Yipeng Wang
New code is added to support read-write concurrency for
rte_hash. Because the new code is in the critical path,
the perf test is modified to show any performance impact.
It is still a single-threaded test.

Signed-off-by: Yipeng Wang 
Acked-by: Pablo de Lara 
---
 test/test/test_hash_perf.c | 36 +---
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/test/test/test_hash_perf.c b/test/test/test_hash_perf.c
index a81d0c7..33dcb9f 100644
--- a/test/test/test_hash_perf.c
+++ b/test/test/test_hash_perf.c
@@ -76,7 +76,8 @@ static struct rte_hash_parameters ut_params = {
 };
 
 static int
-create_table(unsigned with_data, unsigned table_index)
+create_table(unsigned int with_data, unsigned int table_index,
+   unsigned int with_locks)
 {
char name[RTE_HASH_NAMESIZE];
 
@@ -86,6 +87,14 @@ create_table(unsigned with_data, unsigned table_index)
else
sprintf(name, "test_hash%d", hashtest_key_lens[table_index]);
 
+
+   if (with_locks)
+   ut_params.extra_flag =
+   RTE_HASH_EXTRA_FLAGS_TRANS_MEM_SUPPORT
+   | RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY;
+   else
+   ut_params.extra_flag = 0;
+
ut_params.name = name;
ut_params.key_len = hashtest_key_lens[table_index];
ut_params.socket_id = rte_socket_id();
@@ -459,7 +468,7 @@ reset_table(unsigned table_index)
 }
 
 static int
-run_all_tbl_perf_tests(unsigned with_pushes)
+run_all_tbl_perf_tests(unsigned int with_pushes, unsigned int with_locks)
 {
unsigned i, j, with_data, with_hash;
 
@@ -468,7 +477,7 @@ run_all_tbl_perf_tests(unsigned with_pushes)
 
for (with_data = 0; with_data <= 1; with_data++) {
for (i = 0; i < NUM_KEYSIZES; i++) {
-   if (create_table(with_data, i) < 0)
+   if (create_table(with_data, i, with_locks) < 0)
return -1;
 
if (get_input_keys(with_pushes, i) < 0)
@@ -611,15 +620,20 @@ fbk_hash_perf_test(void)
 static int
 test_hash_perf(void)
 {
-   unsigned with_pushes;
-
-   for (with_pushes = 0; with_pushes <= 1; with_pushes++) {
-   if (with_pushes == 0)
-   printf("\nALL ELEMENTS IN PRIMARY LOCATION\n");
+   unsigned int with_pushes, with_locks;
+   for (with_locks = 0; with_locks <= 1; with_locks++) {
+   if (with_locks)
+   printf("\nWith locks in the code\n");
else
-   printf("\nELEMENTS IN PRIMARY OR SECONDARY LOCATION\n");
-   if (run_all_tbl_perf_tests(with_pushes) < 0)
-   return -1;
+   printf("\nWithout locks in the code\n");
+   for (with_pushes = 0; with_pushes <= 1; with_pushes++) {
+   if (with_pushes == 0)
+   printf("\nALL ELEMENTS IN PRIMARY LOCATION\n");
+   else
+   printf("\nELEMENTS IN PRIMARY OR SECONDARY 
LOCATION\n");
+   if (run_all_tbl_perf_tests(with_pushes, with_locks) < 0)
+   return -1;
+   }
}
if (fbk_hash_perf_test() < 0)
return -1;
-- 
2.7.4



[dpdk-dev] [PATCH v4 2/8] hash: fix a multi-writer race condition

2018-07-09 Thread Yipeng Wang
The current multi-writer implementation uses Intel TSX to
protect the cuckoo path move but not the cuckoo path
search. After searching, we need to verify at the beginning
of the TSX region that the same empty slot still exists.
Otherwise another writer could occupy the empty slot
before the TSX region begins. The current code does not verify this.

Fixes: be856325cba3 ("hash: add scalable multi-writer insertion with Intel TSX")
Cc: sta...@dpdk.org

Signed-off-by: Yipeng Wang 
Acked-by: Pablo de Lara 
---
 lib/librte_hash/rte_cuckoo_hash_x86.h | 3 +++
 1 file changed, 3 insertions(+)
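
The idea of re-validating the empty slot can be sketched without TSX.
The example below swaps in a C11 compare-and-swap for the transactional
region, so it is an analogy rather than the technique used in the patch:
the check and the claim happen as one atomic step, and a writer that
lost the race sees a failure instead of overwriting the slot.
SKETCH_EMPTY_SLOT and sketch_claim_slot are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical marker for a slot that holds no key index. */
#define SKETCH_EMPTY_SLOT 0u

/*
 * Claim a slot that an earlier, unprotected search found empty.
 * Returns 0 if the slot was still empty and now holds new_key_idx,
 * or -1 if another writer occupied it in the meantime.
 */
static int sketch_claim_slot(_Atomic uint32_t *slot, uint32_t new_key_idx)
{
	uint32_t expected = SKETCH_EMPTY_SLOT;

	assert(new_key_idx != SKETCH_EMPTY_SLOT);
	if (atomic_compare_exchange_strong(slot, &expected, new_key_idx))
		return 0;
	return -1;
}
```

A caller that gets -1 would have to search for a new cuckoo path, which
corresponds to the abort-and-retry behaviour added in the diff below.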

diff --git a/lib/librte_hash/rte_cuckoo_hash_x86.h 
b/lib/librte_hash/rte_cuckoo_hash_x86.h
index 2c5b017..981d7bd 100644
--- a/lib/librte_hash/rte_cuckoo_hash_x86.h
+++ b/lib/librte_hash/rte_cuckoo_hash_x86.h
@@ -66,6 +66,9 @@ rte_hash_cuckoo_move_insert_mw_tm(const struct rte_hash *h,
while (try < RTE_HASH_TSX_MAX_RETRY) {
status = rte_xbegin();
if (likely(status == RTE_XBEGIN_STARTED)) {
+   /* In case empty slot was gone before entering TSX */
+   if (curr_bkt->key_idx[curr_slot] != EMPTY_SLOT)
+   rte_xabort(RTE_XABORT_CUCKOO_PATH_INVALIDED);
while (likely(curr_node->prev != NULL)) {
prev_node = curr_node->prev;
prev_bkt = prev_node->bkt;
-- 
2.7.4



[dpdk-dev] [PATCH v4 4/8] hash: make duplicated code into functions

2018-07-09 Thread Yipeng Wang
This commit refactors the hash table lookup/add/del code
to remove some code duplication. The processing of the primary
bucket can be applied to the secondary bucket with the same code.

Signed-off-by: Yipeng Wang 
Acked-by: Pablo de Lara 
---
 lib/librte_hash/rte_cuckoo_hash.c | 182 +++---
 1 file changed, 89 insertions(+), 93 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.c 
b/lib/librte_hash/rte_cuckoo_hash.c
index 11602af..b812f33 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -485,6 +485,33 @@ enqueue_slot_back(const struct rte_hash *h,
rte_ring_sp_enqueue(h->free_slots, slot_id);
 }
 
+/* Search a key from bucket and update its data */
+static inline int32_t
+search_and_update(const struct rte_hash *h, void *data, const void *key,
+   struct rte_hash_bucket *bkt, hash_sig_t sig, hash_sig_t alt_hash)
+{
+   int i;
+   struct rte_hash_key *k, *keys = h->key_store;
+
+   for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
+   if (bkt->sig_current[i] == sig &&
+   bkt->sig_alt[i] == alt_hash) {
+   k = (struct rte_hash_key *) ((char *)keys +
+   bkt->key_idx[i] * h->key_entry_size);
+   if (rte_hash_cmp_eq(key, k->key, h) == 0) {
+   /* Update data */
+   k->pdata = data;
+   /*
+* Return index where key is stored,
+* subtracting the first dummy index
+*/
+   return bkt->key_idx[i] - 1;
+   }
+   }
+   }
+   return -1;
+}
+
 static inline int32_t
 __rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
hash_sig_t sig, void *data)
@@ -493,7 +520,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, 
const void *key,
uint32_t prim_bucket_idx, sec_bucket_idx;
unsigned i;
struct rte_hash_bucket *prim_bkt, *sec_bkt;
-   struct rte_hash_key *new_k, *k, *keys = h->key_store;
+   struct rte_hash_key *new_k, *keys = h->key_store;
void *slot_id = NULL;
uint32_t new_idx;
int ret;
@@ -547,46 +574,14 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, 
const void *key,
new_idx = (uint32_t)((uintptr_t) slot_id);
 
/* Check if key is already inserted in primary location */
-   for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
-   if (prim_bkt->sig_current[i] == sig &&
-   prim_bkt->sig_alt[i] == alt_hash) {
-   k = (struct rte_hash_key *) ((char *)keys +
-   prim_bkt->key_idx[i] * 
h->key_entry_size);
-   if (rte_hash_cmp_eq(key, k->key, h) == 0) {
-   /* Enqueue index of free slot back in the ring. 
*/
-   enqueue_slot_back(h, cached_free_slots, 
slot_id);
-   /* Update data */
-   k->pdata = data;
-   /*
-* Return index where key is stored,
-* subtracting the first dummy index
-*/
-   ret = prim_bkt->key_idx[i] - 1;
-   goto failure;
-   }
-   }
-   }
+   ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
+   if (ret != -1)
+   goto failure;
 
/* Check if key is already inserted in secondary location */
-   for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
-   if (sec_bkt->sig_alt[i] == sig &&
-   sec_bkt->sig_current[i] == alt_hash) {
-   k = (struct rte_hash_key *) ((char *)keys +
-   sec_bkt->key_idx[i] * 
h->key_entry_size);
-   if (rte_hash_cmp_eq(key, k->key, h) == 0) {
-   /* Enqueue index of free slot back in the ring. 
*/
-   enqueue_slot_back(h, cached_free_slots, 
slot_id);
-   /* Update data */
-   k->pdata = data;
-   /*
-* Return index where key is stored,
-* subtracting the first dummy index
-*/
-   ret = sec_bkt->key_idx[i] - 1;
-   goto failure;
-   }
-   }
-   }
+   ret = search_and_update(h, data, key, sec_bkt, alt_hash, sig);
+   if (ret != -1)
+   goto failure;
 

[dpdk-dev] [PATCH v4 5/8] hash: add read and write concurrency support

2018-07-09 Thread Yipeng Wang
The existing implementation of librte_hash does not support read-write
concurrency. This commit implements read-write safety using rte_rwlock,
and the TM version of rte_rwlock if hardware transactional memory is
available.

Both multi-writer and read-write concurrency are now protected by
rte_rwlock. The x86-specific header file is removed since the x86-specific
RTM functions are no longer called directly by rte_hash.

Signed-off-by: Yipeng Wang 
Acked-by: Pablo de Lara 
---
 lib/librte_hash/meson.build   |   1 -
 lib/librte_hash/rte_cuckoo_hash.c | 520 ++
 lib/librte_hash/rte_cuckoo_hash.h |  18 +-
 lib/librte_hash/rte_cuckoo_hash_x86.h | 167 ---
 lib/librte_hash/rte_hash.h|   3 +
 5 files changed, 348 insertions(+), 361 deletions(-)
 delete mode 100644 lib/librte_hash/rte_cuckoo_hash_x86.h

diff --git a/lib/librte_hash/meson.build b/lib/librte_hash/meson.build
index e139e1d..efc06ed 100644
--- a/lib/librte_hash/meson.build
+++ b/lib/librte_hash/meson.build
@@ -6,7 +6,6 @@ headers = files('rte_cmp_arm64.h',
'rte_cmp_x86.h',
'rte_crc_arm64.h',
'rte_cuckoo_hash.h',
-   'rte_cuckoo_hash_x86.h',
'rte_fbk_hash.h',
'rte_hash_crc.h',
'rte_hash.h',
diff --git a/lib/librte_hash/rte_cuckoo_hash.c 
b/lib/librte_hash/rte_cuckoo_hash.c
index b812f33..35631cc 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -31,9 +31,6 @@
 #include "rte_hash.h"
 #include "rte_cuckoo_hash.h"
 
-#if defined(RTE_ARCH_X86)
-#include "rte_cuckoo_hash_x86.h"
-#endif
 
 TAILQ_HEAD(rte_hash_list, rte_tailq_entry);
 
@@ -93,8 +90,10 @@ rte_hash_create(const struct rte_hash_parameters *params)
void *buckets = NULL;
char ring_name[RTE_RING_NAMESIZE];
unsigned num_key_slots;
-   unsigned hw_trans_mem_support = 0;
unsigned i;
+   unsigned int hw_trans_mem_support = 0, multi_writer_support = 0;
+   unsigned int readwrite_concur_support = 0;
+
rte_hash_function default_hash_func = (rte_hash_function)rte_jhash;
 
hash_list = RTE_TAILQ_CAST(rte_hash_tailq.head, rte_hash_list);
@@ -118,8 +117,16 @@ rte_hash_create(const struct rte_hash_parameters *params)
if (params->extra_flag & RTE_HASH_EXTRA_FLAGS_TRANS_MEM_SUPPORT)
hw_trans_mem_support = 1;
 
+   if (params->extra_flag & RTE_HASH_EXTRA_FLAGS_MULTI_WRITER_ADD)
+   multi_writer_support = 1;
+
+   if (params->extra_flag & RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY) {
+   readwrite_concur_support = 1;
+   multi_writer_support = 1;
+   }
+
/* Store all keys and leave the first entry as a dummy entry for 
lookup_bulk */
-   if (hw_trans_mem_support)
+   if (multi_writer_support)
/*
 * Increase number of slots by total number of indices
 * that can be stored in the lcore caches
@@ -233,7 +240,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
h->cmp_jump_table_idx = KEY_OTHER_BYTES;
 #endif
 
-   if (hw_trans_mem_support) {
+   if (multi_writer_support) {
h->local_free_slots = rte_zmalloc_socket(NULL,
sizeof(struct lcore_cache) * RTE_MAX_LCORE,
RTE_CACHE_LINE_SIZE, params->socket_id);
@@ -261,6 +268,8 @@ rte_hash_create(const struct rte_hash_parameters *params)
h->key_store = k;
h->free_slots = r;
h->hw_trans_mem_support = hw_trans_mem_support;
+   h->multi_writer_support = multi_writer_support;
+   h->readwrite_concur_support = readwrite_concur_support;
 
 #if defined(RTE_ARCH_X86)
if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
@@ -271,24 +280,17 @@ rte_hash_create(const struct rte_hash_parameters *params)
 #endif
h->sig_cmp_fn = RTE_HASH_COMPARE_SCALAR;
 
-   /* Turn on multi-writer only with explicit flat from user and TM
+   /* Turn on multi-writer only with explicit flag from user and TM
 * support.
 */
-   if (params->extra_flag & RTE_HASH_EXTRA_FLAGS_MULTI_WRITER_ADD) {
-   if (h->hw_trans_mem_support) {
-   h->add_key = ADD_KEY_MULTIWRITER_TM;
-   } else {
-   h->add_key = ADD_KEY_MULTIWRITER;
-   h->multiwriter_lock = rte_malloc(NULL,
-   sizeof(rte_spinlock_t),
-   RTE_CACHE_LINE_SIZE);
-   if (h->multiwriter_lock == NULL)
-   goto err_unlock;
-
-   rte_spinlock_init(h->multiwriter_lock);
-   }
-   } else
-   h->add_key = ADD_KEY_SINGLEWRITER;
+   if (h->multi_writer_support) {
+   h->readwrite_lock = rte_malloc(NULL, sizeof(rte_rwlock_t),
+  

[dpdk-dev] [PATCH v4 7/8] test: add test case for read write concurrency

2018-07-09 Thread Yipeng Wang
This commit adds a new test case for testing read/write concurrency.

Signed-off-by: Yipeng Wang 
Acked-by: Pablo de Lara 
---
 test/test/Makefile  |   1 +
 test/test/test_hash_readwrite.c | 637 
 2 files changed, 638 insertions(+)
 create mode 100644 test/test/test_hash_readwrite.c

diff --git a/test/test/Makefile b/test/test/Makefile
index eccc8ef..6ce66c9 100644
--- a/test/test/Makefile
+++ b/test/test/Makefile
@@ -113,6 +113,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_HASH) += test_hash_perf.c
 SRCS-$(CONFIG_RTE_LIBRTE_HASH) += test_hash_functions.c
 SRCS-$(CONFIG_RTE_LIBRTE_HASH) += test_hash_scaling.c
 SRCS-$(CONFIG_RTE_LIBRTE_HASH) += test_hash_multiwriter.c
+SRCS-$(CONFIG_RTE_LIBRTE_HASH) += test_hash_readwrite.c
 
 SRCS-$(CONFIG_RTE_LIBRTE_LPM) += test_lpm.c
 SRCS-$(CONFIG_RTE_LIBRTE_LPM) += test_lpm_perf.c
diff --git a/test/test/test_hash_readwrite.c b/test/test/test_hash_readwrite.c
new file mode 100644
index 000..55ae33d
--- /dev/null
+++ b/test/test/test_hash_readwrite.c
@@ -0,0 +1,637 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "test.h"
+
+#define RTE_RWTEST_FAIL 0
+
+#define TOTAL_ENTRY (16*1024*1024)
+#define TOTAL_INSERT (15*1024*1024)
+
+#define NUM_TEST 3
+unsigned int core_cnt[NUM_TEST] = {2, 4, 8};
+
+struct perf {
+   uint32_t single_read;
+   uint32_t single_write;
+   uint32_t read_only[NUM_TEST];
+   uint32_t write_only[NUM_TEST];
+   uint32_t read_write_r[NUM_TEST];
+   uint32_t read_write_w[NUM_TEST];
+};
+
+static struct perf htm_results, non_htm_results;
+
+struct {
+   uint32_t *keys;
+   uint32_t *found;
+   uint32_t num_insert;
+   uint32_t rounded_tot_insert;
+   struct rte_hash *h;
+} tbl_rw_test_param;
+
+static rte_atomic64_t gcycles;
+static rte_atomic64_t ginsertions;
+
+static rte_atomic64_t gread_cycles;
+static rte_atomic64_t gwrite_cycles;
+
+static rte_atomic64_t greads;
+static rte_atomic64_t gwrites;
+
+static int
+test_hash_readwrite_worker(__attribute__((unused)) void *arg)
+{
+   uint64_t i, offset;
+   uint32_t lcore_id = rte_lcore_id();
+   uint64_t begin, cycles;
+   int ret;
+
+   offset = (lcore_id - rte_get_master_lcore())
+   * tbl_rw_test_param.num_insert;
+
+   printf("Core #%d inserting and reading %d: %'"PRId64" - %'"PRId64"\n",
+  lcore_id, tbl_rw_test_param.num_insert,
+  offset, offset + tbl_rw_test_param.num_insert);
+
+   begin = rte_rdtsc_precise();
+
+   for (i = offset; i < offset + tbl_rw_test_param.num_insert; i++) {
+
+   if (rte_hash_lookup(tbl_rw_test_param.h,
+   tbl_rw_test_param.keys + i) > 0)
+   break;
+
+   ret = rte_hash_add_key(tbl_rw_test_param.h,
+tbl_rw_test_param.keys + i);
+   if (ret < 0)
+   break;
+
+   if (rte_hash_lookup(tbl_rw_test_param.h,
+   tbl_rw_test_param.keys + i) != ret)
+   break;
+   }
+
+   cycles = rte_rdtsc_precise() - begin;
+   rte_atomic64_add(&gcycles, cycles);
+   rte_atomic64_add(&ginsertions, i - offset);
+
+   for (; i < offset + tbl_rw_test_param.num_insert; i++)
+   tbl_rw_test_param.keys[i] = RTE_RWTEST_FAIL;
+
+   return 0;
+}
+
+static int
+init_params(int use_htm, int use_jhash)
+{
+   unsigned int i;
+
+   uint32_t *keys = NULL;
+   uint32_t *found = NULL;
+   struct rte_hash *handle;
+
+   struct rte_hash_parameters hash_params = {
+   .entries = TOTAL_ENTRY,
+   .key_len = sizeof(uint32_t),
+   .hash_func_init_val = 0,
+   .socket_id = rte_socket_id(),
+   };
+   if (use_jhash)
+   hash_params.hash_func = rte_jhash;
+   else
+   hash_params.hash_func = rte_hash_crc;
+
+   if (use_htm)
+   hash_params.extra_flag =
+   RTE_HASH_EXTRA_FLAGS_TRANS_MEM_SUPPORT |
+   RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY;
+   else
+   hash_params.extra_flag =
+   RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY;
+
+   hash_params.name = "tests";
+
+   handle = rte_hash_create(&hash_params);
+   if (handle == NULL) {
+   printf("hash creation failed");
+   return -1;
+   }
+
+   tbl_rw_test_param.h = handle;
+   keys = rte_malloc(NULL, sizeof(uint32_t) * TOTAL_ENTRY, 0);
+
+   if (keys == NULL) {
+   printf("RTE_MALLOC failed\n");
+   goto err;
+   }
+
+   found = rte_zmalloc(NULL, sizeof(uint32_t) * TOTAL_ENTRY, 0);
+   if (found == NULL) {
+   printf("RTE_ZMALLOC failed\n")

[dpdk-dev] [PATCH v4 8/8] hash: add new API function to query the key count

2018-07-09 Thread Yipeng Wang
Add a new function, rte_hash_count, to return the number of keys that
are currently stored in the hash table. Corresponding test functions are
added into hash_test and hash_multiwriter test.

Signed-off-by: Yipeng Wang 
Acked-by: Pablo de Lara 
---
 lib/librte_hash/rte_cuckoo_hash.c| 24 
 lib/librte_hash/rte_hash.h   | 11 +++
 lib/librte_hash/rte_hash_version.map |  8 
 test/test/test_hash.c| 12 
 test/test/test_hash_multiwriter.c|  8 
 5 files changed, 63 insertions(+)
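
The counting arithmetic used by rte_hash_count in the diff below can be
sketched in standalone C. This is an illustration under assumptions and
not the library function: sketch_hash_count is a hypothetical helper,
the free ring count and the per-lcore cache lengths are passed in as
plain numbers, and max_lcore and lcore_cache_size stand in for
RTE_MAX_LCORE and LCORE_CACHE_SIZE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Keys in use = total usable slots - slots still free.
 * Free slots live in the shared ring and, when per-lcore caches are
 * enabled, also in each lcore's local cache (cached_len[i]).
 */
static uint32_t sketch_hash_count(uint32_t entries, int use_lcore_caches,
		uint32_t max_lcore, uint32_t lcore_cache_size,
		uint32_t free_ring_count, const uint32_t *cached_len)
{
	uint32_t total = entries;
	uint32_t cached = 0;
	uint32_t i;

	if (use_lcore_caches) {
		assert(cached_len != NULL);
		total += (max_lcore - 1) * (lcore_cache_size - 1);
		for (i = 0; i < max_lcore; i++)
			cached += cached_len[i];
	}
	return total - free_ring_count - cached;
}
```

For example, with 1024 entries, 4 lcores, a cache size of 8, 1000
indices left in the ring and 8 indices held in the caches, the sketch
reports 1045 - 1000 - 8 = 37 keys.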

diff --git a/lib/librte_hash/rte_cuckoo_hash.c 
b/lib/librte_hash/rte_cuckoo_hash.c
index 35631cc..bb67ade 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -370,6 +370,30 @@ rte_hash_secondary_hash(const hash_sig_t primary_hash)
return primary_hash ^ ((tag + 1) * alt_bits_xor);
 }
 
+int32_t
+rte_hash_count(const struct rte_hash *h)
+{
+   uint32_t tot_ring_cnt, cached_cnt = 0;
+   uint32_t i, ret;
+
+   if (h == NULL)
+   return -EINVAL;
+
+   if (h->multi_writer_support) {
+   tot_ring_cnt = h->entries + (RTE_MAX_LCORE - 1) *
+   (LCORE_CACHE_SIZE - 1);
+   for (i = 0; i < RTE_MAX_LCORE; i++)
+   cached_cnt += h->local_free_slots[i].len;
+
+   ret = tot_ring_cnt - rte_ring_count(h->free_slots) -
+   cached_cnt;
+   } else {
+   tot_ring_cnt = h->entries;
+   ret = tot_ring_cnt - rte_ring_count(h->free_slots);
+   }
+   return ret;
+}
+
 /* Read write locks implemented using rte_rwlock */
 static inline void
 __hash_rw_writer_lock(const struct rte_hash *h)
diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h
index ecb49e4..1f1a276 100644
--- a/lib/librte_hash/rte_hash.h
+++ b/lib/librte_hash/rte_hash.h
@@ -127,6 +127,17 @@ void
 rte_hash_reset(struct rte_hash *h);
 
 /**
+ * Return the number of keys in the hash table
+ * @param h
+ *  Hash table to query from
+ * @return
+ *   - -EINVAL if parameters are invalid
+ *   - A value indicating how many keys were inserted in the table.
+ */
+int32_t
+rte_hash_count(const struct rte_hash *h);
+
+/**
  * Add a key-value pair to an existing hash table.
  * This operation is not multi-thread safe
  * and should only be called from one thread.
diff --git a/lib/librte_hash/rte_hash_version.map 
b/lib/librte_hash/rte_hash_version.map
index 52a2576..e216ac8 100644
--- a/lib/librte_hash/rte_hash_version.map
+++ b/lib/librte_hash/rte_hash_version.map
@@ -45,3 +45,11 @@ DPDK_16.07 {
rte_hash_get_key_with_position;
 
 } DPDK_2.2;
+
+
+DPDK_18.08 {
+   global:
+
+   rte_hash_count;
+
+} DPDK_16.07;
diff --git a/test/test/test_hash.c b/test/test/test_hash.c
index edf41f5..b3db9fd 100644
--- a/test/test/test_hash.c
+++ b/test/test/test_hash.c
@@ -1103,6 +1103,7 @@ static int test_average_table_utilization(void)
unsigned i, j;
unsigned added_keys, average_keys_added = 0;
int ret;
+   unsigned int cnt;
 
printf("\n# Running test to determine average utilization"
   "\n  before adding elements begins to fail\n");
@@ -1121,13 +1122,24 @@ static int test_average_table_utilization(void)
for (i = 0; i < ut_params.key_len; i++)
simple_key[i] = rte_rand() % 255;
ret = rte_hash_add_key(handle, simple_key);
+   if (ret < 0)
+   break;
}
+
if (ret != -ENOSPC) {
printf("Unexpected error when adding keys\n");
rte_hash_free(handle);
return -1;
}
 
+   cnt = rte_hash_count(handle);
+   if (cnt != added_keys) {
+   printf("rte_hash_count returned wrong value %u, %u,"
+   "%u\n", j, added_keys, cnt);
+   rte_hash_free(handle);
+   return -1;
+   }
+
average_keys_added += added_keys;
 
/* Reset the table */
diff --git a/test/test/test_hash_multiwriter.c 
b/test/test/test_hash_multiwriter.c
index ef5fce3..f182f40 100644
--- a/test/test/test_hash_multiwriter.c
+++ b/test/test/test_hash_multiwriter.c
@@ -116,6 +116,7 @@ test_hash_multiwriter(void)
 
uint32_t duplicated_keys = 0;
uint32_t lost_keys = 0;
+   uint32_t count;
 
snprintf(name, 32, "test%u", calledCount++);
hash_params.name = name;
@@ -163,6 +164,13 @@ test_hash_multiwriter(void)
 NULL, CALL_MASTER);
rte_eal_mp_wait_lcore();
 
+   count = rte_hash_count(handle);
+   if (count != rounded_nb_total_tsx_insertion) {
+   printf("rte_hash_count ret

Re: [dpdk-dev] [dpdk-users] Traffic doesn't forward on virtual devices

2018-07-09 Thread Aaron Conole
Bala Sankaran  writes:

> Perfect!
>
> Thanks for the help.
>
> - Original Message -
>> From: "Keith Wiles" 
>> To: "Bala Sankaran" 
>> Cc: us...@dpdk.org, "Aaron Conole" 
>> Sent: Thursday, July 5, 2018 11:41:46 AM
>> Subject: Re: [dpdk-users] Traffic doesn't forward on virtual devices
>> 
>> 
>> 
>> > On Jul 5, 2018, at 9:53 AM, Bala Sankaran  wrote:
>> > 
>> > Greetings,
>> > 
>> > I am currently using dpdk version 17.11.2. I see that there are a few other
>> > revisions in 17.11.3, followed by the latest stable version of 18.02.2.
>> > 
>> > Based on the issues I have faced so far (see Original
>> > Message below), would you suggest that  I go for
>> > another version? If yes, which one? In essence, my question is, would
>> > resorting to a different version of dpdk solve my current issue of
>> > virtqueue id being invalid?
>> > 
>> > Any help is much appreciated.
>> 
>> From a support perspective using the latest version 18.05 or the long term
>> supported version 17.11.3 is easier for most to help. I would pick the
>> latest release 18.05 myself. As for fixing this problem I do not know. You
>> can look into the MAINTAINERS file and find the maintainers of area(s) and
>> include them in the CC line on your questions as sometimes they miss the
>> emails as the volume can be high at times.

Thanks Keith.

I took a quick look, and it seems the queues are not being set up
correctly between OvS and testpmd.  There's probably a step missing
somewhere, although nothing in either netdev-dpdk.c from OvS or
rte_ethdev stood out to me.

I've CC'd Maxime, Ian, and Ciara - maybe they have a better idea to try?

>> > 
>> > Thanks
>> > 
>> > - Original Message -
>> >> From: "Bala Sankaran" 
>> >> To: us...@dpdk.org
>> >> Cc: "Aaron Conole" 
>> >> Sent: Thursday, June 28, 2018 3:18:13 PM
>> >> Subject: Traffic doesn't forward on virtual devices
>> >> 
>> >> 
>> >> Hello team,
>> >> 
>> >> I am working on a project to do PVP tests on dpdk. As a first step, I
>> >> would
>> >> like to get traffic flow between tap devices. I'm in process of setting up
>> >> the architecture, in which I've used testpmd to forward traffic between
>> >> two
>> >> virtual devices(tap and vhost users) over a bridge.
>> >> 
>> >> While I'm at it, I've identified that the internal dev_attached flag never
>> >> gets set to 1 from the rte_eth_vhost.c file. I've tried to manually set it
>> >> to 1 in the start routine, but I just see that the queue index being
>> >> referenced is out of range.
>> >> 
>> >> I'm not sure how to proceed.  Has anyone had luck using testpmd to
>> >> communicate with vhost-user devices?  If yes, any hints on a workaround?
>> >> 
>> >> Here's how I configured my setup after installing dpdk and openvswitch:
>> >> 
>> >> 1. To start ovs-ctl:
>> >> /usr/local/share/openvswitch/scripts/ovs-ctl start
>> >> 
>> >> 2. Setup hugepages:
>> >> echo '2048' > /proc/sys/vm/nr_hugepages
>> >> 
>> >> 3. Add a new network namespace:
>> >> ip netns add ns1
>> >> 
>> >> 4. Add and set a bridge:
>> >> ovs-vsctl add-br dpdkbr0 -- set Bridge dpdkbr0 datapath_type=netdev
>> >> options:vhost-server-path=/usr/local/var/run/openvswitch/vhu0
>> >> ovs-vsctl show
>> >> 
>> >> 5. Add a vhost user to the bridge created:
>> >> ovs-vsctl add-port dpdkbr0 vhu0 -- set Interface vhu0
>> >> type=dpdkvhostuserclient
>> >> 
>> >> 6. Execute bash on the network namespace:
>> >> ip netns exec ns1 bash
>> >> 
>> >> 7. Use testpmd and connect the namespaces:
>> >> testpmd --socket-mem=512
>> >> --vdev='eth_vhost0,iface=/usr/local/var/run/openvswitch/vhu0,queues=1'
>> >> --vdev='net_tap0,iface=tap0' --file-prefix page0 -- -i
>> >> 
>> >> 
>> >> I repeated steps 3 - 7 for another network namespace on the same bridge.
>> >> Following this, in fresh terminals, I assigned IP addresses to the tap
>> >> devices created and tried pinging them. From port statistics,
>> >> I identified the above mentioned issue with the dev_attached and queue
>> >> statistics.
>> >> 
>> >> I would greatly appreciate any help from your end.
>> >> 
>> >> Thanks.
>> >> 
>> >> -
>> >> Bala Sankaran
>> >> Networking Services Intern
>> >> Red Hat Inc .,
>> >> 
>> > -
>> > Bala Sankaran
>> > Networking Services Intern
>> 
>> Regards,
>> Keith
>> 
>> 
>
> --
> Bala Sankaran
> Networking Services Intern
> Red Hat Inc .,


Re: [dpdk-dev] [PATCH v4 1/2] test/crypto: add rsa and mod test application

2018-07-09 Thread De Lara Guarch, Pablo



> -Original Message-
> From: Shally Verma [mailto:shally.ve...@caviumnetworks.com]
> Sent: Thursday, July 5, 2018 4:54 PM
> To: De Lara Guarch, Pablo 
> Cc: dev@dpdk.org; pathr...@caviumnetworks.com; Sunila Sahu
> ; Ashish Gupta
> 
> Subject: [PATCH v4 1/2] test/crypto: add rsa and mod test application

Retitle to "add RSA and Mod tests"? No need to use "test application".

> 
> From: Sunila Sahu 
> 
> Test application include test case for :
> - RSA encrypt, decrypt, sign and verify
> - Modular Inversion and Exponentiation
> 
> Test cases uses predefined test vectors.
> 
> Signed-off-by: Sunila Sahu 
> Signed-off-by: Shally Verma 
> Signed-off-by: Ashish Gupta 
> ---
>  test/test/Makefile  |   1 +
>  test/test/meson.build   |   1 +
>  test/test/test_cryptodev_asym.c | 836
> 
>  test/test/test_cryptodev_asym_util.h|  45 ++
>  test/test/test_cryptodev_mod_test_vectors.h | 103 
> test/test/test_cryptodev_rsa_test_vectors.h |  90 +++
>  6 files changed, 1076 insertions(+)
> 
> diff --git a/test/test/Makefile b/test/test/Makefile index eccc8ef..d6fb88f
> 100644
> --- a/test/test/Makefile
> +++ b/test/test/Makefile
> @@ -179,6 +179,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_PMD_RING) +=
> test_pmd_ring_perf.c
> 
>  SRCS-$(CONFIG_RTE_LIBRTE_CRYPTODEV) += test_cryptodev_blockcipher.c
>  SRCS-$(CONFIG_RTE_LIBRTE_CRYPTODEV) += test_cryptodev.c
> +SRCS-$(CONFIG_RTE_LIBRTE_CRYPTODEV) += test_cryptodev_asym.c
> 
>  ifeq ($(CONFIG_RTE_COMPRESSDEV_TEST),y)
>  SRCS-$(CONFIG_RTE_LIBRTE_COMPRESSDEV) += test_compressdev.c diff --git
> a/test/test/meson.build b/test/test/meson.build index a907fd2..06cd6f7 100644
> --- a/test/test/meson.build
> +++ b/test/test/meson.build
> @@ -22,6 +22,7 @@ test_sources = files('commands.c',
>   'test_cpuflags.c',
>   'test_crc.c',
>   'test_cryptodev.c',
> + 'test_cryptodev_asym.c',
>   'test_cryptodev_blockcipher.c',
>   'test_cycles.c',
>   'test_debug.c',

Add new test to test_names list in meson.build.


> diff --git a/test/test/test_cryptodev_asym.c b/test/test/test_cryptodev_asym.c
> new file mode 100644 index 000..9b6ffac
> --- /dev/null
> +++ b/test/test/test_cryptodev_asym.c
> @@ -0,0 +1,836 @@

...

> + snprintf(test_msg,
> + ASYM_TEST_MSG_LEN,
> + "Modinv :%s length:%lu\n",
> + asym_op->modinv.base.data,
> + asym_op->modinv.base.length);

There is a compilation error on 32-bit builds:

test/test/test_cryptodev_asym.c:1046:25: error: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'size_t {aka unsigned int}' [-Werror=format=]
"Modinv :%s length:%lu\n",
   ~~^
   %u
test/test/test_cryptodev_asym.c:1048:4:
asym_op->modinv.base.length);
~~~
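
One portable way to address this kind of error, sketched here as an
assumption rather than as the fix that was actually applied to the patch:
size_t has its own printf length modifier, %zu (C99), which matches the
type on both 32-bit and 64-bit targets. MSG_LEN and format_length() below
are hypothetical stand-ins for ASYM_TEST_MSG_LEN and the snprintf() call
on asym_op->modinv.base.length.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for ASYM_TEST_MSG_LEN in the patch. */
#define MSG_LEN 64

/*
 * Format a size_t value with %zu. Unlike %lu, %zu matches size_t
 * whether it is 32 or 64 bits wide, so -Werror=format stays quiet.
 * Returns what snprintf() returns.
 */
static int
format_length(char *msg, size_t msg_size, size_t length)
{
	return snprintf(msg, msg_size, "length:%zu\n", length);
}
```

Casting the argument to unsigned long and keeping %lu would also compile
cleanly, but %zu avoids the cast.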


Re: [dpdk-dev] [PATCH v5] net/mlx4: support hardware TSO

2018-07-09 Thread Matan Azrad
Hi Moti

I'm continuing the discussion here, even though the next version is already
out, so that the full discussion stays in one thread.

From: Mordechay Haimovsky
> inline
> 
> > From: Matan Azrad
> > Hi Moti
> >
> > Please see some comments below.
> >
> > From: Mordechay Haimovsky
> > > Implement support for hardware TSO.
> > >
> > > Signed-off-by: Moti Haimovsky 
...
> > > + do {
> > > + /* how many dseg entries do we have in the current TXBB ?
> > > */
> > > + nb_segs_txbb = (MLX4_TXBB_SIZE -
> > > + ((uintptr_t)dseg & (MLX4_TXBB_SIZE - 1))) >>
> > > +MLX4_SEG_SHIFT;
> > > + switch (nb_segs_txbb) {
> > > + default:
> > > + /* Should never happen. */
> > > + rte_panic("%p: Invalid number of SGEs(%d) for a
> > > TXBB",
> > > + (void *)txq, nb_segs_txbb);
> > > + /* rte_panic never returns. */
> >
> > Since this default case should not happen because of the above
> > calculation I think we don't need it.
> > Just "break" if the compiler complain of default case lack.
> >
> Although "default" is not mandatory in switch case statement it is a good
> practice to have it even just for code clarity.
> so I will keep it there.

But the rte_panic code (and the whole default block) is redundant, and we don't
need redundant code in our data path.
You can leave a comment there for clarity if you want.
 

> > > + case 4:
> > > + /* Memory region key for this memory pool. */
> > > + lkey = mlx4_tx_mb2mr(txq, sbuf);
> > > + if (unlikely(lkey == (uint32_t)-1))
> > > + goto err;
> > > + dseg->addr =
> > > +
> > > rte_cpu_to_be_64(rte_pktmbuf_mtod_offset(sbuf,
> > > +  uintptr_t,
> > > +  sb_of));
> > > + dseg->lkey = lkey;
> > > + /*
> > > +  * This data segment starts at the beginning of a new
> > > +  * TXBB, so we need to postpone its byte_count
> > > writing
> > > +  * for later.
> > > +  */
> > > + pv[*pv_counter].dseg = dseg;
> > > + /*
> > > +  * Zero length segment is treated as inline segment
> > > +  * with zero data.
> > > +  */
> > > + data_len = sbuf->data_len - sb_of;
> >
> > Is there a chance that the data_len will be negative? Rolled in this case?
> Since we verify ahead the all l2,l3 and L4 headers reside in the same fragment
> there is no reason for data_len to become negative, this is why I use uint16_t
> which is  the same data type used in struct rte_mbuf for representing
> data_len , and as we do it in mlx4_tx_burst_segs.
> 
> > Maybe it is better to change it for int16_t and to replace the next
> > check to
> > be:
> > data_len > 0 ? data_len : 0x8000
> >
> I will keep this the way it is for 2 reasons:
> 1. Seems to me more cumbersome then what I wrote.

OK, you're right here; if it cannot be negative, we shouldn't change it :)

> 2. Code consistency wise, this is how we also wrote it in mlx4_tx_burst_segs,
>  What's good there is also good here.

I don't agree; this really is a different case from that one. A lot of the
assumptions are different, and the code may reflect that.

> > And I think I found a way to remove the sb_of calculations for each
> segment:
> >
> > Each segment will create the next segment parameters while only the
> > pre loop calculation for the first segment parameters will calculate
> > the header
> > offset:
> >
> > The parameters: data_len and sb_of.
> >
> > So before the loop:
> > sb_of = tinfo->tso_header_size;
> > data_len = sbuf->data_len - sb_of;
> >
> > And inside the loop (after the check of nb_segs):
> > sb_of = 0;
> > data_len = sbuf->data_len(the next sbuf);
> >
> > so each segment calculates the next segment parameters and we don't
> > need the "- sb_of" calculation per segment.
> >
> NICE :)
> 

Sorry for only seeing this now, but we don't even need the "sb_of = 0" per segment.
We can add one more parameter for the next segment:
addr = rte_pktmbuf_mtod_offset(sbuf, uintptr_t, tinfo->tso_header_size)
before the loop, and
addr = rte_pktmbuf_mtod(sbuf, uintptr_t)
inside the loop,

so in the end we save 2 cycles per segment :)
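
The idea above can be sketched outside the driver as follows. This is only
an illustration under stated assumptions: struct seg, struct dseg, and
fill_dsegs() are hypothetical, simplified stand-ins for the mbuf chain, the
WQE data segment, and the fill loop, and it assumes the header fits entirely
in the first segment (hdr_size <= data_len), as discussed earlier in this
thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one segment of an mbuf chain. */
struct seg {
	const uint8_t *data;	/* start of this segment's data */
	uint16_t data_len;	/* bytes of data in this segment */
	struct seg *next;	/* next segment in the chain, or NULL */
};

/* Hypothetical stand-in for one data-segment descriptor. */
struct dseg {
	uintptr_t addr;
	uint16_t byte_count;
};

/*
 * Write one descriptor per segment, skipping hdr_size bytes of the
 * first segment. Returns the number of descriptors written (at most max).
 */
static size_t
fill_dsegs(const struct seg *sbuf, uint16_t hdr_size,
	   struct dseg *out, size_t max)
{
	size_t n = 0;
	uintptr_t addr;
	uint16_t data_len;

	if (sbuf == NULL)
		return 0;
	/* Pre-loop: only the first segment needs the header offset. */
	addr = (uintptr_t)sbuf->data + hdr_size;
	data_len = (uint16_t)(sbuf->data_len - hdr_size);
	while (n < max) {
		/* Consume the parameters prepared for this segment. */
		out[n].addr = addr;
		out[n].byte_count = data_len;
		n++;
		sbuf = sbuf->next;
		if (sbuf == NULL)
			break;
		/* Prepare the next segment: no offset, no subtraction. */
		addr = (uintptr_t)sbuf->data;
		data_len = sbuf->data_len;
	}
	return n;
}
```

Because each iteration prepares addr and data_len for the following
segment, the loop body carries no offset variable and no per-segment
subtraction.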
...
> > > +static inline volatile struct mlx4_wqe_data_seg *
> > > +mlx4_tx_burst_fill_tso_hdr(struct rte_mbuf *buf,
> > > +struct txq *txq,
> > > +struct tso_info *tinfo,
> > > +volatile struct mlx4_wqe_ctrl_seg *ctrl) {
> > > + volatile struct mlx4_wqe_lso_seg *tseg =
> > > + (volatile struct mlx4_wqe_lso_seg *)(ctrl + 1);
> > > + struct mlx4_sq *sq = &txq->msq;
> > > + struct pv *pv = tinfo->pv;
> > > + int *pv_counter = &tinfo->pv_counter;
> > > + int remain_size = tinfo->tso_he
