Re: [dpdk-dev] [PATCH] eal: force IOVA mode to physical

2018-09-06 Thread Jerin Jacob
-Original Message-
> Date: Tue, 4 Sep 2018 23:40:36 -0400
> From: Eric Zhang 
> To: santosh , hemant.agra...@nxp.com,
>  Gaëtan Rivet , "Burakov, Anatoly"
>  
> CC: bruce.richard...@intel.com, dev@dpdk.org, allain.leg...@windriver.com,
>  matt.pet...@windriver.com
> Subject: Re: [dpdk-dev] [PATCH] eal: force IOVA mode to physical
> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
>  Thunderbird/52.9.1
> 
> On 08/30/2018 08:59 AM, santosh wrote:
> > On Thursday 30 August 2018 05:43 PM, Hemant wrote:
> > > External Email
> > > 
> > > Hi,
> > > 
> > > On 8/30/2018 3:13 PM, Gaëtan Rivet wrote:
> > > > Hi,
> > > > 
> > > > On Thu, Aug 30, 2018 at 10:09:04AM +0100, Burakov, Anatoly wrote:
> > > > > On 29-Aug-18 4:58 PM, eric zhang wrote:
> > > > > > This patch adds a configuration option to force the IOVA mode to
> > > > > > physical address (PA). There exist virtual devices that are not
> > > > > > directly attached to the PCI bus, and therefore the auto detection
> > > > > > of the IOVA mode based on probing the PCI bus and IOMMU 
> > > > > > configuration
> > > > > > may not report the required addressing mode. Having the 
> > > > > > configuration
> > > > > > option permits the mode to be explicitly configured in this 
> > > > > > scenario.
> > > > > > 
> > > > > > Signed-off-by: eric zhang 
> > > > > > ---
> > > > > Defining this at compile-time seems like overkill. Wouldn't it be 
> > > > > better
> > > > > to just add an EAL command-line option to force IOVA mode to a 
> > > > > particular
> > > > > value?
> > > That is a good suggestion.
> > > > > --
> > > > > Thanks,
> > > > > Anatoly
> > > > What is the bus of these devices and why not implement get_iommu_class
> > > > in it?
> > > There are cases where you are using DPDK libraries with external
> > > libraries and you need to change the default behavior of the DPDK lib to
> > > use physical addresses instead of virtual addresses.
> > > Providing an option to the user will help.
> > > 
> > > 
> > A more appropriate solution could be:
> > * either fix it at the bus layer, i.e. get_iommu_class(), or
> > * introduce something like [1], an --iova-mode= param.
> > 
> > The former is a better solution than the latter if autodetection is a key 
> > criterion.
> > Thanks.
> > 
> > [1] http://patchwork.dpdk.org/patch/25192/
> > 
> It's not generic, so it couldn't be fixed at the bus layer.
> So what's the preference: the EAL option or the compile-time solution?
> Adding --iova-mode as in patch [1] will override the auto-detection done by
> rte_bus_get_iommu_class() and make it useless; the compile-time solution
> will align with upstream and keep the new auto-detection solution inside
> an #ifndef.

If it is for vdev devices, why not introduce something like
RTE_PCI_DRV_IOVA_AS_VA and let each vdev device describe its personality?
Then, based on the device flags on the vdev bus, rte_bus_get_iommu_class()
for vdev can decide the mode just like the PCI bus does.
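
For illustration, a minimal sketch of this suggestion (not actual DPDK code:
the flag, the drv_flags field on vdev drivers, and the driver list are all
hypothetical, modeled on their PCI counterparts):

	#define RTE_VDEV_DRV_IOVA_AS_VA 0x0001 /* hypothetical vdev driver flag */

	static enum rte_iova_mode
	vdev_get_iommu_class(void)
	{
		struct rte_vdev_driver *drv;

		/* One driver that cannot work with VA forces PA for the bus. */
		TAILQ_FOREACH(drv, &vdev_driver_list, next)
			if (!(drv->drv_flags & RTE_VDEV_DRV_IOVA_AS_VA))
				return RTE_IOVA_PA;
		return RTE_IOVA_VA;
	}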


> 
> Thanks
> Eric
> 


Re: [dpdk-dev] MLX5 should define the timestamp field in the doc

2018-09-06 Thread Shahaf Shuler
Wednesday, September 5, 2018 12:00 PM, Tom Barbette:
>Actually I managed this patch to implement support for 
>rte_eth_timesync_read_time.

I am not fully familiar w/ this API, but it looks like the timespec returned 
from this call is expected to hold real-time values (i.e. seconds and 
nanoseconds); at least this is what I see in the ptpclient example in the 
DPDK tree.

>
>Please tell me potential modifications, and if I shall submit it again as a 
>"normal" patch to dev ?
>
>---

[...]

> }
>
> /**
>+ * Get device current time
>+ *
>+ * @param dev
>+ *   Pointer to Ethernet device structure.
>+ *
>+ * @param[out] time
>+ *   Time output value.
>+ *
>+ * @return
>+ *   0 if the time has correctly been set
>+ */
>+int
>+mlx5_timesync_read_time(struct rte_eth_dev *dev, struct timespec *time)
>+{
>+struct priv *priv = dev->data->dev_private;
>+struct ibv_values_ex values;
>+int err = 0;
>+
>+values.comp_mask = IBV_VALUES_MASK_RAW_CLOCK;
>+if ((err = mlx5_glue->query_rt_values_ex(priv->ctx, &values)) != 0) {

The use of this function will not bring you the outcome the API defines.
See the man page of ibv_query_rt_values_ex:
struct ibv_values_ex {
        uint32_t        comp_mask;  /* Compatibility mask that defines
                                       the query/queried fields [in/out] */
        struct timespec raw_clock;  /* HW raw clock */
};

enum ibv_values_mask {
IBV_VALUES_MASK_RAW_CLOCK = 1 << 0, /* HW raw clock */
};

The output is the HW raw clock (just like you have in the mbuf).

In order for it to work, the application needs to know the PTP coefficients 
for the raw->real time conversion. This can be done, it just needs some more work.
Do you have a PTP daemon implemented to calculate the coefficients?
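
For illustration, a minimal sketch of the raw->real conversion those
coefficients would drive (all names here are hypothetical; a PTP daemon
would keep them up to date):

	struct clock_coeff {
		double   mult;     /* real-time nanoseconds per raw tick */
		uint64_t raw0;     /* raw clock value at the last sync point */
		uint64_t real0_ns; /* real time (ns) at the last sync point */
	};

	static uint64_t
	raw_to_real_ns(const struct clock_coeff *c, uint64_t raw)
	{
		return c->real0_ns + (uint64_t)((raw - c->raw0) * c->mult);
	}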

>+ DRV_LOG(WARNING, "Could not query time !");
>+return err;
>+}
>+
>+*time = values.raw_clock;
>+return 0;
>+}
>+
>+
>+/**
>  * Get supported packet types.
>  *
>  * @param dev
>diff --git a/drivers/net/mlx5/mlx5_glue.c b/drivers/net/mlx5/mlx5_glue.c
>index c7965e5..3c72f5b 100644
>--- a/drivers/net/mlx5/mlx5_glue.c
>+++ b/drivers/net/mlx5/mlx5_glue.c
>@@ -84,6 +84,13 @@ mlx5_glue_query_device_ex(struct ibv_context *context,
> }
>
> static int
>+mlx5_glue_query_rt_values_ex(struct ibv_context *context,
>+   struct ibv_values_ex* values)
>+{
>+ return ibv_query_rt_values_ex(context, values);
>+}
>+
>+static int
> mlx5_glue_query_port(struct ibv_context *context, uint8_t port_num,
>   struct ibv_port_attr *port_attr)
> {
>@@ -354,6 +361,7 @@ const struct mlx5_glue *mlx5_glue = &(const struct 
>mlx5_glue){
>  .close_device = mlx5_glue_close_device,
>  .query_device = mlx5_glue_query_device,
>  .query_device_ex = mlx5_glue_query_device_ex,
>+ .query_rt_values_ex = mlx5_glue_query_rt_values_ex,
>  .query_port = mlx5_glue_query_port,
>  .create_comp_channel = mlx5_glue_create_comp_channel,
>  .destroy_comp_channel = mlx5_glue_destroy_comp_channel,
>diff --git a/drivers/net/mlx5/mlx5_glue.h b/drivers/net/mlx5/mlx5_glue.h
>index e584d36..0582e95 100644
>--- a/drivers/net/mlx5/mlx5_glue.h
>+++ b/drivers/net/mlx5/mlx5_glue.h
>@@ -54,6 +54,8 @@ struct mlx5_glue {
>  int (*query_device_ex)(struct ibv_context *context,
> const struct ibv_query_device_ex_input *input,
> struct ibv_device_attr_ex *attr);
>+ int (*query_rt_values_ex)(struct ibv_context *context,
>+struct ibv_values_ex *values);
>  int (*query_port)(struct ibv_context *context, uint8_t port_num,
>struct ibv_port_attr *port_attr);
>  struct ibv_comp_channel *(*create_comp_channel)
>--
>2.7.4
>
>
>
>De : Shahaf Shuler 
>Envoyé : mercredi 5 septembre 2018 10:18
>À : Tom Barbette; dev@dpdk.org; Alex Rosenbaum
>Cc : Yongseok Koh; john.mcnam...@intel.com; marko.kovace...@intel.com
>Objet : RE: MLX5 should define the timestamp field in the doc
>
>Thanks for the details.
>
>The use case is clear. We will take it internally to see when we can support 
>it.
>AFAIK we cannot read the internal time from userspace.
>
>Adding also AlexR to comment
>
>From: Tom Barbette 
>Sent: Wednesday, September 5, 2018 10:11 AM
>To: Shahaf Shuler ; dev@dpdk.org
>Cc: Yongseok Koh ; john.mcnam...@intel.com; 
>marko.kovace...@intel.com
>Subject: RE: MLX5 should define the timestamp field in the doc
>
>Thanks for your answer Shahaf !
>
>We're trying to measure the latency of packets going through various service 
>chains inside an individual "server". E.g. we can see that on Server 1, the 
>latency for the service chain handling HTTP packets is ~800ns (+ max and mins, 
>tail latency, etc). What we do now is timestamp packets right after they 
>are received, and compute the difference with the timestamp just before they 
>are sent. Over a cluster this shows us where the latency is happening.
>
>We would like this "box" latency to include the time spent in queues, and for 
>that the hardware timestamp seems fit for purpose, as it would timestamp the 
>packets before the software queues. More

[dpdk-dev] [PATCH 1/2] eventdev: fix eth Rx adapter hotplug incompatibility

2018-09-06 Thread Nikhil Rao
Use RTE_MAX_ETHPORTS instead of rte_eth_dev_count_total()
when allocating the eth Rx adapter's per-eth-device data structure,
to account for hotplugged devices.

Fixes: 9c38b704d280 ("eventdev: add eth Rx adapter implementation")
Cc: sta...@dpdk.org
Signed-off-by: Nikhil Rao 
---
 lib/librte_eventdev/rte_event_eth_rx_adapter.c | 5 ++---
 lib/librte_eventdev/rte_event_eth_rx_adapter.h | 4 
 2 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/lib/librte_eventdev/rte_event_eth_rx_adapter.c 
b/lib/librte_eventdev/rte_event_eth_rx_adapter.c
index f5e5a0b..870ac8c 100644
--- a/lib/librte_eventdev/rte_event_eth_rx_adapter.c
+++ b/lib/librte_eventdev/rte_event_eth_rx_adapter.c
@@ -1998,8 +1998,7 @@ static int rxa_sw_add(struct rte_event_eth_rx_adapter 
*rx_adapter,
rx_adapter->id = id;
strcpy(rx_adapter->mem_name, mem_name);
rx_adapter->eth_devices = rte_zmalloc_socket(rx_adapter->mem_name,
-   /* FIXME: incompatible with hotplug */
-   rte_eth_dev_count_total() *
+   RTE_MAX_ETHPORTS *
sizeof(struct eth_device_info), 0,
socket_id);
rte_convert_rss_key((const uint32_t *)default_rss_key,
@@ -2012,7 +2011,7 @@ static int rxa_sw_add(struct rte_event_eth_rx_adapter 
*rx_adapter,
return -ENOMEM;
}
rte_spinlock_init(&rx_adapter->rx_lock);
-   RTE_ETH_FOREACH_DEV(i)
+   for (i = 0; i < RTE_MAX_ETHPORTS; i++)
rx_adapter->eth_devices[i].dev = &rte_eth_devices[i];
 
event_eth_rx_adapter[id] = rx_adapter;
diff --git a/lib/librte_eventdev/rte_event_eth_rx_adapter.h 
b/lib/librte_eventdev/rte_event_eth_rx_adapter.h
index 332ee21..863b72a 100644
--- a/lib/librte_eventdev/rte_event_eth_rx_adapter.h
+++ b/lib/librte_eventdev/rte_event_eth_rx_adapter.h
@@ -76,10 +76,6 @@
  * rte_event_eth_rx_adapter_cb_register() function allows the
  * application to register a callback that selects which packets to enqueue
  * to the event device.
- *
- * Note:
- * 1) Devices created after an instance of rte_event_eth_rx_adapter_create
- *  should be added to a new instance of the rx adapter.
  */
 
 #ifdef __cplusplus
-- 
1.8.3.1



[dpdk-dev] [PATCH 2/2] test/eventdev: remove eth Rx adapter vdev workaround

2018-09-06 Thread Nikhil Rao
The eth Rx adapter has been updated to support hotplugged
devices; devices created after adapter creation can now be
added to the adapter.

Update the adapter_multi_eth_add_del
test case to create the adapter as part of test setup
instead of creating it after creating vdevs.

Fixes: 2a9c83ae3b2e ("test/eventdev: add multi-ports test")
Cc: vipin.vargh...@intel.com
Cc: sta...@dpdk.org
Signed-off-by: Nikhil Rao 
---
 test/test/test_event_eth_rx_adapter.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/test/test/test_event_eth_rx_adapter.c 
b/test/test/test_event_eth_rx_adapter.c
index 4cca77f..d6d137e 100644
--- a/test/test/test_event_eth_rx_adapter.c
+++ b/test/test/test_event_eth_rx_adapter.c
@@ -489,9 +489,6 @@ struct event_eth_rx_adapter_test_params {
err = init_ports(rte_eth_dev_count_total());
TEST_ASSERT(err == 0, "Port initialization failed err %d\n", err);
 
-   /* creating new instance for all newly added eth devices */
-   adapter_create();
-
/* eth_rx_adapter_queue_add for n ports */
port_index = 0;
for (; port_index < rte_eth_dev_count_total(); port_index += 1) {
@@ -509,8 +506,6 @@ struct event_eth_rx_adapter_test_params {
TEST_ASSERT(err == 0, "Expected 0 got %d", err);
}
 
-   adapter_free();
-
return TEST_SUCCESS;
 }
 
@@ -675,7 +670,8 @@ struct event_eth_rx_adapter_test_params {
TEST_CASE_ST(NULL, NULL, adapter_create_free),
TEST_CASE_ST(adapter_create, adapter_free,
adapter_queue_add_del),
-   TEST_CASE_ST(NULL, NULL, adapter_multi_eth_add_del),
+   TEST_CASE_ST(adapter_create, adapter_free,
+   adapter_multi_eth_add_del),
TEST_CASE_ST(adapter_create, adapter_free, adapter_start_stop),
TEST_CASE_ST(adapter_create, adapter_free, adapter_stats),
TEST_CASES_END() /**< NULL terminate unit test array */
-- 
1.8.3.1



Re: [dpdk-dev] MLX5 should define the timestamp field in the doc

2018-09-06 Thread Tom Barbette
It's true that it is a little bit of a distortion of the original purpose. Here I 
want to query the time from the device (i.e., the device's current clock). Maybe 
a new function in the API would be more suited? CCing Thomas Monjalon for that 
part of the discussion.


I guess there is a case to query the device's timestamp to make our own precise 
time computations.


I also just saw that patch from two years ago that did not make it to the main 
branch: http://mails.dpdk.org/archives/dev/2016-October/048810.html , I guess 
it's because it is approximate in the time computation instead of a real 
synchronization? But now the timestamp is in rte_mbuf, so it could also 
technically go in.


Tom




[dpdk-dev] [Bug 89] XL710 DPDK i40evf : Ping is not working when RSS enabled for IP

2018-09-06 Thread bugzilla
https://bugs.dpdk.org/show_bug.cgi?id=89

Bug ID: 89
   Summary: XL710 DPDK i40evf : Ping is not working when RSS
enabled for IP
   Product: DPDK
   Version: 16.07
  Hardware: x86
OS: Linux
Status: CONFIRMED
  Severity: critical
  Priority: Normal
 Component: ethdev
  Assignee: dev@dpdk.org
  Reporter: snagi...@cisco.com
  Target Milestone: ---

HW: 2 Virtual Machines 
OS: Custom OS based on Linux 2.6.38 kernel
Application: Custom PMD application
DPDK: 16.07
NIC: XL710 40 Gig
Driver: i40evf
RSS: Enabled for IP as well.
Secondary IP address: Multiple secondary IP addresses are configured for VF.


The Virtual Function is consumed by the Virtual Machine.
Each Virtual Machine is running a custom DPDK application.
Multiple IP addresses are configured for the VF.
Pings are NOT working for some of the IP addresses.
The issue occurs randomly.
Ping fails with the primary IP address as well at times.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[dpdk-dev] [PATCH] net/mlx5: fix representor port xstats

2018-09-06 Thread Xueming Li
This patch fixes the issue where a representor port shows the xstats of the PF.

Fixes: 5a4b8e2612c5 ("net/mlx5: probe all port representors")

Signed-off-by: Xueming Li 
---
 drivers/net/mlx5/mlx5_stats.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_stats.c b/drivers/net/mlx5/mlx5_stats.c
index 91f3d47..e3a1c60 100644
--- a/drivers/net/mlx5/mlx5_stats.c
+++ b/drivers/net/mlx5/mlx5_stats.c
@@ -146,7 +146,7 @@ struct mlx5_counter_ctrl {
et_stats->cmd = ETHTOOL_GSTATS;
et_stats->n_stats = xstats_ctrl->stats_n;
ifr.ifr_data = (caddr_t)et_stats;
-   ret = mlx5_ifreq(dev, SIOCETHTOOL, &ifr, 1);
+   ret = mlx5_ifreq(dev, SIOCETHTOOL, &ifr, 0);
if (ret) {
DRV_LOG(WARNING,
"port %u unable to read statistic values from device",
@@ -194,7 +194,7 @@ struct mlx5_counter_ctrl {
 
drvinfo.cmd = ETHTOOL_GDRVINFO;
ifr.ifr_data = (caddr_t)&drvinfo;
-   ret = mlx5_ifreq(dev, SIOCETHTOOL, &ifr, 1);
+   ret = mlx5_ifreq(dev, SIOCETHTOOL, &ifr, 0);
if (ret) {
DRV_LOG(WARNING, "port %u unable to query number of statistics",
dev->data->port_id);
@@ -244,7 +244,7 @@ struct mlx5_counter_ctrl {
strings->string_set = ETH_SS_STATS;
strings->len = dev_stats_n;
ifr.ifr_data = (caddr_t)strings;
-   ret = mlx5_ifreq(dev, SIOCETHTOOL, &ifr, 1);
+   ret = mlx5_ifreq(dev, SIOCETHTOOL, &ifr, 0);
if (ret) {
DRV_LOG(WARNING, "port %u unable to get statistic names",
dev->data->port_id);
-- 
1.8.3.1



Re: [dpdk-dev] [RFC] ethdev: add min/max MTU to device info

2018-09-06 Thread Stephen Hemminger
On Thu, 6 Sep 2018 09:29:32 +0300
Andrew Rybchenko  wrote:

> On 09/05/2018 07:41 PM, Stephen Hemminger wrote:
> > This addresses the usability issue raised by OVS at the DPDK Userspace
> > summit. It adds general min/max MTU into the device info. For compatibility,
> > and to save space, it fits in a hole in the existing structure.  
> 
> It is true for amd64, but it looks like it is false on 32-bit. So, ABI 
> breakage.

Yes, it is an ABI change on 32-bit, but 18.11 is a major release where
this is allowed/expected.
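
A sketch of the alignment reasoning (an illustrative layout, not the real
rte_eth_dev_info):

	struct info_sketch {
		uint16_t nb_items;   /* offset 0 on both ABIs */
		/* amd64: the pointer below aligns to 8, leaving 6 bytes of
		 * padding, so two new uint16_t fields fit without moving
		 * anything. 32-bit: the pointer aligns to 4, only 2 bytes of
		 * padding exist, so the new fields push it from offset 4 to
		 * offset 8 -- an ABI break. */
		uint16_t min_mtu;    /* new field, fills padding on amd64 */
		uint16_t max_mtu;    /* new field, fills padding on amd64 */
		void *driver_data;   /* stays at offset 8 on amd64 */
	};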


[dpdk-dev] [PATCH] net/i40e/base: add new TR bits used for cloud filters

2018-09-06 Thread Kirill Rybalchenko
There is a new set of TR bits that can be used when replacing
the cloud filters, so add them in. Also add a check to make
sure that the replace cloud filters AQ command doesn't get
executed on an X722, since it is not supported there.

Fixes: de2cd512b176 ("net/i40e/base: new AQ commands for cloud filter")
Cc: sta...@dpdk.org

Signed-off-by: Paul M Stillwell Jr 
Signed-off-by: Andrey Chilikin 
Signed-off-by: Kirill Rybalchenko 
---
 drivers/net/i40e/base/i40e_adminq_cmd.h | 3 ++-
 drivers/net/i40e/base/i40e_common.c | 9 +
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/net/i40e/base/i40e_adminq_cmd.h 
b/drivers/net/i40e/base/i40e_adminq_cmd.h
index 801c0ff..8ae97f6 100644
--- a/drivers/net/i40e/base/i40e_adminq_cmd.h
+++ b/drivers/net/i40e/base/i40e_adminq_cmd.h
@@ -1501,7 +1501,8 @@ struct i40e_aqc_replace_cloud_filters_cmd {
u8  old_filter_type;
u8  new_filter_type;
u8  tr_bit;
-   u8  reserved[4];
+   u8  tr_bit2;
+   u8  reserved[3];
__le32 addr_high;
__le32 addr_low;
 };
diff --git a/drivers/net/i40e/base/i40e_common.c 
b/drivers/net/i40e/base/i40e_common.c
index e0a5be1..2c5dd2c 100644
--- a/drivers/net/i40e/base/i40e_common.c
+++ b/drivers/net/i40e/base/i40e_common.c
@@ -5916,6 +5916,14 @@ i40e_status_code i40e_aq_replace_cloud_filters(struct 
i40e_hw *hw,
enum i40e_status_code status = I40E_SUCCESS;
int i = 0;
 
+   /* X722 doesn't support this command */
+   if (hw->mac.type == I40E_MAC_X722)
+   return I40E_ERR_DEVICE_NOT_SUPPORTED;
+
+   /* need FW version greater than 6.00 */
+   if (hw->aq.fw_maj_ver < 6)
+   return I40E_NOT_SUPPORTED;
+
i40e_fill_default_direct_cmd_desc(&desc,
  i40e_aqc_opc_replace_cloud_filters);
 
@@ -5925,6 +5933,7 @@ i40e_status_code i40e_aq_replace_cloud_filters(struct 
i40e_hw *hw,
cmd->new_filter_type = filters->new_filter_type;
cmd->valid_flags = filters->valid_flags;
cmd->tr_bit = filters->tr_bit;
+   cmd->tr_bit2 = filters->tr_bit2;
 
status = i40e_asq_send_command(hw, &desc, cmd_buf,
sizeof(struct i40e_aqc_replace_cloud_filters_cmd_buf),  NULL);
-- 
2.5.5



[dpdk-dev] [PATCH] ethdev: fix missing names in Tx offload name array

2018-09-06 Thread Dekel Peled
Patch 5355f443 added two definitions of DEV_TX_OFFLOAD_xxx.
If new Tx offload capabilities are defined, they must also be mentioned
in rte_tx_offload_names in the rte_ethdev.c file.

This patch adds the required lines to the array rte_tx_offload_names.

Fixes: 5355f4439e2e ("ethdev: introduce generic IP/UDP tunnel checksum and TSO")

Cc: xuemi...@mellanox.com

Signed-off-by: Dekel Peled 
---
 lib/librte_ethdev/rte_ethdev.c | 2 ++
 lib/librte_ethdev/rte_ethdev.h | 4 ++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 3f8de93..5004b9f 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -156,6 +156,8 @@ struct rte_eth_xstats_name_off {
RTE_TX_OFFLOAD_BIT2STR(MULTI_SEGS),
RTE_TX_OFFLOAD_BIT2STR(MBUF_FAST_FREE),
RTE_TX_OFFLOAD_BIT2STR(SECURITY),
+   RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
+   RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
 };
 
 #undef RTE_TX_OFFLOAD_BIT2STR
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index fa2812b..5456ce2 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -941,18 +941,18 @@ struct rte_eth_conf {
  *   the same mempool and has refcnt = 1.
  */
 #define DEV_TX_OFFLOAD_SECURITY 0x0002
+#define DEV_TX_OFFLOAD_UDP_TNL_TSO  0x0004
 /**
  * Device supports generic UDP tunneled packet TSO.
  * Application must set PKT_TX_TUNNEL_UDP and other mbuf fields required
  * for tunnel TSO.
  */
-#define DEV_TX_OFFLOAD_UDP_TNL_TSO  0x0004
+#define DEV_TX_OFFLOAD_IP_TNL_TSO   0x0008
 /**
  * Device supports generic IP tunneled packet TSO.
  * Application must set PKT_TX_TUNNEL_IP and other mbuf fields required
  * for tunnel TSO.
  */
-#define DEV_TX_OFFLOAD_IP_TNL_TSO   0x0008
 
 #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x0001
 /**< Device supports Rx queue setup after device started*/
-- 
1.8.3.1



Re: [dpdk-dev] [PATCH v3] hash table: add an iterator over conflicting entries

2018-09-06 Thread Michel Machado

Hi Yipeng,

On 09/04/2018 02:55 PM, Wang, Yipeng1 wrote:

Do we need both the state and istate structs? struct rte_hash_iterator_state 
seems not to be doing much.
How about we only have one "state" struct and just not expose the internals to 
the public API, similar to the
rte_hash struct or rte_member_setsum struct.
And in the _init function use rte_malloc to allocate the state, and add a _free 
function to free it.


   The purpose of having the state struct is to enable applications to 
allocate iterator states on their execution stack or to embed iterator 
states in larger structs, avoiding an extra malloc()/free().
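
   For illustration, a minimal sketch of what this enables (flow_scanner is 
a hypothetical application struct):

	struct flow_scanner {
		struct rte_hash_iterator_state it; /* embedded; no malloc()/free() */
		uint64_t entries_seen;
	};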


   Do you foresee that the upcoming new underlying algorithm of hash 
tables will need to dynamically allocate iterator states?


[ ]'s
Michel Machado


Re: [dpdk-dev] [PATCH v3] hash table: add an iterator over conflicting entries

2018-09-06 Thread Michel Machado

Hi Gaëtan,

On 08/31/2018 06:53 PM, Gaëtan Rivet wrote:

Hi Qiaobin,

This work seems interesting, but is difficult to follow because
the previous discussion is not referenced.

You can find a how-to there:

http://doc.dpdk.org/guides/contributing/patches.html#sending-patches

--in-reply-to is useful to check which comments were already made and
understand the work previously done on a patchset.


   Thanks for bringing this to our attention.


+/* istate stands for internal state. */


Is a name requiring a comment to explain a good name?
Maybe rte_hash_iterator_priv?


+struct rte_hash_iterator_istate {
+   const struct rte_hash *h;
+   uint32_t  next;
+   uint32_t  total_entries;
+};


   We agree that the suffix _priv is better.


You should check that your private structure does not grow beyond
the public one, using RTE_BUILD_BUG_ON(sizeof(priv) > sizeof(pub)) somewhere.


   We have overlooked the macro RTE_BUILD_BUG_ON(). We'll use it.
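
   For example, a one-line sketch of that check in rte_cuckoo_hash.c:

	RTE_BUILD_BUG_ON(sizeof(struct rte_hash_iterator_istate) >
			 sizeof(struct rte_hash_iterator_state));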


"rte_hash_iterator_[i]state" seems unnecessarily verbose.
The memory you are manipulating through this variable is already holding
the state of your iterator. It is useless to append "_state".

 struct rte_hash_iterator_priv *state;

is also clear and reads better.
On the other hand "h" is maybe not verbose enough. Why not "hash"?


   We'll keep the parameter name "state" and rename the variable 
"__state" to "it" as you suggest in a comment later in your email.


   About the variable "h", we are following the coding convention in 
the library. You can find plenty of examples of using "h" for a hash table.



Also, please do not align field names in a structure. It forces
future changes to either break the pattern or edit the whole structure
when someone attempts to insert a field with a name that is too long.


   Your suggestion goes against the coding style of DPDK. See section 
"1.5.5. Structure Declarations" on the page:


https://doc.dpdk.org/guides-18.08/contributing/coding_style.html


+
+int32_t
+rte_hash_iterator_init(const struct rte_hash *h,
+   struct rte_hash_iterator_state *state)
+{
+   struct rte_hash_iterator_istate *__state;


Please do not use the "__" prefix to convey that
you are using a private version of the structure.

You could use "istate" or "it", the common shorthand for
iterator handles.


   We'll do it as explained before.


  int32_t
-rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, 
uint32_t *next)
+rte_hash_iterate(
+   struct rte_hash_iterator_state *state, const void **key, void **data)


Why an empty first line of parameters here?

rte_hash_iterate(struct rte_hash_iterator_state *state,
  const void **key,
  void **data)

reads better.


  Okay.


+
+/* istate stands for internal state. */
+struct rte_hash_iterator_conflict_entries_istate {


I find "conflict_entries" awkward, how about

rte_hash_dup_iterator

instead? It is shorter and conveys that you will iterate duplicate
entries.


   Yipeng Wang suggested the expression "conflict_entries" in his 
review of the first version of this patch. You find his suggestion here:


http://mails.dpdk.org/archives/dev/2018-August/109103.html

   I find the name "dup" misleading because it suggests that the 
returned entries have the same key or refer to the same object. For 
example, the file descriptor returned by dup(2) refers to the same file.



+   const struct rte_hash *h;
+   uint32_t  vnext;
+   uint32_t  primary_bidx;
+   uint32_t  secondary_bidx;
+};
+
+int32_t __rte_experimental
+rte_hash_iterator_conflict_entries_init_with_hash(const struct rte_hash *h,


rte_hash_dup_iterator_init() maybe?

Why is _with_hash mentioned here? Is it possible to initialize this kind
of iterator without a reference to compare against? That this reference
is an rte_hash is already given by the parameter list.

In any case, 49 characters for a name is too long.


   Honnappa Nagarahalli suggested adding the suffix "_with_hash" during 
his review of the second version of this patch. His argument went as 
follows: "Let me elaborate. For the API 'rte_hash_lookup', there are 
multiple variations such as 'rte_hash_lookup_with_hash', 
'rte_hash_lookup_data', 'rte_hash_lookup_with_hash_data' etc. We do not 
need to create similar variations for 
'rte_hash_iterate_conflict_entries' API right now. But the naming of the 
API should be such that these variations can be created in the future."


   You find a copy of his original message here:

https://www.mail-archive.com/dev@dpdk.org/msg109653.html


+int32_t __rte_experimental
+rte_hash_iterate_conflict_entries(
+   struct rte_hash_iterator_state *state, const void **key, void **data)


How about "rte_hash_dup_next()"?
Also, please break the parameter list instead of having an empty first
line.


   We are preserving the naming convention used in DPDK (e.g. 
rte_hash_iterate()).


   We'll break the parameter list instead of having an empty first line.

Re: [dpdk-dev] [PATCH v3] hash table: add an iterator over conflicting entries

2018-09-06 Thread Michel Machado

Hi Yipeng,

On 09/04/2018 03:51 PM, Wang, Yipeng1 wrote:

Hmm, I guess my comment is for code readability. If we don’t need the extra 
state that would be great.


   Notice that applications only see the public, opaque state. And the 
private versions are scoped to the C file where they are needed.



I think "rte_hash" is defined as an internal data structure but expose the type 
to the public header. Would this work?


   Exposing the private fields would bind the interface to the 
current implementation of the hash table. In the way we are proposing, 
one should be able to replace the underlying algorithm without touching 
the header files that applications use. But, yes, your solution would 
enable applications to allocate iterator states as local variables as well.



I proposed to malloc inside the function mostly because I think it is cleaner for 
the user. But your argument is valid. Depending on the use case I think it is OK.


   I don't know how other applications will use this iterator, but we 
use it when our application is already overloaded. So avoiding an 
expensive operation like malloc() is a win.



Another comment: you put the total_entries in the state; is it for the performance 
of rte_hash_iterate?


   We are saving one integer multiplication per call of 
rte_hash_iterate(). It's not a big deal, but since there's room in the 
state variable, we thought this would be a good idea because the total 
saving grows with the size of the table. We didn't actually measure the 
effect of this decision.



If you use it to iterate conflict entries, especially if you reuse the same 
"state" struct and init it again and again for different keys,
would this slow down the performance for your specific use case?


   Notice that the field total_entries only exists for 
rte_hash_iterate(). But even if total_entries were in the state of 
rte_hash_iterate_conflict_entries(), it would still save on the 
multiplication as long as rte_hash_iterate_conflict_entries() is called 
at least twice. Calling rte_hash_iterate_conflict_entries() once evens 
out, and calling rte_hash_iterate_conflict_entries() more times adds 
further savings. As a side note, in our application, whenever an 
iterator of conflicting entries is initialized, we call 
rte_hash_iterate_conflict_entries() at least once.



Also, iterate_conflict_entries may need reader lock protection.


   We are going to add the reader lock protection. Thanks.

[ ]'s
Michel Machado


Re: [dpdk-dev] [PATCH v3] hash table: add an iterator over conflicting entries

2018-09-06 Thread Michel Machado

Hi Yipeng,

On 09/04/2018 04:57 PM, Wang, Yipeng1 wrote:

-Original Message-
From: Michel Machado [mailto:mic...@digirati.com.br]



Exposing the private fields would bind the interface with the
current implementation of the hash table. In the way we are proposing,
one should be able to replace the underlying algorithm and not touching
the header files that applications use. But, yes, your solution would
enable applications to allocate iterator states as local variables as well.



[Wang, Yipeng] I didn't mean to expose the private fields. But only the
Type. For example, rte_hash does not expose its private fields to users.
One can change the fields without changing API.


   The fact that struct rte_hash does not expose its private fields but 
only its type to applications means that a compiler cannot find out the 
byte length of struct rte_hash using only the header rte_hash.h. Thus, 
an application cannot allocate memory on its own (e.g. as a local 
variable) for a struct rte_hash. An application can, however, have a 
pointer to a struct rte_hash since the byte length of a pointer only 
depends on the architecture of the machine. This is the motivation 
behind having struct rte_hash_iterator_state in rte_hash.h hold only 
an array of bytes.


   There are good reasons to implement struct rte_hash as it is. For 
example, struct rte_hash can change its byte length between versions of 
DPDK even if applications are dynamically linked to DPDK and not 
recompiled. Moreover, a hash table is unlikely to be as short-lived as an 
iterator.


[ ]'s
Michel Machado


Re: [dpdk-dev] [PATCH v3] hash table: add an iterator over conflicting entries

2018-09-06 Thread Michel Machado

Hi Honnappa,

On 09/02/2018 06:05 PM, Honnappa Nagarahalli wrote:

+/* istate stands for internal state. */ struct rte_hash_iterator_istate
+{
+   const struct rte_hash *h;
This can be outside of this structure. This will help keep the API definitions 
consistent with existing APIs. Please see further comments below.


   Discussed later.


+   uint32_t  next;
+   uint32_t  total_entries;
+};
This structure can be moved to rte_cuckoo_hash.h file.


   What's the purpose of moving this struct to a header file since it's 
only used in the C file rte_cuckoo_hash.c?



+int32_t
+rte_hash_iterator_init(const struct rte_hash *h,
+   struct rte_hash_iterator_state *state) {
+   struct rte_hash_iterator_istate *__state;
'__state' can be replaced by 's'.

+
+   RETURN_IF_TRUE(((h == NULL) || (state == NULL)), -EINVAL);
+
+   __state = (struct rte_hash_iterator_istate *)state;
+   __state->h = h;
+   __state->next = 0;
+   __state->total_entries = h->num_buckets * RTE_HASH_BUCKET_ENTRIES;
+
+   return 0;
+}
IMO, creating this API can be avoided if the initialization is handled in 
'rte_hash_iterate' function. The cost of doing this is very trivial (one extra 
'if' statement) in 'rte_hash_iterate' function. It will help keep the number of 
APIs to minimal.


   Applications would have to initialize struct rte_hash_iterator_state 
*state before calling rte_hash_iterate() anyway. Why not initialize 
the fields of a state only once?
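
   For illustration, the intended usage (a sketch; handle_entry() is a 
hypothetical application callback):

	struct rte_hash_iterator_state state;
	const void *key;
	void *data;

	if (rte_hash_iterator_init(h, &state) == 0)
		while (rte_hash_iterate(&state, &key, &data) >= 0)
			handle_entry(key, data);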



  int32_t
-rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, 
uint32_t *next)
+rte_hash_iterate(
+   struct rte_hash_iterator_state *state, const void **key, void **data)

IMO, as suggested above, do not store 'struct rte_hash *h' in 'struct 
rte_hash_iterator_state'. Instead, change the API definition as follows:
rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, 
struct rte_hash_iterator_state *state)

This will help keep the API signature consistent with existing APIs.

This is an ABI change. Please take a look at 
https://doc.dpdk.org/guides/contributing/versioning.html.


   The ABI will change one way or another, so why not go for a 
single state instead of requiring parameters that are already needed for 
the initialization of the state?


   Thank you for the link. We'll check how to proceed with the ABI change.


  {
+   struct rte_hash_iterator_istate *__state;
'__state' can be replaced with 's'.


   Gaëtan Rivet has already pointed this out in his review of this 
version of our patch.



uint32_t bucket_idx, idx, position;
struct rte_hash_key *next_key;
  
-	RETURN_IF_TRUE(((h == NULL) || (next == NULL)), -EINVAL);

+   RETURN_IF_TRUE(((state == NULL) || (key == NULL) ||
+   (data == NULL)), -EINVAL);
+
+   __state = (struct rte_hash_iterator_istate *)state;
  
-	const uint32_t total_entries = h->num_buckets * RTE_HASH_BUCKET_ENTRIES;

/* Out of bounds */
-   if (*next >= total_entries)
+   if (__state->next >= __state->total_entries)
return -ENOENT;
  
'if (__state->next == 0)' is required to avoid creating 'rte_hash_iterator_init' API.


   The argument to keep _init() is presented above in this email.


/* Calculate bucket and index of current iterator */
-   bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
-   idx = *next % RTE_HASH_BUCKET_ENTRIES;
+   bucket_idx = __state->next / RTE_HASH_BUCKET_ENTRIES;
+   idx = __state->next % RTE_HASH_BUCKET_ENTRIES;
  
  	/* If current position is empty, go to the next one */

-   while (h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
-   (*next)++;
+   while (__state->h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
+   __state->next++;
/* End of table */
-   if (*next == total_entries)
+   if (__state->next == __state->total_entries)
return -ENOENT;
-   bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
-   idx = *next % RTE_HASH_BUCKET_ENTRIES;
+   bucket_idx = __state->next / RTE_HASH_BUCKET_ENTRIES;
+   idx = __state->next % RTE_HASH_BUCKET_ENTRIES;
}
-   __hash_rw_reader_lock(h);
+   __hash_rw_reader_lock(__state->h);
/* Get position of entry in key table */
-   position = h->buckets[bucket_idx].key_idx[idx];
-   next_key = (struct rte_hash_key *) ((char *)h->key_store +
-   position * h->key_entry_size);
+   position = __state->h->buckets[bucket_idx].key_idx[idx];
+   next_key = (struct rte_hash_key *) ((char *)__state->h->key_store +
+   position * __state->h->key_entry_size);
/* Return key and data */
*key = next_key->key;
*data = next_key->pdata;
  
-	__hash_rw_reader_unlock(h);

+   __hash_rw_reader_unlock(__state->h);
  
  	/* Increment ite

[dpdk-dev] [PATCH] vhost: fix crash on unregistering in client mode

2018-09-06 Thread Qiang Zhou
When rte_vhost_driver_unregister deletes the connection fd,
the fd lock prevents the vsocket from being freed. But when
vhost_user_msg_handler returns an error, the connection is removed from the
vsocket conn_list, and the fd lock then becomes ineffective. So the vsocket
can be freed in rte_vhost_driver_unregister while vhost_user_read_cb
reconnects.

To fix this:
move the removal of the vsocket conn to after the reconnect.

Cc: sta...@dpdk.org

Signed-off-by: Qiang Zhou 
---
 lib/librte_vhost/socket.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
index d63031747..43da1c51b 100644
--- a/lib/librte_vhost/socket.c
+++ b/lib/librte_vhost/socket.c
@@ -293,16 +293,16 @@ vhost_user_read_cb(int connfd, void *dat, int *remove)
if (vsocket->notify_ops->destroy_connection)
vsocket->notify_ops->destroy_connection(conn->vid);
 
+   if (vsocket->reconnect) {
+   create_unix_socket(vsocket);
+   vhost_user_start_client(vsocket);
+   }
+
pthread_mutex_lock(&vsocket->conn_mutex);
TAILQ_REMOVE(&vsocket->conn_list, conn, next);
pthread_mutex_unlock(&vsocket->conn_mutex);
 
free(conn);
-
-   if (vsocket->reconnect) {
-   create_unix_socket(vsocket);
-   vhost_user_start_client(vsocket);
-   }
}
 }
 
-- 
2.14.3 (Apple Git-98)



Re: [dpdk-dev] MLX5 should define the timestamp field in the doc

2018-09-06 Thread Shahaf Shuler
Thursday, September 6, 2018 12:33 PM, Tom Barbette:
Subject: RE: MLX5 should define the timestamp field in the doc
>
>It's true that it is a little bit of a distortion of the original purpose. Here I 
>want to query the time from the device (i.e., the device's current clock). Maybe 
>a new function in the API would be more suited? CCing Thomas Monjalon for that 
>part of the discussion.

Yes, we cannot use the current API for that.

>
>I guess there is a case to query the device's timestamp to make our own 
>precise time computations.

Yes, this is a valid use case. It will enable implementing a PTP daemon for 
clock sync on top of DPDK.

>
>I also just saw that patch from two years ago that did not make it to the main 
>branch: http://mails.dpdk.org/archives/dev/2016-October/048810.html , I guess 
>it's because it is approximate in the time computation instead of a real 
>synchronization? But now the timestamp is in rte_mbuf, so it could also 
>technically go in.
>

I need to refresh my memory about this one (too long ago).

Anyway, for your case there is a way to go; it just needs an app to sync the 
clocks. I can help w/ the reviews and guidance on mlx5/ethdev if you wish to 
push such support upstream.

>Tom


[dpdk-dev] [PATCH] net/nfp: fix mbuf flags with cksum good

2018-09-06 Thread Alejandro Lucero
If checksum offload is enabled and the hardware reports the checksum as good,
update the mbuf ol_flags with the proper *_CKSUM_GOOD bits.

Fixes: b812daadad0d ("nfp: add Rx and Tx")
Cc: sta...@dpdk.org

Signed-off-by: Alejandro Lucero 
---
 drivers/net/nfp/nfp_net.c | 15 +++
 drivers/net/nfp/nfp_net_pmd.h |  2 ++
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/net/nfp/nfp_net.c b/drivers/net/nfp/nfp_net.c
index 6e5e305..760a66a 100644
--- a/drivers/net/nfp/nfp_net.c
+++ b/drivers/net/nfp/nfp_net.c
@@ -1786,21 +1786,20 @@ enum nfp_qcp_ptr {
return;
 
/* If IPv4 and IP checksum error, fail */
-   if ((rxd->rxd.flags & PCIE_DESC_RX_IP4_CSUM) &&
-   !(rxd->rxd.flags & PCIE_DESC_RX_IP4_CSUM_OK))
+   if (unlikely((rxd->rxd.flags & PCIE_DESC_RX_IP4_CSUM) &&
+   !(rxd->rxd.flags & PCIE_DESC_RX_IP4_CSUM_OK)))
mb->ol_flags |= PKT_RX_IP_CKSUM_BAD;
+   else
+   mb->ol_flags |= PKT_RX_IP_CKSUM_GOOD;
 
/* If neither UDP nor TCP return */
if (!(rxd->rxd.flags & PCIE_DESC_RX_TCP_CSUM) &&
!(rxd->rxd.flags & PCIE_DESC_RX_UDP_CSUM))
return;
 
-   if ((rxd->rxd.flags & PCIE_DESC_RX_TCP_CSUM) &&
-   !(rxd->rxd.flags & PCIE_DESC_RX_TCP_CSUM_OK))
-   mb->ol_flags |= PKT_RX_L4_CKSUM_BAD;
-
-   if ((rxd->rxd.flags & PCIE_DESC_RX_UDP_CSUM) &&
-   !(rxd->rxd.flags & PCIE_DESC_RX_UDP_CSUM_OK))
+   if (likely(rxd->rxd.flags & PCIE_DESC_RX_L4_CSUM_OK))
+   mb->ol_flags |= PKT_RX_L4_CKSUM_GOOD;
+   else
mb->ol_flags |= PKT_RX_L4_CKSUM_BAD;
 }
 
diff --git a/drivers/net/nfp/nfp_net_pmd.h b/drivers/net/nfp/nfp_net_pmd.h
index c1b044e..b01036d 100644
--- a/drivers/net/nfp/nfp_net_pmd.h
+++ b/drivers/net/nfp/nfp_net_pmd.h
@@ -293,6 +293,8 @@ struct nfp_net_txq {
 #define PCIE_DESC_RX_UDP_CSUM_OK(1 <<  1)
 #define PCIE_DESC_RX_VLAN   (1 <<  0)
 
+#define PCIE_DESC_RX_L4_CSUM_OK (PCIE_DESC_RX_TCP_CSUM_OK | \
+PCIE_DESC_RX_UDP_CSUM_OK)
 struct nfp_net_rx_desc {
union {
/* Freelist descriptor */
-- 
1.9.1



[dpdk-dev] [PATCH v2] net/pcap: physical interface MAC address support

2018-09-06 Thread Juhamatti Kuusisaari
Support for PCAP physical interface MAC with phy_mac=1 devarg.
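
A hypothetical invocation (the vdev name and interface are examples only):

    --vdev 'net_pcap0,iface=eth0,phy_mac=1'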

Signed-off-by: Juhamatti Kuusisaari 
---
 doc/guides/rel_notes/release_18_11.rst |   4 +
 drivers/net/pcap/rte_eth_pcap.c| 119 +++--
 2 files changed, 118 insertions(+), 5 deletions(-)

diff --git a/doc/guides/rel_notes/release_18_11.rst 
b/doc/guides/rel_notes/release_18_11.rst
index 3ae6b3f58..70966740a 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -54,6 +54,10 @@ New Features
  Also, make sure to start the actual text at the margin.
  =
 
+* **Added a devarg to use PCAP interface physical MAC address.**
+  A new devarg ``phy_mac`` was introduced to allow users to use the physical
+  MAC address of the selected PCAP interface.
+
 
 API Changes
 ---
diff --git a/drivers/net/pcap/rte_eth_pcap.c b/drivers/net/pcap/rte_eth_pcap.c
index e8810a171..1c3517eea 100644
--- a/drivers/net/pcap/rte_eth_pcap.c
+++ b/drivers/net/pcap/rte_eth_pcap.c
@@ -7,6 +7,14 @@
 #include 
 
 #include 
+#include 
+#include 
+#include 
+
+#ifdef __FreeBSD__
+#include 
+#include 
+#endif
 
 #include 
 
@@ -17,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define RTE_ETH_PCAP_SNAPSHOT_LEN 65535
 #define RTE_ETH_PCAP_SNAPLEN ETHER_MAX_JUMBO_FRAME_LEN
@@ -29,6 +38,7 @@
 #define ETH_PCAP_RX_IFACE_IN_ARG "rx_iface_in"
 #define ETH_PCAP_TX_IFACE_ARG "tx_iface"
 #define ETH_PCAP_IFACE_ARG"iface"
+#define ETH_PCAP_PHY_MAC_ARG  "phy_mac"
 
 #define ETH_PCAP_ARG_MAXLEN64
 
@@ -87,6 +97,7 @@ static const char *valid_arguments[] = {
ETH_PCAP_RX_IFACE_IN_ARG,
ETH_PCAP_TX_IFACE_ARG,
ETH_PCAP_IFACE_ARG,
+   ETH_PCAP_PHY_MAC_ARG,
NULL
 };
 
@@ -904,12 +915,79 @@ pmd_init_internals(struct rte_vdev_device *vdev,
return 0;
 }
 
+static void eth_pcap_update_mac(const char *if_name, struct rte_eth_dev 
**eth_dev,
+   const unsigned int numa_node)
+{
+   void *mac_addrs;
+   PMD_LOG(INFO, "Setting phy MAC for %s\n",
+   if_name);
+#ifndef __FreeBSD__
+   int if_fd = socket(AF_INET, SOCK_DGRAM, 0);
+   if (if_fd != -1)
+   {
+   struct ifreq ifr;
+   strlcpy(ifr.ifr_name, if_name, sizeof(ifr.ifr_name));
+   if (!ioctl(if_fd, SIOCGIFHWADDR, &ifr)) {
+   mac_addrs = rte_zmalloc_socket(NULL, ETHER_ADDR_LEN,
+   0, numa_node);
+   if(mac_addrs) {
+   (*eth_dev)->data->mac_addrs = mac_addrs;
+   rte_memcpy((*eth_dev)->data->mac_addrs,
+   ifr.ifr_addr.sa_data,
+   ETHER_ADDR_LEN);
+   }
+   }
+   close(if_fd);
+   }
+#else
+   int mib[6], len = 0;
+   char *buf = NULL;
+
+   mib[0] = CTL_NET;
+   mib[1] = AF_ROUTE;
+   mib[2] = 0;
+   mib[3] = AF_LINK;
+   mib[4] = NET_RT_IFLIST;
+   mib[5] = if_nametoindex(if_name);
+
+   if (sysctl(mib, 6, NULL, &len, NULL, 0) < 0) {
+   goto cleanup;
+   }
+   if (len > 0) {
+   struct if_msghdr*ifm;
+   struct sockaddr_dl  *sdl;
+
+   buf = rte_zmalloc_socket(NULL, len,
+   0, numa_node);
+   if (buf) {
+   if (sysctl(mib, 6, buf, &len, NULL, 0) < 0) {
+   goto cleanup;
+   }
+
+   ifm = (struct if_msghdr *)buf;
+   sdl = (struct sockaddr_dl *)(ifm + 1);
+   mac_addrs = rte_zmalloc_socket(NULL, ETHER_ADDR_LEN,
+   0, numa_node);
+   if (mac_addrs) {
+   (*eth_dev)->data->mac_addrs = mac_addrs;
+   rte_memcpy((*eth_dev)->data->mac_addrs,
+   LLADDR(sdl),
+   ETHER_ADDR_LEN);
+   }
+   }
+   }
+cleanup:
+   if (buf)
+   rte_free(buf);
+#endif
+}
+
 static int
 eth_from_pcaps_common(struct rte_vdev_device *vdev,
struct pmd_devargs *rx_queues, const unsigned int nb_rx_queues,
struct pmd_devargs *tx_queues, const unsigned int nb_tx_queues,
struct rte_kvargs *kvlist, struct pmd_internals **internals,
-   struct rte_eth_dev **eth_dev)
+   const int phy_mac, struct rte_eth_dev **eth_dev)
 {
struct rte_kvargs_pair *pair = NULL;
unsigned int k_idx;
@@ -955,6 +1033,9 @@ eth_from_pcaps_common(struct rte_vdev_device *vdev,
else
(*internals)->if_index = if_nametoindex(pair->value);
 
+   i

[dpdk-dev] [PATCH v3] net/pcap: physical interface MAC address support

2018-09-06 Thread Juhamatti Kuusisaari
Support for PCAP physical interface MAC with phy_mac=1 devarg.

Signed-off-by: Juhamatti Kuusisaari 
---
 doc/guides/rel_notes/release_18_11.rst |   4 +
 drivers/net/pcap/rte_eth_pcap.c| 117 +++--
 2 files changed, 116 insertions(+), 5 deletions(-)

diff --git a/doc/guides/rel_notes/release_18_11.rst 
b/doc/guides/rel_notes/release_18_11.rst
index 3ae6b3f58..70966740a 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -54,6 +54,10 @@ New Features
  Also, make sure to start the actual text at the margin.
  =
 
+* **Added a devarg to use PCAP interface physical MAC address.**
+  A new devarg ``phy_mac`` was introduced to allow users to use the physical
+  MAC address of the selected PCAP interface.
+
 
 API Changes
 ---
diff --git a/drivers/net/pcap/rte_eth_pcap.c b/drivers/net/pcap/rte_eth_pcap.c
index e8810a171..4b932291d 100644
--- a/drivers/net/pcap/rte_eth_pcap.c
+++ b/drivers/net/pcap/rte_eth_pcap.c
@@ -7,6 +7,14 @@
 #include 
 
 #include 
+#include 
+#include 
+#include 
+
+#ifdef __FreeBSD__
+#include 
+#include 
+#endif
 
 #include 
 
@@ -17,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define RTE_ETH_PCAP_SNAPSHOT_LEN 65535
 #define RTE_ETH_PCAP_SNAPLEN ETHER_MAX_JUMBO_FRAME_LEN
@@ -29,6 +38,7 @@
 #define ETH_PCAP_RX_IFACE_IN_ARG "rx_iface_in"
 #define ETH_PCAP_TX_IFACE_ARG "tx_iface"
 #define ETH_PCAP_IFACE_ARG"iface"
+#define ETH_PCAP_PHY_MAC_ARG  "phy_mac"
 
 #define ETH_PCAP_ARG_MAXLEN64
 
@@ -87,6 +97,7 @@ static const char *valid_arguments[] = {
ETH_PCAP_RX_IFACE_IN_ARG,
ETH_PCAP_TX_IFACE_ARG,
ETH_PCAP_IFACE_ARG,
+   ETH_PCAP_PHY_MAC_ARG,
NULL
 };
 
@@ -904,12 +915,77 @@ pmd_init_internals(struct rte_vdev_device *vdev,
return 0;
 }
 
+static void eth_pcap_update_mac(const char *if_name, struct rte_eth_dev 
**eth_dev,
+   const unsigned int numa_node)
+{
+   void *mac_addrs;
+   PMD_LOG(INFO, "Setting phy MAC for %s\n",
+   if_name);
+#ifndef __FreeBSD__
+   int if_fd = socket(AF_INET, SOCK_DGRAM, 0);
+   if (if_fd != -1) {
+   struct ifreq ifr;
+   strlcpy(ifr.ifr_name, if_name, sizeof(ifr.ifr_name));
+   if (!ioctl(if_fd, SIOCGIFHWADDR, &ifr)) {
+   mac_addrs = rte_zmalloc_socket(NULL, ETHER_ADDR_LEN,
+   0, numa_node);
+   if (mac_addrs) {
+   (*eth_dev)->data->mac_addrs = mac_addrs;
+   rte_memcpy((*eth_dev)->data->mac_addrs,
+   ifr.ifr_addr.sa_data,
+   ETHER_ADDR_LEN);
+   }
+   }
+   close(if_fd);
+   }
+#else
+   int mib[6], len = 0;
+   char *buf = NULL;
+
+   mib[0] = CTL_NET;
+   mib[1] = AF_ROUTE;
+   mib[2] = 0;
+   mib[3] = AF_LINK;
+   mib[4] = NET_RT_IFLIST;
+   mib[5] = if_nametoindex(if_name);
+
+   if (sysctl(mib, 6, NULL, &len, NULL, 0) < 0)
+   goto cleanup;
+   if (len > 0) {
+   struct if_msghdr*ifm;
+   struct sockaddr_dl  *sdl;
+
+   buf = rte_zmalloc_socket(NULL, len,
+   0, numa_node);
+   if (buf) {
+   if (sysctl(mib, 6, buf, &len, NULL, 0) < 0)
+   goto cleanup;
+
+   ifm = (struct if_msghdr *)buf;
+   sdl = (struct sockaddr_dl *)(ifm + 1);
+   mac_addrs = rte_zmalloc_socket(NULL, ETHER_ADDR_LEN,
+   0, numa_node);
+   if (mac_addrs) {
+   (*eth_dev)->data->mac_addrs = mac_addrs;
+   rte_memcpy((*eth_dev)->data->mac_addrs,
+   LLADDR(sdl),
+   ETHER_ADDR_LEN);
+   }
+   }
+   }
+cleanup:
+   if (buf)
+   rte_free(buf);
+#endif
+}
+
 static int
 eth_from_pcaps_common(struct rte_vdev_device *vdev,
struct pmd_devargs *rx_queues, const unsigned int nb_rx_queues,
struct pmd_devargs *tx_queues, const unsigned int nb_tx_queues,
struct rte_kvargs *kvlist, struct pmd_internals **internals,
-   struct rte_eth_dev **eth_dev)
+   const int phy_mac, struct rte_eth_dev **eth_dev)
 {
struct rte_kvargs_pair *pair = NULL;
unsigned int k_idx;
@@ -955,6 +1031,9 @@ eth_from_pcaps_common(struct rte_vdev_device *vdev,
else
(*internals)->if_index = if_nametoindex(pair->value);
 
+   if (phy_mac && pair)
+   e

[dpdk-dev] [PATCH 01/15] net/softnic: add infrastructure for flow API

2018-09-06 Thread Reshma Pattan
Add rte_flow infrastructure for flow API support.

Signed-off-by: Cristian Dumitrescu 
Signed-off-by: Reshma Pattan 
---
 drivers/net/softnic/rte_eth_softnic_internals.h | 17 +
 1 file changed, 17 insertions(+)

diff --git a/drivers/net/softnic/rte_eth_softnic_internals.h 
b/drivers/net/softnic/rte_eth_softnic_internals.h
index a25eb874c..882cfd191 100644
--- a/drivers/net/softnic/rte_eth_softnic_internals.h
+++ b/drivers/net/softnic/rte_eth_softnic_internals.h
@@ -20,6 +20,7 @@
 
 #include 
 #include 
+#include 
 
 #include "rte_eth_softnic.h"
 #include "conn.h"
@@ -43,6 +44,13 @@ struct pmd_params {
} tm;
 };
 
+/**
+ * Ethdev Flow API
+ */
+struct rte_flow;
+
+TAILQ_HEAD(flow_list, rte_flow);
+
 /**
  * MEMPOOL
  */
@@ -762,6 +770,15 @@ struct softnic_table_rule_action {
struct rte_table_action_time_params time;
 };
 
+struct rte_flow {
+   TAILQ_ENTRY(rte_flow) node;
+   struct softnic_table_rule_match match;
+   struct softnic_table_rule_action action;
+   void *data;
+   struct pipeline *pipeline;
+   uint32_t table_id;
+};
+
 int
 softnic_pipeline_port_in_stats_read(struct pmd_internals *p,
const char *pipeline_name,
-- 
2.14.4



[dpdk-dev] [PATCH 02/15] net/softnic: rte flow attr mapping to pipeline

2018-09-06 Thread Reshma Pattan
Added mapping support from rte_flow attributes
to the softnic pipeline and table.

Added the flow attribute map set and get function
definitions in the new file rte_eth_softnic_flow.c.

Added pmd flow internals with ingress and egress
flow attribute maps.

Signed-off-by: Cristian Dumitrescu 
Signed-off-by: Reshma Pattan 
---
 drivers/net/softnic/Makefile|  1 +
 drivers/net/softnic/meson.build |  1 +
 drivers/net/softnic/rte_eth_softnic_flow.c  | 46 +
 drivers/net/softnic/rte_eth_softnic_internals.h | 31 +
 4 files changed, 79 insertions(+)
 create mode 100644 drivers/net/softnic/rte_eth_softnic_flow.c

diff --git a/drivers/net/softnic/Makefile b/drivers/net/softnic/Makefile
index ea9b65f4e..12515b10d 100644
--- a/drivers/net/softnic/Makefile
+++ b/drivers/net/softnic/Makefile
@@ -33,6 +33,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_PMD_SOFTNIC) += 
rte_eth_softnic_action.c
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_SOFTNIC) += rte_eth_softnic_pipeline.c
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_SOFTNIC) += rte_eth_softnic_thread.c
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_SOFTNIC) += rte_eth_softnic_cli.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_SOFTNIC) += rte_eth_softnic_flow.c
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_SOFTNIC) += parser.c
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_SOFTNIC) += conn.c
 
diff --git a/drivers/net/softnic/meson.build b/drivers/net/softnic/meson.build
index ff9822747..56e5e2b21 100644
--- a/drivers/net/softnic/meson.build
+++ b/drivers/net/softnic/meson.build
@@ -13,6 +13,7 @@ sources = files('rte_eth_softnic_tm.c',
'rte_eth_softnic_pipeline.c',
'rte_eth_softnic_thread.c',
'rte_eth_softnic_cli.c',
+   'rte_eth_softnic_flow.c',
'parser.c',
'conn.c')
 deps += ['pipeline', 'port', 'table', 'sched']
diff --git a/drivers/net/softnic/rte_eth_softnic_flow.c 
b/drivers/net/softnic/rte_eth_softnic_flow.c
new file mode 100644
index 0..843db7590
--- /dev/null
+++ b/drivers/net/softnic/rte_eth_softnic_flow.c
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2017 Intel Corporation
+ */
+
+#include "rte_eth_softnic_internals.h"
+#include "rte_eth_softnic.h"
+
+int
+flow_attr_map_set(struct pmd_internals *softnic,
+   uint32_t group_id,
+   int ingress,
+   const char *pipeline_name,
+   uint32_t table_id)
+{
+   struct pipeline *pipeline;
+   struct flow_attr_map *map;
+
+   if (group_id >= SOFTNIC_FLOW_MAX_GROUPS ||
+   pipeline_name == NULL)
+   return -1;
+
+   pipeline = softnic_pipeline_find(softnic, pipeline_name);
+   if (pipeline == NULL ||
+   table_id >= pipeline->n_tables)
+   return -1;
+
+   map = (ingress) ? &softnic->flow.ingress_map[group_id] :
+   &softnic->flow.egress_map[group_id];
+   strcpy(map->pipeline_name, pipeline_name);
+   map->table_id = table_id;
+   map->valid = 1;
+
+   return 0;
+}
+
+struct flow_attr_map *
+flow_attr_map_get(struct pmd_internals *softnic,
+   uint32_t group_id,
+   int ingress)
+{
+   if (group_id >= SOFTNIC_FLOW_MAX_GROUPS)
+   return NULL;
+
+   return (ingress) ? &softnic->flow.ingress_map[group_id] :
+   &softnic->flow.egress_map[group_id];
+}
diff --git a/drivers/net/softnic/rte_eth_softnic_internals.h 
b/drivers/net/softnic/rte_eth_softnic_internals.h
index 882cfd191..d1996c469 100644
--- a/drivers/net/softnic/rte_eth_softnic_internals.h
+++ b/drivers/net/softnic/rte_eth_softnic_internals.h
@@ -51,6 +51,21 @@ struct rte_flow;
 
 TAILQ_HEAD(flow_list, rte_flow);
 
+struct flow_attr_map {
+   char pipeline_name[NAME_SIZE];
+   uint32_t table_id;
+   int valid;
+};
+
+#ifndef SOFTNIC_FLOW_MAX_GROUPS
+#define SOFTNIC_FLOW_MAX_GROUPS 64
+#endif
+
+struct flow_internals {
+   struct flow_attr_map ingress_map[SOFTNIC_FLOW_MAX_GROUPS];
+   struct flow_attr_map egress_map[SOFTNIC_FLOW_MAX_GROUPS];
+};
+
 /**
  * MEMPOOL
  */
@@ -497,6 +512,7 @@ struct pmd_internals {
struct tm_internals tm; /**< Traffic Management */
} soft;
 
+   struct flow_internals flow;
struct softnic_conn *conn;
struct softnic_mempool_list mempool_list;
struct softnic_swq_list swq_list;
@@ -510,6 +526,21 @@ struct pmd_internals {
struct softnic_thread_data thread_data[RTE_MAX_LCORE];
 };
 
+/**
+ * Ethdev Flow API
+ */
+int
+flow_attr_map_set(struct pmd_internals *softnic,
+   uint32_t group_id,
+   int ingress,
+   const char *pipeline_name,
+   uint32_t table_id);
+
+struct flow_attr_map *
+flow_attr_map_get(struct pmd_internals *softnic,
+   uint32_t group_id,
+   int ingress);
+
 /**
  * MEMPOOL
  */
-- 
2.14.4



[dpdk-dev] [PATCH 00/15] add flow API support to softnic

2018-09-06 Thread Reshma Pattan
This patch series adds flow API support
to the softnic.

This patch set also introduces a new CLI command
to map a flow group and direction
to a softnic pipeline and table.

Reshma Pattan (15):
  net/softnic: add infrastructure for flow API
  net/softnic: rte flow attr mapping to pipeline
  net/softnic: add new cli for flow attribute map
  net/softnic: various data type changes
  net/softnic: add free table and find out port functions
  net/softnic: add function to get eth device from softnic
  net/softnic: flow API validate support
  net/softnic: validate and map flow rule with acl table match
  net/softnic: parse flow protocol for acl table match
  net/softnic: validate and map flow with hash table match
  net/softnic: validate and map flow action with table action
  net/softnic: add flow create API
  net/softnic: add flow destroy API
  net/softnic: add flow query API
  net/softnic: add parsing for raw flow item

 drivers/net/softnic/Makefile|1 +
 drivers/net/softnic/meson.build |1 +
 drivers/net/softnic/rte_eth_softnic.c   |   16 +
 drivers/net/softnic/rte_eth_softnic_cli.c   |  115 +-
 drivers/net/softnic/rte_eth_softnic_flow.c  | 1809 +++
 drivers/net/softnic/rte_eth_softnic_internals.h |   98 +-
 drivers/net/softnic/rte_eth_softnic_pipeline.c  |   61 +-
 7 files changed, 2071 insertions(+), 30 deletions(-)
 create mode 100644 drivers/net/softnic/rte_eth_softnic_flow.c

-- 
2.14.4



[dpdk-dev] [PATCH 03/15] net/softnic: add new cli for flow attribute map

2018-09-06 Thread Reshma Pattan
Added a new CLI command by which the user can specify
which rte_flow group and direction is mapped to
which softnic pipeline and table.
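
For example, the following command (pipeline name is hypothetical) maps
ingress flows of group 0 to table 0 of pipeline PIPELINE0:

    flowapi map group 0 ingress pipeline PIPELINE0 table 0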

Signed-off-by: Cristian Dumitrescu 
Signed-off-by: Reshma Pattan 
---
 drivers/net/softnic/rte_eth_softnic_cli.c | 81 +++
 1 file changed, 81 insertions(+)

diff --git a/drivers/net/softnic/rte_eth_softnic_cli.c 
b/drivers/net/softnic/rte_eth_softnic_cli.c
index 0c7448cc4..8f5f82555 100644
--- a/drivers/net/softnic/rte_eth_softnic_cli.c
+++ b/drivers/net/softnic/rte_eth_softnic_cli.c
@@ -4797,6 +4797,81 @@ cmd_softnic_thread_pipeline_disable(struct pmd_internals 
*softnic,
}
 }
 
+/**
+ * flowapi map
+ *  group <group_id>
+ *  ingress | egress
+ *  pipeline <pipeline_name>
+ *  table <table_id>
+ */
+static void
+cmd_softnic_flowapi_map(struct pmd_internals *softnic,
+   char **tokens,
+   uint32_t n_tokens,
+   char *out,
+   size_t out_size)
+{
+   char *pipeline_name;
+   uint32_t group_id, table_id;
+   int ingress, status;
+
+   if (n_tokens != 9) {
+   snprintf(out, out_size, MSG_ARG_MISMATCH, tokens[0]);
+   return;
+   }
+
+   if (strcmp(tokens[1], "map") != 0) {
+   snprintf(out, out_size, MSG_ARG_NOT_FOUND, "map");
+   return;
+   }
+
+   if (strcmp(tokens[2], "group") != 0) {
+   snprintf(out, out_size, MSG_ARG_NOT_FOUND, "group");
+   return;
+   }
+
+   if (softnic_parser_read_uint32(&group_id, tokens[3]) != 0) {
+   snprintf(out, out_size, MSG_ARG_INVALID, "group_id");
+   return;
+   }
+
+   if (strcmp(tokens[4], "ingress") == 0) {
+   ingress = 1;
+   } else if (strcmp(tokens[4], "egress") == 0) {
+   ingress = 0;
+   } else {
+   snprintf(out, out_size, MSG_ARG_NOT_FOUND, "ingress | egress");
+   return;
+   }
+
+   if (strcmp(tokens[5], "pipeline") != 0) {
+   snprintf(out, out_size, MSG_ARG_NOT_FOUND, "pipeline");
+   return;
+   }
+
+   pipeline_name = tokens[6];
+
+   if (strcmp(tokens[7], "table") != 0) {
+   snprintf(out, out_size, MSG_ARG_NOT_FOUND, "table");
+   return;
+   }
+
+   if (softnic_parser_read_uint32(&table_id, tokens[8]) != 0) {
+   snprintf(out, out_size, MSG_ARG_INVALID, "table_id");
+   return;
+   }
+
+   status = flow_attr_map_set(softnic,
+   group_id,
+   ingress,
+   pipeline_name,
+   table_id);
+   if (status) {
+   snprintf(out, out_size, MSG_CMD_FAIL, tokens[0]);
+   return;
+   }
+}
+
 void
 softnic_cli_process(char *in, char *out, size_t out_size, void *arg)
 {
@@ -5089,6 +5164,12 @@ softnic_cli_process(char *in, char *out, size_t 
out_size, void *arg)
}
}
 
+   if (strcmp(tokens[0], "flowapi") == 0) {
+   cmd_softnic_flowapi_map(softnic, tokens, n_tokens, out,
+   out_size);
+   return;
+   }
+
snprintf(out, out_size, MSG_CMD_UNKNOWN, tokens[0]);
 }
 
-- 
2.14.4



[dpdk-dev] [PATCH 04/15] net/softnic: various data type changes

2018-09-06 Thread Reshma Pattan
Change dev_name, action_profile_name and key_mask
from char * pointers to array types in the structures
softnic_port_in_params, softnic_port_out_params
and softnic_table_hash_params.

Signed-off-by: Cristian Dumitrescu 
Signed-off-by: Reshma Pattan 
---
 drivers/net/softnic/rte_eth_softnic_cli.c   | 34 +++--
 drivers/net/softnic/rte_eth_softnic_internals.h | 18 ++---
 drivers/net/softnic/rte_eth_softnic_pipeline.c  |  4 +--
 3 files changed, 26 insertions(+), 30 deletions(-)

diff --git a/drivers/net/softnic/rte_eth_softnic_cli.c 
b/drivers/net/softnic/rte_eth_softnic_cli.c
index 8f5f82555..dc8ccdc73 100644
--- a/drivers/net/softnic/rte_eth_softnic_cli.c
+++ b/drivers/net/softnic/rte_eth_softnic_cli.c
@@ -1697,6 +1697,8 @@ cmd_pipeline_port_in(struct pmd_internals *softnic,
uint32_t t0;
int enabled, status;
 
+   memset(&p, 0, sizeof(p));
+
if (n_tokens < 7) {
snprintf(out, out_size, MSG_ARG_MISMATCH, tokens[0]);
return;
@@ -1735,7 +1737,7 @@ cmd_pipeline_port_in(struct pmd_internals *softnic,
 
p.type = PORT_IN_RXQ;
 
-   p.dev_name = tokens[t0 + 1];
+   strcpy(p.dev_name, tokens[t0 + 1]);
 
if (strcmp(tokens[t0 + 2], "rxq") != 0) {
snprintf(out, out_size, MSG_ARG_NOT_FOUND, "rxq");
@@ -1758,7 +1760,7 @@ cmd_pipeline_port_in(struct pmd_internals *softnic,
 
p.type = PORT_IN_SWQ;
 
-   p.dev_name = tokens[t0 + 1];
+   strcpy(p.dev_name, tokens[t0 + 1]);
 
t0 += 2;
} else if (strcmp(tokens[t0], "tmgr") == 0) {
@@ -1770,7 +1772,7 @@ cmd_pipeline_port_in(struct pmd_internals *softnic,
 
p.type = PORT_IN_TMGR;
 
-   p.dev_name = tokens[t0 + 1];
+   strcpy(p.dev_name, tokens[t0 + 1]);
 
t0 += 2;
} else if (strcmp(tokens[t0], "tap") == 0) {
@@ -1782,7 +1784,7 @@ cmd_pipeline_port_in(struct pmd_internals *softnic,
 
p.type = PORT_IN_TAP;
 
-   p.dev_name = tokens[t0 + 1];
+   strcpy(p.dev_name, tokens[t0 + 1]);
 
if (strcmp(tokens[t0 + 2], "mempool") != 0) {
snprintf(out, out_size, MSG_ARG_NOT_FOUND,
@@ -1814,8 +1816,6 @@ cmd_pipeline_port_in(struct pmd_internals *softnic,
 
p.type = PORT_IN_SOURCE;
 
-   p.dev_name = NULL;
-
if (strcmp(tokens[t0 + 1], "mempool") != 0) {
snprintf(out, out_size, MSG_ARG_NOT_FOUND,
"mempool");
@@ -1851,7 +1851,6 @@ cmd_pipeline_port_in(struct pmd_internals *softnic,
return;
}
 
-   p.action_profile_name = NULL;
if (n_tokens > t0 &&
(strcmp(tokens[t0], "action") == 0)) {
if (n_tokens < t0 + 2) {
@@ -1859,7 +1858,7 @@ cmd_pipeline_port_in(struct pmd_internals *softnic,
return;
}
 
-   p.action_profile_name = tokens[t0 + 1];
+   strcpy(p.action_profile_name, tokens[t0 + 1]);
 
t0 += 2;
}
@@ -1945,7 +1944,7 @@ cmd_pipeline_port_out(struct pmd_internals *softnic,
 
p.type = PORT_OUT_TXQ;
 
-   p.dev_name = tokens[7];
+   strcpy(p.dev_name, tokens[7]);
 
if (strcmp(tokens[8], "txq") != 0) {
snprintf(out, out_size, MSG_ARG_NOT_FOUND, "txq");
@@ -1966,7 +1965,7 @@ cmd_pipeline_port_out(struct pmd_internals *softnic,
 
p.type = PORT_OUT_SWQ;
 
-   p.dev_name = tokens[7];
+   strcpy(p.dev_name, tokens[7]);
} else if (strcmp(tokens[6], "tmgr") == 0) {
if (n_tokens != 8) {
snprintf(out, out_size, MSG_ARG_MISMATCH,
@@ -1976,7 +1975,7 @@ cmd_pipeline_port_out(struct pmd_internals *softnic,
 
p.type = PORT_OUT_TMGR;
 
-   p.dev_name = tokens[7];
+   strcpy(p.dev_name, tokens[7]);
} else if (strcmp(tokens[6], "tap") == 0) {
if (n_tokens != 8) {
snprintf(out, out_size, MSG_ARG_MISMATCH,
@@ -1986,7 +1985,7 @@ cmd_pipeline_port_out(struct pmd_internals *softnic,
 
p.type = PORT_OUT_TAP;
 
-   p.dev_name = tokens[7];
+   strcpy(p.dev_name, tokens[7]);
} else if (strcmp(tokens[6], "sink") == 0) {
if ((n_tokens != 7) && (n_tokens != 11)) {
snprintf(out, out_size, MSG_ARG_MISMATCH,
@@ -1996,8 +1995,6 @@ cmd_pipeline_port_out(struct pmd_internals *softnic,
 
p.type = PORT_OUT_SINK;
 
-   p.dev_name = NULL;
-
if (n_tokens == 7) {
p.sink.file_name = NULL;
p.sink.max_n_pkts = 0;
@@ -2064,12 +2061,13 @@ cmd_pipeline_table(struct pmd_internals *softnic,
char *o

[dpdk-dev] [PATCH 05/15] net/softnic: add free table and find out port functions

2018-09-06 Thread Reshma Pattan
Added a utility function to free up the
pipeline tables.

Added a utility function to look up a pipeline
output port by name.
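
Usage sketch for the new lookup (illustration only; pipeline and port
names are hypothetical):

    uint32_t port_id;

    if (softnic_pipeline_port_out_find(softnic, "PIPELINE0",
            "TAP0", &port_id) == 0)
        printf("TAP0 is output port %u of PIPELINE0\n", port_id);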

Signed-off-by: Cristian Dumitrescu 
Signed-off-by: Reshma Pattan 
---
 drivers/net/softnic/rte_eth_softnic_internals.h | 13 ++
 drivers/net/softnic/rte_eth_softnic_pipeline.c  | 57 +
 2 files changed, 70 insertions(+)

diff --git a/drivers/net/softnic/rte_eth_softnic_internals.h 
b/drivers/net/softnic/rte_eth_softnic_internals.h
index f40215dfe..9c587bc7d 100644
--- a/drivers/net/softnic/rte_eth_softnic_internals.h
+++ b/drivers/net/softnic/rte_eth_softnic_internals.h
@@ -415,10 +415,15 @@ struct softnic_port_in {
struct rte_port_in_action *a;
 };
 
+struct softnic_port_out {
+   struct softnic_port_out_params params;
+};
+
 struct softnic_table {
struct softnic_table_params params;
struct softnic_table_action_profile *ap;
struct rte_table_action *a;
+   struct flow_list flows;
 };
 
 struct pipeline {
@@ -426,7 +431,9 @@ struct pipeline {
char name[NAME_SIZE];
 
struct rte_pipeline *p;
+   struct pipeline_params params;
struct softnic_port_in port_in[RTE_PIPELINE_PORT_IN_MAX];
+   struct softnic_port_out port_out[RTE_PIPELINE_PORT_OUT_MAX];
struct softnic_table table[RTE_PIPELINE_TABLE_MAX];
uint32_t n_ports_in;
uint32_t n_ports_out;
@@ -725,6 +732,12 @@ softnic_pipeline_port_out_create(struct pmd_internals *p,
const char *pipeline_name,
struct softnic_port_out_params *params);
 
+int
+softnic_pipeline_port_out_find(struct pmd_internals *softnic,
+   const char *pipeline_name,
+   const char *name,
+   uint32_t *port_id);
+
 int
 softnic_pipeline_table_create(struct pmd_internals *p,
const char *pipeline_name,
diff --git a/drivers/net/softnic/rte_eth_softnic_pipeline.c 
b/drivers/net/softnic/rte_eth_softnic_pipeline.c
index dacf7bc9a..d1084ea36 100644
--- a/drivers/net/softnic/rte_eth_softnic_pipeline.c
+++ b/drivers/net/softnic/rte_eth_softnic_pipeline.c
@@ -43,17 +43,41 @@ softnic_pipeline_init(struct pmd_internals *p)
return 0;
 }
 
+static void
+softnic_pipeline_table_free(struct softnic_table *table)
+{
+   for ( ; ; ) {
+   struct rte_flow *flow;
+
+   flow = TAILQ_FIRST(&table->flows);
+   if (flow == NULL)
+   break;
+
+   TAILQ_REMOVE(&table->flows, flow, node);
+   free(flow);
+   }
+}
+
 void
 softnic_pipeline_free(struct pmd_internals *p)
 {
for ( ; ; ) {
struct pipeline *pipeline;
+   uint32_t table_id;
 
pipeline = TAILQ_FIRST(&p->pipeline_list);
if (pipeline == NULL)
break;
 
TAILQ_REMOVE(&p->pipeline_list, pipeline, node);
+
+   for (table_id = 0; table_id < pipeline->n_tables; table_id++) {
+   struct softnic_table *table =
+   &pipeline->table[table_id];
+
+   softnic_pipeline_table_free(table);
+   }
+
rte_ring_free(pipeline->msgq_req);
rte_ring_free(pipeline->msgq_rsp);
rte_pipeline_free(pipeline->p);
@@ -160,6 +184,7 @@ softnic_pipeline_create(struct pmd_internals *softnic,
/* Node fill in */
strlcpy(pipeline->name, name, sizeof(pipeline->name));
pipeline->p = p;
+   memcpy(&pipeline->params, params, sizeof(*params));
pipeline->n_ports_in = 0;
pipeline->n_ports_out = 0;
pipeline->n_tables = 0;
@@ -401,6 +426,7 @@ softnic_pipeline_port_out_create(struct pmd_internals 
*softnic,
} pp_nodrop;
 
struct pipeline *pipeline;
+   struct softnic_port_out *port_out;
uint32_t port_id;
int status;
 
@@ -542,6 +568,8 @@ softnic_pipeline_port_out_create(struct pmd_internals 
*softnic,
return -1;
 
/* Pipeline */
+   port_out = &pipeline->port_out[pipeline->n_ports_out];
+   memcpy(&port_out->params, params, sizeof(*params));
pipeline->n_ports_out++;
 
return 0;
@@ -960,7 +988,36 @@ softnic_pipeline_table_create(struct pmd_internals 
*softnic,
memcpy(&table->params, params, sizeof(*params));
table->ap = ap;
table->a = action;
+   TAILQ_INIT(&table->flows);
pipeline->n_tables++;
 
return 0;
 }
+
+int
+softnic_pipeline_port_out_find(struct pmd_internals *softnic,
+   const char *pipeline_name,
+   const char *name,
+   uint32_t *port_id)
+{
+   struct pipeline *pipeline;
+   uint32_t i;
+
+   if (softnic == NULL ||
+   pipeline_name == NULL ||
+   name == NULL ||
+   port_id == NULL)
+   return -1;
+
+   pipeline = softnic_pipeline_find(softnic, pipeline_name);
+   if (

[dpdk-dev] [PATCH 06/15] net/softnic: add function to get eth device from softnic

2018-09-06 Thread Reshma Pattan
Add a utility function to get the rte_eth_dev from
a given softnic instance.
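
Usage sketch (illustration only):

    struct rte_eth_dev *dev;

    dev = ETHDEV(softnic);
    if (dev == NULL)
        return -ENODEV; /* no ethdev registered under this name */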

Signed-off-by: Cristian Dumitrescu 
Signed-off-by: Reshma Pattan 
---
 drivers/net/softnic/rte_eth_softnic_internals.h | 17 +
 1 file changed, 17 insertions(+)

diff --git a/drivers/net/softnic/rte_eth_softnic_internals.h 
b/drivers/net/softnic/rte_eth_softnic_internals.h
index 9c587bc7d..1857ec50d 100644
--- a/drivers/net/softnic/rte_eth_softnic_internals.h
+++ b/drivers/net/softnic/rte_eth_softnic_internals.h
@@ -18,6 +18,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -537,6 +538,22 @@ struct pmd_internals {
struct softnic_thread_data thread_data[RTE_MAX_LCORE];
 };
 
+static inline struct rte_eth_dev *
+ETHDEV(struct pmd_internals *softnic)
+{
+   uint16_t port_id;
+   int status;
+
+   if (softnic == NULL)
+   return NULL;
+
+   status = rte_eth_dev_get_port_by_name(softnic->params.name, &port_id);
+   if (status)
+   return NULL;
+
+   return &rte_eth_devices[port_id];
+}
+
 /**
  * Ethdev Flow API
  */
-- 
2.14.4



[dpdk-dev] [PATCH 09/15] net/softnic: parse flow protocol for acl table match

2018-09-06 Thread Reshma Pattan
Added flow protocol parsing for IPv4/IPv6 and
TCP/UDP/SCTP for the ACL table rule match.

Added the below helper functions for doing the same
(a usage sketch follows the list).
port_mask_to_range()
ipv6_mask_to_depth()
ipv4_mask_to_depth()
mask_to_depth()
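
As a quick illustration of these helpers (sketch only, not part of the
patch): a contiguous /24 IPv4 netmask converts to a depth, while a
non-contiguous mask is rejected:

    uint32_t depth;

    if (ipv4_mask_to_depth(0xFFFFFF00, &depth) == 0)
        printf("depth = %u\n", depth); /* prints 24 */

    if (ipv4_mask_to_depth(0xFF00FF00, &depth) != 0)
        printf("non-contiguous mask rejected\n");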

Signed-off-by: Cristian Dumitrescu 
Signed-off-by: Reshma Pattan 
---
 drivers/net/softnic/rte_eth_softnic_flow.c | 344 -
 1 file changed, 342 insertions(+), 2 deletions(-)

diff --git a/drivers/net/softnic/rte_eth_softnic_flow.c 
b/drivers/net/softnic/rte_eth_softnic_flow.c
index 022d41775..d6d9893b5 100644
--- a/drivers/net/softnic/rte_eth_softnic_flow.c
+++ b/drivers/net/softnic/rte_eth_softnic_flow.c
@@ -1,10 +1,13 @@
 /* SPDX-License-Identifier: BSD-3-Clause
- * Copyright(c) 2017 Intel Corporation
+ * Copyright(c) 2018 Intel Corporation
  */
 
 #include "rte_eth_softnic_internals.h"
 #include "rte_eth_softnic.h"
 
+#define rte_ntohs rte_be_to_cpu_16
+#define rte_ntohl rte_be_to_cpu_32
+
 int
 flow_attr_map_set(struct pmd_internals *softnic,
uint32_t group_id,
@@ -397,6 +400,113 @@ flow_item_skip_disabled_protos(const struct rte_flow_item 
**item,
((1LLU << RTE_FLOW_ITEM_TYPE_IPV4) | \
 (1LLU << RTE_FLOW_ITEM_TYPE_IPV6))
 
+static void
+flow_item_skip_void(const struct rte_flow_item **item)
+{
+   for ( ; ; (*item)++)
+   if ((*item)->type != RTE_FLOW_ITEM_TYPE_VOID)
+   return;
+}
+
+#define IP_PROTOCOL_TCP 0x06
+#define IP_PROTOCOL_UDP 0x11
+#define IP_PROTOCOL_SCTP 0x84
+
+static int
+mask_to_depth(uint64_t mask,
+   uint32_t *depth)
+{
+   uint64_t n;
+
+   if (mask == UINT64_MAX) {
+   if (depth)
+   *depth = 64;
+
+   return 0;
+   }
+
+   mask = ~mask;
+
+   if (mask & (mask + 1))
+   return -1;
+
+   n = __builtin_popcountll(mask);
+   if (depth)
+   *depth = (uint32_t)(64 - n);
+
+   return 0;
+}
+
+static int
+ipv4_mask_to_depth(uint32_t mask,
+   uint32_t *depth)
+{
+   uint32_t d;
+   int status;
+
+   status = mask_to_depth(mask | (UINT64_MAX << 32), &d);
+   if (status)
+   return status;
+
+   d -= 32;
+   if (depth)
+   *depth = d;
+
+   return 0;
+}
+
+static int
+ipv6_mask_to_depth(uint8_t *mask,
+   uint32_t *depth)
+{
+   uint64_t *m = (uint64_t *)mask;
+   uint64_t m0 = rte_be_to_cpu_64(m[0]);
+   uint64_t m1 = rte_be_to_cpu_64(m[1]);
+   uint32_t d0, d1;
+   int status;
+
+   status = mask_to_depth(m0, &d0);
+   if (status)
+   return status;
+
+   status = mask_to_depth(m1, &d1);
+   if (status)
+   return status;
+
+   if (d0 < 64 && d1)
+   return -1;
+
+   if (depth)
+   *depth = d0 + d1;
+
+   return 0;
+}
+
+static int
+port_mask_to_range(uint16_t port,
+   uint16_t port_mask,
+   uint16_t *port0,
+   uint16_t *port1)
+{
+   int status;
+   uint16_t p0, p1;
+
+   status = mask_to_depth(port_mask | (UINT64_MAX << 16), NULL);
+   if (status)
+   return -1;
+
+   p0 = port & port_mask;
+   p1 = p0 | ~port_mask;
+
+   if (port0)
+   *port0 = p0;
+
+   if (port1)
+   *port1 = p1;
+
+   return 0;
+}
+
 static int
 flow_rule_match_acl_get(struct pmd_internals *softnic __rte_unused,
struct pipeline *pipeline __rte_unused,
@@ -409,6 +519,7 @@ flow_rule_match_acl_get(struct pmd_internals *softnic 
__rte_unused,
union flow_item spec, mask;
size_t size, length = 0;
int disabled = 0, status;
+   uint8_t ip_proto, ip_proto_mask;
 
memset(rule_match, 0, sizeof(*rule_match));
rule_match->match_type = TABLE_ACL;
@@ -427,6 +538,80 @@ flow_rule_match_acl_get(struct pmd_internals *softnic 
__rte_unused,
return status;
 
switch (item->type) {
+   case RTE_FLOW_ITEM_TYPE_IPV4:
+   {
+   uint32_t sa_depth, da_depth;
+
+   status = ipv4_mask_to_depth(rte_ntohl(mask.ipv4.hdr.src_addr),
+   &sa_depth);
+   if (status)
+   return rte_flow_error_set(error,
+   EINVAL,
+   RTE_FLOW_ERROR_TYPE_ITEM,
+   item,
+   "ACL: Illegal IPv4 header source address mask");
+
+   status = ipv4_mask_to_depth(rte_ntohl(mask.ipv4.hdr.dst_addr),
+   &da_depth);
+   if (status)
+   return rte_flow_error_set(error,
+   EINVAL,
+   RTE_FLOW_ERROR_TYPE_ITEM,
+   item,
+   "ACL: Illegal IPv4 header destination address 
mask");
+
+   ip_proto = spec.ipv4.hdr.next_proto_id;
+   ip_pr

[dpdk-dev] [PATCH 07/15] net/softnic: flow API validate support

2018-09-06 Thread Reshma Pattan
Start adding flow API operations.

Started with flow validate API support by adding
the basic infrastructure below.

flow_pipeline_table_get()
pmd_flow_validate()

Additional flow validate changes will be
added in subsequent patches.
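
Application-side sketch of exercising the new operation through the
generic flow API (illustration only; "pattern", "actions" and "port_id"
are assumed to be set up elsewhere):

    struct rte_flow_attr attr = { .group = 0, .ingress = 1 };
    struct rte_flow_error error;

    if (rte_flow_validate(port_id, &attr, pattern, actions, &error) == 0)
        printf("rule accepted\n");
    else
        printf("rule rejected: %s\n",
            error.message ? error.message : "(no error message)");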

Signed-off-by: Cristian Dumitrescu 
Signed-off-by: Reshma Pattan 
---
 drivers/net/softnic/rte_eth_softnic.c   |  16 
 drivers/net/softnic/rte_eth_softnic_flow.c  | 112 
 drivers/net/softnic/rte_eth_softnic_internals.h |   2 +
 3 files changed, 130 insertions(+)

diff --git a/drivers/net/softnic/rte_eth_softnic.c 
b/drivers/net/softnic/rte_eth_softnic.c
index 30fb3952a..ae2a4385b 100644
--- a/drivers/net/softnic/rte_eth_softnic.c
+++ b/drivers/net/softnic/rte_eth_softnic.c
@@ -205,6 +205,21 @@ pmd_link_update(struct rte_eth_dev *dev __rte_unused,
return 0;
 }
 
+static int
+pmd_filter_ctrl(struct rte_eth_dev *dev __rte_unused,
+   enum rte_filter_type filter_type,
+   enum rte_filter_op filter_op,
+   void *arg)
+{
+   if (filter_type == RTE_ETH_FILTER_GENERIC &&
+   filter_op == RTE_ETH_FILTER_GET) {
+   *(const void **)arg = &pmd_flow_ops;
+   return 0;
+   }
+
+   return -ENOTSUP;
+}
+
 static int
 pmd_tm_ops_get(struct rte_eth_dev *dev __rte_unused, void *arg)
 {
@@ -222,6 +237,7 @@ static const struct eth_dev_ops pmd_ops = {
.dev_infos_get = pmd_dev_infos_get,
.rx_queue_setup = pmd_rx_queue_setup,
.tx_queue_setup = pmd_tx_queue_setup,
+   .filter_ctrl = pmd_filter_ctrl,
.tm_ops_get = pmd_tm_ops_get,
 };
 
diff --git a/drivers/net/softnic/rte_eth_softnic_flow.c 
b/drivers/net/softnic/rte_eth_softnic_flow.c
index 843db7590..f37890333 100644
--- a/drivers/net/softnic/rte_eth_softnic_flow.c
+++ b/drivers/net/softnic/rte_eth_softnic_flow.c
@@ -44,3 +44,115 @@ flow_attr_map_get(struct pmd_internals *softnic,
return (ingress) ? &softnic->flow.ingress_map[group_id] :
&softnic->flow.egress_map[group_id];
 }
+
+static int
+flow_pipeline_table_get(struct pmd_internals *softnic,
+   const struct rte_flow_attr *attr,
+   const char **pipeline_name,
+   uint32_t *table_id,
+   struct rte_flow_error *error)
+{
+   struct flow_attr_map *map;
+
+   if (attr == NULL)
+   return rte_flow_error_set(error,
+   EINVAL,
+   RTE_FLOW_ERROR_TYPE_ATTR,
+   NULL,
+   "Null attr");
+
+   if (!attr->ingress && !attr->egress)
+   return rte_flow_error_set(error,
+   EINVAL,
+   RTE_FLOW_ERROR_TYPE_ATTR_INGRESS,
+   attr,
+   "Ingress/egress not specified");
+
+   if (attr->ingress && attr->egress)
+   return rte_flow_error_set(error,
+   EINVAL,
+   RTE_FLOW_ERROR_TYPE_ATTR_INGRESS,
+   attr,
+   "Setting both ingress and egress is not 
allowed");
+
+   map = flow_attr_map_get(softnic,
+   attr->group,
+   attr->ingress);
+   if (map == NULL ||
+   map->valid == 0)
+   return rte_flow_error_set(error,
+   EINVAL,
+   RTE_FLOW_ERROR_TYPE_ATTR_GROUP,
+   attr,
+   "Invalid group ID");
+
+   if (pipeline_name)
+   *pipeline_name = map->pipeline_name;
+
+   if (table_id)
+   *table_id = map->table_id;
+
+   return 0;
+}
+
+static int
+pmd_flow_validate(struct rte_eth_dev *dev,
+   const struct rte_flow_attr *attr,
+   const struct rte_flow_item item[],
+   const struct rte_flow_action action[],
+   struct rte_flow_error *error)
+{
+   struct pmd_internals *softnic = dev->data->dev_private;
+   struct pipeline *pipeline;
+   const char *pipeline_name = NULL;
+   uint32_t table_id = 0;
+   int status;
+
+   /* Check input parameters. */
+   if (attr == NULL)
+   return rte_flow_error_set(error,
+   EINVAL,
+   RTE_FLOW_ERROR_TYPE_ATTR,
+   NULL, "Null attr");
+
+   if (item == NULL)
+   return rte_flow_error_set(error,
+   EINVAL,
+   RTE_FLOW_ERROR_TYPE_ITEM,
+   NULL,
+   "Null item");
+
+   if (action == NULL)
+   return rte_flow_error_set(error,
+   EINVAL,
+   RTE_FLOW_ERROR_TYPE_ACTION,
+  

[dpdk-dev] [PATCH 08/15] net/softnic: validate and map flow rule with acl table match

2018-09-06 Thread Reshma Pattan
Support for validating and mapping an rte_flow rule with
an ACL table match is added.

As part of this support, the below utility functions
have been added (a usage sketch follows the list).
flow_rule_match_get()
flow_rule_match_acl_get()
flow_item_skip_disabled_protos()
flow_item_proto_preprocess()
flow_item_is_proto()
flow_item_raw_preprocess()
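
Usage sketch for one of the helpers (illustration only): query the
default mask and size that flow_item_is_proto() reports for an item
type:

    const void *mask = NULL;
    size_t size = 0;

    if (flow_item_is_proto(RTE_FLOW_ITEM_TYPE_IPV4, &mask, &size))
        printf("IPv4 item: default mask of %zu bytes\n", size);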

Signed-off-by: Cristian Dumitrescu 
Signed-off-by: Reshma Pattan 
---
 drivers/net/softnic/rte_eth_softnic_flow.c | 386 +
 1 file changed, 386 insertions(+)

diff --git a/drivers/net/softnic/rte_eth_softnic_flow.c 
b/drivers/net/softnic/rte_eth_softnic_flow.c
index f37890333..022d41775 100644
--- a/drivers/net/softnic/rte_eth_softnic_flow.c
+++ b/drivers/net/softnic/rte_eth_softnic_flow.c
@@ -95,6 +95,375 @@ flow_pipeline_table_get(struct pmd_internals *softnic,
return 0;
 }
 
+union flow_item {
+   uint8_t raw[TABLE_RULE_MATCH_SIZE_MAX];
+   struct rte_flow_item_eth eth;
+   struct rte_flow_item_vlan vlan;
+   struct rte_flow_item_ipv4 ipv4;
+   struct rte_flow_item_ipv6 ipv6;
+   struct rte_flow_item_icmp icmp;
+   struct rte_flow_item_udp udp;
+   struct rte_flow_item_tcp tcp;
+   struct rte_flow_item_sctp sctp;
+   struct rte_flow_item_vxlan vxlan;
+   struct rte_flow_item_e_tag e_tag;
+   struct rte_flow_item_nvgre nvgre;
+   struct rte_flow_item_mpls mpls;
+   struct rte_flow_item_gre gre;
+   struct rte_flow_item_gtp gtp;
+   struct rte_flow_item_esp esp;
+   struct rte_flow_item_geneve geneve;
+   struct rte_flow_item_vxlan_gpe vxlan_gpe;
+   struct rte_flow_item_arp_eth_ipv4 arp_eth_ipv4;
+   struct rte_flow_item_ipv6_ext ipv6_ext;
+   struct rte_flow_item_icmp6 icmp6;
+   struct rte_flow_item_icmp6_nd_ns icmp6_nd_ns;
+   struct rte_flow_item_icmp6_nd_na icmp6_nd_na;
+   struct rte_flow_item_icmp6_nd_opt icmp6_nd_opt;
+   struct rte_flow_item_icmp6_nd_opt_sla_eth icmp6_nd_opt_sla_eth;
+   struct rte_flow_item_icmp6_nd_opt_tla_eth icmp6_nd_opt_tla_eth;
+};
+
+static const union flow_item flow_item_raw_mask;
+
+static int
+flow_item_is_proto(enum rte_flow_item_type type,
+   const void **mask,
+   size_t *size)
+{
+   switch (type) {
+   case RTE_FLOW_ITEM_TYPE_RAW:
+   *mask = &flow_item_raw_mask;
+   *size = sizeof(flow_item_raw_mask);
+   return 1; /* TRUE */
+
+   case RTE_FLOW_ITEM_TYPE_ETH:
+   *mask = &rte_flow_item_eth_mask;
+   *size = sizeof(struct rte_flow_item_eth);
+   return 1; /* TRUE */
+
+   case RTE_FLOW_ITEM_TYPE_VLAN:
+   *mask = &rte_flow_item_vlan_mask;
+   *size = sizeof(struct rte_flow_item_vlan);
+   return 1;
+
+   case RTE_FLOW_ITEM_TYPE_IPV4:
+   *mask = &rte_flow_item_ipv4_mask;
+   *size = sizeof(struct rte_flow_item_ipv4);
+   return 1;
+
+   case RTE_FLOW_ITEM_TYPE_IPV6:
+   *mask = &rte_flow_item_ipv6_mask;
+   *size = sizeof(struct rte_flow_item_ipv6);
+   return 1;
+
+   case RTE_FLOW_ITEM_TYPE_ICMP:
+   *mask = &rte_flow_item_icmp_mask;
+   *size = sizeof(struct rte_flow_item_icmp);
+   return 1;
+
+   case RTE_FLOW_ITEM_TYPE_UDP:
+   *mask = &rte_flow_item_udp_mask;
+   *size = sizeof(struct rte_flow_item_udp);
+   return 1;
+
+   case RTE_FLOW_ITEM_TYPE_TCP:
+   *mask = &rte_flow_item_tcp_mask;
+   *size = sizeof(struct rte_flow_item_tcp);
+   return 1;
+
+   case RTE_FLOW_ITEM_TYPE_SCTP:
+   *mask = &rte_flow_item_sctp_mask;
+   *size = sizeof(struct rte_flow_item_sctp);
+   return 1;
+
+   case RTE_FLOW_ITEM_TYPE_VXLAN:
+   *mask = &rte_flow_item_vxlan_mask;
+   *size = sizeof(struct rte_flow_item_vxlan);
+   return 1;
+
+   case RTE_FLOW_ITEM_TYPE_E_TAG:
+   *mask = &rte_flow_item_e_tag_mask;
+   *size = sizeof(struct rte_flow_item_e_tag);
+   return 1;
+
+   case RTE_FLOW_ITEM_TYPE_NVGRE:
+   *mask = &rte_flow_item_nvgre_mask;
+   *size = sizeof(struct rte_flow_item_nvgre);
+   return 1;
+
+   case RTE_FLOW_ITEM_TYPE_MPLS:
+   *mask = &rte_flow_item_mpls_mask;
+   *size = sizeof(struct rte_flow_item_mpls);
+   return 1;
+
+   case RTE_FLOW_ITEM_TYPE_GRE:
+   *mask = &rte_flow_item_gre_mask;
+   *size = sizeof(struct rte_flow_item_gre);
+   return 1;
+
+   case RTE_FLOW_ITEM_TYPE_GTP:
+   case RTE_FLOW_ITEM_TYPE_GTPC:
+   case RTE_FLOW_ITEM_TYPE_GTPU:
+   *mask = &rte_flow_item_gtp_mask;
+   *size = sizeof(struct rte_flow_item_gtp);
+   return 1;
+
+   case RTE_FLOW_ITEM_TYPE_ESP:
+   *mask = &rte_flow_it

[dpdk-dev] [PATCH 10/15] net/softnic: validate and map flow with hash table match

2018-09-06 Thread Reshma Pattan
Support for validating and mapping a flow rule with a HASH
table match is added.

As part of this, the below helper functions are added
(a usage sketch follows the list).
flow_rule_match_hash_get()
hash_key_mask_is_same()
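
Usage sketch reproducing the example documented in the code below
(illustration only):

    uint8_t tmask[5] = {0x00, 0x22, 0x00, 0x33, 0x00};
    uint8_t fmask[3] = {0x22, 0x00, 0x33};
    size_t tpos, fpos;

    if (hash_key_mask_is_same(tmask, 2, 5, fmask, 3, 3, &tpos, &fpos))
        printf("masks equivalent (tpos=%zu, fpos=%zu)\n", tpos, fpos);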

Signed-off-by: Cristian Dumitrescu 
Signed-off-by: Reshma Pattan 
---
 drivers/net/softnic/rte_eth_softnic_flow.c | 201 -
 1 file changed, 200 insertions(+), 1 deletion(-)

diff --git a/drivers/net/softnic/rte_eth_softnic_flow.c 
b/drivers/net/softnic/rte_eth_softnic_flow.c
index d6d9893b5..788397c1d 100644
--- a/drivers/net/softnic/rte_eth_softnic_flow.c
+++ b/drivers/net/softnic/rte_eth_softnic_flow.c
@@ -776,7 +776,195 @@ flow_rule_match_acl_get(struct pmd_internals *softnic 
__rte_unused,
return 0;
 }
 
-   static int
+/***
+ * Both *tmask* and *fmask* are byte arrays of size *tsize* and *fsize*
+ * respectively.
+ * They are located within a larger buffer at offsets *toffset* and *foffset*
+ * respectively. Both *tmask* and *fmask* represent bitmasks for the larger
+ * buffer.
+ * Question: are the two masks equivalent?
+ *
+ * Notes:
+ * 1. Offset basically indicates that the first offset bytes in the buffer
+ *    are "don't care", so offset is equivalent to prepending an "all-zeros"
+ *    array of *offset* bytes to the *mask*.
+ * 2. Each *mask* might contain a number of zero bytes at the beginning or
+ *    at the end.
+ * 3. Bytes in the larger buffer after the end of the *mask* are also
+ *    considered "don't care", so they are equivalent to appending an
+ *    "all-zeros" array of bytes to the *mask*.
+ *
+ * Example:
+ * Buffer = [xx xx xx xx xx xx xx xx], buffer size = 8 bytes
+ * tmask = [00 22 00 33 00], toffset = 2, tsize = 5
+ *    => buffer mask = [00 00 00 22 00 33 00 00]
+ * fmask = [22 00 33], foffset = 3, fsize = 3
+ *    => buffer mask = [00 00 00 22 00 33 00 00]
+ * Therefore, the tmask and fmask from this example are equivalent.
+ */
+static int
+hash_key_mask_is_same(uint8_t *tmask,
+   size_t toffset,
+   size_t tsize,
+   uint8_t *fmask,
+   size_t foffset,
+   size_t fsize,
+   size_t *toffset_plus,
+   size_t *foffset_plus)
+{
+   size_t tpos; /* Position of first non-zero byte in the tmask buffer. */
+   size_t fpos; /* Position of first non-zero byte in the fmask buffer. */
+
+   /* Compute tpos and fpos. */
+   for (tpos = 0; tmask[tpos] == 0; tpos++)
+   ;
+   for (fpos = 0; fmask[fpos] == 0; fpos++)
+   ;
+
+   if (toffset + tpos != foffset + fpos)
+   return 0; /* FALSE */
+
+   tsize -= tpos;
+   fsize -= fpos;
+
+   if (tsize < fsize) {
+   size_t i;
+
+   for (i = 0; i < tsize; i++)
+   if (tmask[tpos + i] != fmask[fpos + i])
+   return 0; /* FALSE */
+
+   for ( ; i < fsize; i++)
+   if (fmask[fpos + i])
+   return 0; /* FALSE */
+   } else {
+   size_t i;
+
+   for (i = 0; i < fsize; i++)
+   if (tmask[tpos + i] != fmask[fpos + i])
+   return 0; /* FALSE */
+
+   for ( ; i < tsize; i++)
+   if (tmask[tpos + i])
+   return 0; /* FALSE */
+   }
+
+   if (toffset_plus)
+   *toffset_plus = tpos;
+
+   if (foffset_plus)
+   *foffset_plus = fpos;
+
+   return 1; /* TRUE */
+}
+
+static int
+flow_rule_match_hash_get(struct pmd_internals *softnic __rte_unused,
+   struct pipeline *pipeline __rte_unused,
+   struct softnic_table *table,
+   const struct rte_flow_attr *attr __rte_unused,
+   const struct rte_flow_item *item,
+   struct softnic_table_rule_match *rule_match,
+   struct rte_flow_error *error)
+{
+   struct softnic_table_rule_match_hash key, key_mask;
+   struct softnic_table_hash_params *params = &table->params.match.hash;
+   size_t offset = 0, length = 0, tpos, fpos;
+   int status;
+
+   memset(&key, 0, sizeof(key));
+   memset(&key_mask, 0, sizeof(key_mask));
+
+   /* VOID or disabled protos only, if any. */
+   status = flow_item_skip_disabled_protos(&item, 0, &offset, error);
+   if (status)
+   return status;
+
+   if (item->type == RTE_FLOW_ITEM_TYPE_END)
+   return rte_flow_error_set(error,
+   EINVAL,
+   RTE_FLOW_ERROR_TYPE_ITEM,
+   item,
+   "HASH: END detected too early");
+
+   /* VOID or any protocols (enabled or disabled). */
+   for ( ; item->type != RTE_FLOW_ITEM_TYPE_END; item++) {
+   union flow_item spec, mask;
+   size_t size;
+   int disabled, status;
+
+   if (item->type == RTE_FLOW_ITEM_TYPE_VOID)
+   continue;
+
+   status = flow_item_proto_preprocess(item,

[dpdk-dev] [PATCH 11/15] net/softnic: validate and map flow action with table action

2018-09-06 Thread Reshma Pattan
Added validation and mapping of flow rule actions
against the table action profile.

Added flow_rule_action_get() to do the same.
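
Application-side sketch of an action list this function accepts
(illustration only; group 1 is assumed to have been mapped beforehand
with "flowapi map"):

    struct rte_flow_action_jump jump = { .group = 1 };
    struct rte_flow_action actions[] = {
        { .type = RTE_FLOW_ACTION_TYPE_JUMP, .conf = &jump },
        { .type = RTE_FLOW_ACTION_TYPE_END },
    };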

Signed-off-by: Cristian Dumitrescu 
Signed-off-by: Reshma Pattan 
---
 drivers/net/softnic/rte_eth_softnic_flow.c | 350 +
 1 file changed, 350 insertions(+)

diff --git a/drivers/net/softnic/rte_eth_softnic_flow.c 
b/drivers/net/softnic/rte_eth_softnic_flow.c
index 788397c1d..351d34524 100644
--- a/drivers/net/softnic/rte_eth_softnic_flow.c
+++ b/drivers/net/softnic/rte_eth_softnic_flow.c
@@ -994,6 +994,8 @@ flow_rule_match_get(struct pmd_internals *softnic,
rule_match,
error);
 
+   /* FALLTHROUGH */
+
default:
return rte_flow_error_set(error,
ENOTSUP,
@@ -1003,6 +1005,341 @@ flow_rule_match_get(struct pmd_internals *softnic,
}
 }
 
+static int
+flow_rule_action_get(struct pmd_internals *softnic,
+   struct pipeline *pipeline,
+   struct softnic_table *table,
+   const struct rte_flow_attr *attr,
+   const struct rte_flow_action *action,
+   struct softnic_table_rule_action *rule_action,
+   struct rte_flow_error *error __rte_unused)
+{
+   struct softnic_table_action_profile *profile;
+   struct softnic_table_action_profile_params *params;
+   int n_jump_queue_rss_drop = 0;
+   int n_count = 0;
+
+   profile = softnic_table_action_profile_find(softnic,
+   table->params.action_profile_name);
+   if (profile == NULL)
+   return rte_flow_error_set(error,
+   EINVAL,
+   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+   action,
+   "JUMP: Table action profile");
+
+   params = &profile->params;
+
+   for ( ; action->type != RTE_FLOW_ACTION_TYPE_END; action++) {
+   if (action->type == RTE_FLOW_ACTION_TYPE_VOID)
+   continue;
+
+   switch (action->type) {
+   case RTE_FLOW_ACTION_TYPE_JUMP:
+   {
+   const struct rte_flow_action_jump *conf = action->conf;
+   struct flow_attr_map *map;
+
+   if (conf == NULL)
+   return rte_flow_error_set(error,
+   EINVAL,
+   RTE_FLOW_ERROR_TYPE_ACTION,
+   action,
+   "JUMP: Null configuration");
+
+   if (n_jump_queue_rss_drop)
+   return rte_flow_error_set(error,
+   EINVAL,
+   RTE_FLOW_ERROR_TYPE_ACTION,
+   action,
+   "Only one termination action is"
+   " allowed per flow");
+
+   if ((params->action_mask &
+   (1LLU << RTE_TABLE_ACTION_FWD)) == 0)
+   return rte_flow_error_set(error,
+   EINVAL,
+   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+   NULL,
+   "JUMP action not enabled for this 
table");
+
+   n_jump_queue_rss_drop = 1;
+
+   map = flow_attr_map_get(softnic,
+   conf->group,
+   attr->ingress);
+   if (map == NULL || map->valid == 0)
+   return rte_flow_error_set(error,
+   EINVAL,
+   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+   NULL,
+   "JUMP: Invalid group mapping");
+
+   if (strcmp(pipeline->name, map->pipeline_name) != 0)
+   return rte_flow_error_set(error,
+   ENOTSUP,
+   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+   NULL,
+   "JUMP: Jump to table in different 
pipeline");
+
+   /* RTE_TABLE_ACTION_FWD */
+   rule_action->fwd.action = RTE_PIPELINE_ACTION_TABLE;
+   rule_action->fwd.id = map->table_id;
+   rule_action->action_mask |= 1 << RTE_TABLE_ACTION_FWD;
+   break;
+   } /* RTE_FLOW_ACTION_TYPE_JUMP */
+
+   case RTE_FLOW_ACTION_TYPE_QUEUE:
+   {
+   char name[NAME_SIZE];
+   struct rte_eth_dev *dev;
+   const struct rte_flow_action_queue *conf = action->conf;
+ 

[dpdk-dev] [PATCH 14/15] net/softnic: add flow query API

2018-09-06 Thread Reshma Pattan
Added the pmd_flow_query() API for flow query
support.
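
Application-side sketch (illustration only; "port_id" and "flow" are
assumed to come from a prior rte_flow_create()). Note that the action
argument is ignored by this implementation:

    struct rte_flow_query_count counts = { .reset = 1 };
    struct rte_flow_action count_action = {
        .type = RTE_FLOW_ACTION_TYPE_COUNT,
    };
    struct rte_flow_error error;

    if (rte_flow_query(port_id, flow, &count_action, &counts,
            &error) == 0 && counts.hits_set)
        printf("hits = %llu\n", (unsigned long long)counts.hits);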

Signed-off-by: Cristian Dumitrescu 
Signed-off-by: Reshma Pattan 
---
 drivers/net/softnic/rte_eth_softnic_flow.c | 55 ++
 1 file changed, 55 insertions(+)

diff --git a/drivers/net/softnic/rte_eth_softnic_flow.c 
b/drivers/net/softnic/rte_eth_softnic_flow.c
index 3f8531139..da235ff7f 100644
--- a/drivers/net/softnic/rte_eth_softnic_flow.c
+++ b/drivers/net/softnic/rte_eth_softnic_flow.c
@@ -1639,8 +1639,63 @@ pmd_flow_destroy(struct rte_eth_dev *dev,
return 0;
 }
 
+static int
+pmd_flow_query(struct rte_eth_dev *dev __rte_unused,
+   struct rte_flow *flow,
+   const struct rte_flow_action *action __rte_unused,
+   void *data,
+   struct rte_flow_error *error)
+{
+   struct rte_table_action_stats_counters stats;
+   struct softnic_table *table;
+   struct rte_flow_query_count *flow_stats = data;
+   int status;
+
+   /* Check input parameters. */
+   if (flow == NULL)
+   return rte_flow_error_set(error,
+   EINVAL,
+   RTE_FLOW_ERROR_TYPE_HANDLE,
+   NULL,
+   "Null flow");
+
+   if (data == NULL)
+   return rte_flow_error_set(error,
+   EINVAL,
+   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+   NULL,
+   "Null data");
+
+   table = &flow->pipeline->table[flow->table_id];
+
+   /* Rule stats read. */
+   status = rte_table_action_stats_read(table->a,
+   flow->data,
+   &stats,
+   flow_stats->reset);
+   if (status)
+   return rte_flow_error_set(error,
+   EINVAL,
+   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+   NULL,
+   "Pipeline table rule stats read failed");
+
+   /* Fill in flow stats. */
+   flow_stats->hits_set =
+   (table->ap->params.stats.n_packets_enabled) ? 1 : 0;
+   flow_stats->bytes_set =
+   (table->ap->params.stats.n_bytes_enabled) ? 1 : 0;
+   flow_stats->hits = stats.n_packets;
+   flow_stats->bytes = stats.n_bytes;
+
+   return 0;
+}
+
 const struct rte_flow_ops pmd_flow_ops = {
.validate = pmd_flow_validate,
.create = pmd_flow_create,
.destroy = pmd_flow_destroy,
+   .flush = NULL,
+   .query = pmd_flow_query,
+   .isolate = NULL,
 };
-- 
2.14.4



[dpdk-dev] [PATCH 12/15] net/softnic: add flow create API

2018-09-06 Thread Reshma Pattan
The pmd_flow_create() API is added to support
rte_flow rule creation.
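
Application-side sketch (illustration only; "attr", "pattern",
"actions" and "port_id" as in the validate example):

    struct rte_flow *flow;
    struct rte_flow_error error;

    flow = rte_flow_create(port_id, &attr, pattern, actions, &error);
    if (flow == NULL)
        printf("flow create failed: %s\n",
            error.message ? error.message : "(no error message)");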

Signed-off-by: Cristian Dumitrescu 
Signed-off-by: Reshma Pattan 
---
 drivers/net/softnic/rte_eth_softnic_flow.c | 174 +
 1 file changed, 174 insertions(+)

diff --git a/drivers/net/softnic/rte_eth_softnic_flow.c 
b/drivers/net/softnic/rte_eth_softnic_flow.c
index 351d34524..034bca047 100644
--- a/drivers/net/softnic/rte_eth_softnic_flow.c
+++ b/drivers/net/softnic/rte_eth_softnic_flow.c
@@ -1,13 +1,39 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2018 Intel Corporation
  */
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
 
 #include "rte_eth_softnic_internals.h"
 #include "rte_eth_softnic.h"
 
+#define rte_htons rte_cpu_to_be_16
+#define rte_htonl rte_cpu_to_be_32
+
 #define rte_ntohs rte_be_to_cpu_16
 #define rte_ntohl rte_be_to_cpu_32
 
+static struct rte_flow *
+softnic_flow_find(struct softnic_table *table,
+   struct softnic_table_rule_match *rule_match)
+{
+   struct rte_flow *flow;
+
+   TAILQ_FOREACH(flow, &table->flows, node)
+   if (memcmp(&flow->match, rule_match, sizeof(*rule_match)) == 0)
+   return flow;
+
+   return NULL;
+}
+
 int
 flow_attr_map_set(struct pmd_internals *softnic,
uint32_t group_id,
@@ -1428,6 +1454,154 @@ pmd_flow_validate(struct rte_eth_dev *dev,
return 0;
 }
 
+static struct rte_flow *
+pmd_flow_create(struct rte_eth_dev *dev,
+   const struct rte_flow_attr *attr,
+   const struct rte_flow_item item[],
+   const struct rte_flow_action action[],
+   struct rte_flow_error *error)
+{
+   struct softnic_table_rule_match rule_match;
+   struct softnic_table_rule_action rule_action;
+   void *rule_data;
+
+   struct pmd_internals *softnic = dev->data->dev_private;
+   struct pipeline *pipeline;
+   struct softnic_table *table;
+   struct rte_flow *flow;
+   const char *pipeline_name = NULL;
+   uint32_t table_id = 0;
+   int new_flow, status;
+
+   /* Check input parameters. */
+   if (attr == NULL) {
+   rte_flow_error_set(error,
+   EINVAL,
+   RTE_FLOW_ERROR_TYPE_ATTR,
+   NULL,
+   "Null attr");
+   return NULL;
+   }
+
+   if (item == NULL) {
+   rte_flow_error_set(error,
+   EINVAL,
+   RTE_FLOW_ERROR_TYPE_ITEM,
+   NULL,
+   "Null item");
+   return NULL;
+   }
+
+   if (action == NULL) {
+   rte_flow_error_set(error,
+   EINVAL,
+   RTE_FLOW_ERROR_TYPE_ACTION,
+   NULL,
+   "Null action");
+   return NULL;
+   }
+
+   /* Identify the pipeline table to add this flow to. */
+   status = flow_pipeline_table_get(softnic, attr, &pipeline_name,
+   &table_id, error);
+   if (status)
+   return NULL;
+
+   pipeline = softnic_pipeline_find(softnic, pipeline_name);
+   if (pipeline == NULL) {
+   rte_flow_error_set(error,
+   EINVAL,
+   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+   NULL,
+   "Invalid pipeline name");
+   return NULL;
+   }
+
+   if (table_id >= pipeline->n_tables) {
+   rte_flow_error_set(error,
+   EINVAL,
+   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+   NULL,
+   "Invalid pipeline table ID");
+   return NULL;
+   }
+
+   table = &pipeline->table[table_id];
+
+   /* Rule match. */
+   memset(&rule_match, 0, sizeof(rule_match));
+   status = flow_rule_match_get(softnic,
+   pipeline,
+   table,
+   attr,
+   item,
+   &rule_match,
+   error);
+   if (status)
+   return NULL;
+
+   /* Rule action. */
+   memset(&rule_action, 0, sizeof(rule_action));
+   status = flow_rule_action_get(softnic,
+   pipeline,
+   table,
+   attr,
+   action,
+   &rule_action,
+   error);
+   if (status)
+   return NULL;
+
+   /* Flow find/allocate. */
+   new_flow = 0;
+   flow = softnic_flow_find(table, &rule_match);
+   if (flow == NULL) {
+   new_flow = 1;
+   flow = calloc(1, sizeof(struct rte_flow));
+   if (flow == NULL) {
+   rte_flow_error_set(error,
+   ENOMEM,
+   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+   NULL,
+   "No

[dpdk-dev] [PATCH 15/15] net/softnic: add parsing for raw flow item

2018-09-06 Thread Reshma Pattan
Added support for parsing the raw flow item.
flow_item_raw_preprocess() is added for the same.
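
Application-side sketch of a RAW item shaped to pass the checks in
flow_item_raw_preprocess() (illustration only; a relative match of 2
bytes at offset 4):

    static const uint8_t pattern_bytes[2] = { 0xAB, 0xCD };
    static const uint8_t pattern_mask_bytes[2] = { 0xFF, 0xFF };

    struct rte_flow_item_raw raw_spec = {
        .relative = 1,
        .offset = 4,
        .length = 2,
        .pattern = pattern_bytes,
    };
    struct rte_flow_item_raw raw_mask = {
        .relative = 1,
        .length = 2,
        .pattern = pattern_mask_bytes,
    };
    struct rte_flow_item item = {
        .type = RTE_FLOW_ITEM_TYPE_RAW,
        .spec = &raw_spec,
        .mask = &raw_mask,
    };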

Signed-off-by: Cristian Dumitrescu 
Signed-off-by: Reshma Pattan 
---
 drivers/net/softnic/rte_eth_softnic_flow.c | 108 +
 1 file changed, 108 insertions(+)

diff --git a/drivers/net/softnic/rte_eth_softnic_flow.c 
b/drivers/net/softnic/rte_eth_softnic_flow.c
index da235ff7f..656200445 100644
--- a/drivers/net/softnic/rte_eth_softnic_flow.c
+++ b/drivers/net/softnic/rte_eth_softnic_flow.c
@@ -297,6 +297,106 @@ flow_item_is_proto(enum rte_flow_item_type type,
}
 }
 
+static int
+flow_item_raw_preprocess(const struct rte_flow_item *item,
+   union flow_item *item_spec,
+   union flow_item *item_mask,
+   size_t *item_size,
+   int *item_disabled,
+   struct rte_flow_error *error)
+{
+   const struct rte_flow_item_raw *item_raw_spec = item->spec;
+   const struct rte_flow_item_raw *item_raw_mask = item->mask;
+   const uint8_t *pattern;
+   const uint8_t *pattern_mask;
+   uint8_t *spec = (uint8_t *)item_spec;
+   uint8_t *mask = (uint8_t *)item_mask;
+   size_t pattern_length, pattern_offset, i;
+   int disabled;
+
+   if (!item->spec)
+   return rte_flow_error_set(error,
+   ENOTSUP,
+   RTE_FLOW_ERROR_TYPE_ITEM,
+   item,
+   "RAW: Null specification");
+
+   if (item->last)
+   return rte_flow_error_set(error,
+   ENOTSUP,
+   RTE_FLOW_ERROR_TYPE_ITEM,
+   item,
+   "RAW: Range not allowed (last must be NULL)");
+
+   if (item_raw_spec->relative == 0)
+   return rte_flow_error_set(error,
+   ENOTSUP,
+   RTE_FLOW_ERROR_TYPE_ITEM,
+   item,
+   "RAW: Absolute offset not supported");
+
+   if (item_raw_spec->search)
+   return rte_flow_error_set(error,
+   ENOTSUP,
+   RTE_FLOW_ERROR_TYPE_ITEM,
+   item,
+   "RAW: Search not supported");
+
+   if (item_raw_spec->offset < 0)
+   return rte_flow_error_set(error,
+   ENOTSUP, RTE_FLOW_ERROR_TYPE_ITEM,
+   item,
+   "RAW: Negative offset not supported");
+
+   if (item_raw_spec->length == 0)
+   return rte_flow_error_set(error,
+   ENOTSUP,
+   RTE_FLOW_ERROR_TYPE_ITEM,
+   item,
+   "RAW: Zero pattern length");
+
+   if (item_raw_spec->offset + item_raw_spec->length >
+   TABLE_RULE_MATCH_SIZE_MAX)
+   return rte_flow_error_set(error,
+   ENOTSUP,
+   RTE_FLOW_ERROR_TYPE_ITEM,
+   item,
+   "RAW: Item too big");
+
+   if (!item_raw_spec->pattern && item_raw_mask && item_raw_mask->pattern)
+   return rte_flow_error_set(error,
+   ENOTSUP,
+   RTE_FLOW_ERROR_TYPE_ITEM,
+   item,
+   "RAW: Non-NULL pattern mask not allowed with NULL 
pattern");
+
+   pattern = item_raw_spec->pattern;
+   pattern_mask = (item_raw_mask) ? item_raw_mask->pattern : NULL;
+   pattern_length = (size_t)item_raw_spec->length;
+   pattern_offset = (size_t)item_raw_spec->offset;
+
+   disabled = 0;
+   if (pattern_mask == NULL)
+   disabled = 1;
+   else
+   for (i = 0; i < pattern_length; i++)
+   if ((pattern)[i])
+   disabled = 1;
+
+   memset(spec, 0, TABLE_RULE_MATCH_SIZE_MAX);
+   if (pattern)
+   memcpy(&spec[pattern_offset], pattern, pattern_length);
+
+   memset(mask, 0, TABLE_RULE_MATCH_SIZE_MAX);
+   if (pattern_mask)
+   memcpy(&mask[pattern_offset], pattern_mask, pattern_length);
+
+   *item_size = pattern_offset + pattern_length;
+   *item_disabled = disabled;
+
+   return 0;
+}
+
 static int
 flow_item_proto_preprocess(const struct rte_flow_item *item,
union flow_item *item_spec,
@@ -317,6 +417,14 @@ flow_item_proto_preprocess(const struct rte_flow_item 
*item,
item,
"Item type not supported");
 
+   if (item->type == RTE_FLOW_ITEM_TYPE_RAW)
+   return flow_item_raw_preprocess(item,
+   item_spec,
+   item_mask,
+   item_size,
+   item_disabled,
+   error);
+
/* spec */
if (!item->spec) {
/* If spec is NULL, then last and mask also have to be NULL. */
-- 
2.14.4



[dpdk-dev] [PATCH 13/15] net/softnic: add flow destroy API

2018-09-06 Thread Reshma Pattan
The pmd_flow_destroy() API is added to destroy a
created flow.
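
Application-side sketch (illustration only; "port_id" and "flow" as in
the create example):

    struct rte_flow_error error;

    if (rte_flow_destroy(port_id, flow, &error) != 0)
        printf("flow destroy failed: %s\n",
            error.message ? error.message : "(no error message)");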

Signed-off-by: Cristian Dumitrescu 
Signed-off-by: Reshma Pattan 
---
 drivers/net/softnic/rte_eth_softnic_flow.c | 39 ++
 1 file changed, 39 insertions(+)

diff --git a/drivers/net/softnic/rte_eth_softnic_flow.c 
b/drivers/net/softnic/rte_eth_softnic_flow.c
index 034bca047..3f8531139 100644
--- a/drivers/net/softnic/rte_eth_softnic_flow.c
+++ b/drivers/net/softnic/rte_eth_softnic_flow.c
@@ -1601,7 +1601,46 @@ pmd_flow_create(struct rte_eth_dev *dev,
return flow;
 }
 
+static int
+pmd_flow_destroy(struct rte_eth_dev *dev,
+   struct rte_flow *flow,
+   struct rte_flow_error *error)
+{
+   struct pmd_internals *softnic = dev->data->dev_private;
+   struct softnic_table *table;
+   int status;
+
+   /* Check input parameters. */
+   if (flow == NULL)
+   return rte_flow_error_set(error,
+   EINVAL,
+   RTE_FLOW_ERROR_TYPE_HANDLE,
+   NULL,
+   "Null flow");
+
+   table = &flow->pipeline->table[flow->table_id];
+
+   /* Rule delete. */
+   status = softnic_pipeline_table_rule_delete(softnic,
+   flow->pipeline->name,
+   flow->table_id,
+   &flow->match);
+   if (status)
+   return rte_flow_error_set(error,
+   EINVAL,
+   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+   NULL,
+   "Pipeline table rule delete failed");
+
+   /* Flow delete. */
+   TAILQ_REMOVE(&table->flows, flow, node);
+   free(flow);
+
+   return 0;
+}
+
 const struct rte_flow_ops pmd_flow_ops = {
.validate = pmd_flow_validate,
.create = pmd_flow_create,
+   .destroy = pmd_flow_destroy,
 };
-- 
2.14.4



[dpdk-dev] [PATCH v4] net/pcap: physical interface MAC address support

2018-09-06 Thread Juhamatti Kuusisaari
Support for using the PCAP physical interface MAC address with the
phy_mac=1 devarg.

Signed-off-by: Juhamatti Kuusisaari 
---
 doc/guides/rel_notes/release_18_11.rst |   4 +
 drivers/net/pcap/rte_eth_pcap.c| 118 +++--
 2 files changed, 117 insertions(+), 5 deletions(-)

diff --git a/doc/guides/rel_notes/release_18_11.rst 
b/doc/guides/rel_notes/release_18_11.rst
index 3ae6b3f58..70966740a 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -54,6 +54,10 @@ New Features
  Also, make sure to start the actual text at the margin.
  =
 
+* **Added a devarg to use PCAP interface physical MAC address.**
+  A new devarg ``phy_mac`` was introduced to allow users to use the physical
+  MAC address of the selected PCAP interface.
+
 
 API Changes
 ---
diff --git a/drivers/net/pcap/rte_eth_pcap.c b/drivers/net/pcap/rte_eth_pcap.c
index e8810a171..d83976628 100644
--- a/drivers/net/pcap/rte_eth_pcap.c
+++ b/drivers/net/pcap/rte_eth_pcap.c
@@ -7,6 +7,14 @@
 #include 
 
 #include 
+#include 
+#include 
+#include 
+
+#ifdef __FreeBSD__
+#include 
+#include 
+#endif
 
 #include 
 
@@ -17,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define RTE_ETH_PCAP_SNAPSHOT_LEN 65535
 #define RTE_ETH_PCAP_SNAPLEN ETHER_MAX_JUMBO_FRAME_LEN
@@ -29,6 +38,7 @@
 #define ETH_PCAP_RX_IFACE_IN_ARG "rx_iface_in"
 #define ETH_PCAP_TX_IFACE_ARG "tx_iface"
 #define ETH_PCAP_IFACE_ARG"iface"
+#define ETH_PCAP_PHY_MAC_ARG  "phy_mac"
 
 #define ETH_PCAP_ARG_MAXLEN 64
 
@@ -87,6 +97,7 @@ static const char *valid_arguments[] = {
ETH_PCAP_RX_IFACE_IN_ARG,
ETH_PCAP_TX_IFACE_ARG,
ETH_PCAP_IFACE_ARG,
+   ETH_PCAP_PHY_MAC_ARG,
NULL
 };
 
@@ -904,12 +915,78 @@ pmd_init_internals(struct rte_vdev_device *vdev,
return 0;
 }
 
+static void eth_pcap_update_mac(const char *if_name, struct rte_eth_dev 
**eth_dev,
+   const unsigned int numa_node)
+{
+   void *mac_addrs;
+   PMD_LOG(INFO, "Setting phy MAC for %s\n",
+   if_name);
+#ifndef __FreeBSD__
+   int if_fd = socket(AF_INET, SOCK_DGRAM, 0);
+   if (if_fd != -1) {
+   struct ifreq ifr;
+   strlcpy(ifr.ifr_name, if_name, sizeof(ifr.ifr_name));
+   if (!ioctl(if_fd, SIOCGIFHWADDR, &ifr)) {
+   mac_addrs = rte_zmalloc_socket(NULL, ETHER_ADDR_LEN,
+   0, numa_node);
+   if (mac_addrs) {
+   (*eth_dev)->data->mac_addrs = mac_addrs;
+   rte_memcpy((*eth_dev)->data->mac_addrs,
+   ifr.ifr_addr.sa_data,
+   ETHER_ADDR_LEN);
+   }
+   }
+   close(if_fd);
+   }
+#else
+   int mib[6];
+   size_t len = 0;
+   char *buf = NULL;
+
+   mib[0] = CTL_NET;
+   mib[1] = AF_ROUTE;
+   mib[2] = 0;
+   mib[3] = AF_LINK;
+   mib[4] = NET_RT_IFLIST;
+   mib[5] = if_nametoindex(if_name);
+
+   if (sysctl(mib, 6, NULL, &len, NULL, 0) < 0)
+   return;
+
+   if (len > 0) {
+   struct if_msghdr*ifm;
+   struct sockaddr_dl  *sdl;
+
+   buf = rte_zmalloc_socket(NULL, len,
+   0, numa_node);
+   if (buf) {
+   if (sysctl(mib, 6, buf, &len, NULL, 0) < 0) {
+   rte_free(buf);
+   return;
+   }
+
+   ifm = (struct if_msghdr *)buf;
+   sdl = (struct sockaddr_dl *)(ifm + 1);
+   mac_addrs = rte_zmalloc_socket(NULL, ETHER_ADDR_LEN,
+   0, numa_node);
+   if (mac_addrs) {
+   (*eth_dev)->data->mac_addrs = mac_addrs;
+   rte_memcpy((*eth_dev)->data->mac_addrs,
+   LLADDR(sdl),
+   ETHER_ADDR_LEN);
+   }
+   }
+   }
+   if (buf)
+   rte_free(buf);
+#endif
+}
+
 static int
 eth_from_pcaps_common(struct rte_vdev_device *vdev,
struct pmd_devargs *rx_queues, const unsigned int nb_rx_queues,
struct pmd_devargs *tx_queues, const unsigned int nb_tx_queues,
struct rte_kvargs *kvlist, struct pmd_internals **internals,
-   struct rte_eth_dev **eth_dev)
+   const int phy_mac, struct rte_eth_dev **eth_dev)
 {
struct rte_kvargs_pair *pair = NULL;
unsigned int k_idx;
@@ -955,6 +1032,9 @@ eth_from_pcaps_common(struct rte_vdev_device *vdev,
else
(*internals)->if_index = if_nametoindex(pair->valu

[dpdk-dev] [PATCH v5] net/pcap: physical interface MAC address support

2018-09-06 Thread Juhamatti Kuusisaari
Support for using the PCAP physical interface MAC address with the
phy_mac=1 devarg.
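
Usage sketch (interface name is hypothetical): with the following vdev
argument, the physical MAC address of eth0 is used instead of the
PMD's default dummy address:

    --vdev 'net_pcap0,iface=eth0,phy_mac=1'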

Signed-off-by: Juhamatti Kuusisaari 
---
 doc/guides/rel_notes/release_18_11.rst |   4 +
 drivers/net/pcap/rte_eth_pcap.c| 119 +++--
 2 files changed, 118 insertions(+), 5 deletions(-)

diff --git a/doc/guides/rel_notes/release_18_11.rst 
b/doc/guides/rel_notes/release_18_11.rst
index 3ae6b3f58..70966740a 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -54,6 +54,10 @@ New Features
  Also, make sure to start the actual text at the margin.
  =
 
+* **Added a devarg to use PCAP interface physical MAC address.**
+  A new devarg ``phy_mac`` was introduced to allow users to use the physical
+  MAC address of the selected PCAP interface.
+
 
 API Changes
 ---
diff --git a/drivers/net/pcap/rte_eth_pcap.c b/drivers/net/pcap/rte_eth_pcap.c
index e8810a171..8917c4c4d 100644
--- a/drivers/net/pcap/rte_eth_pcap.c
+++ b/drivers/net/pcap/rte_eth_pcap.c
@@ -7,6 +7,14 @@
 #include 
 
 #include 
+#include 
+#include 
+#include 
+
+#ifdef __FreeBSD__
+#include 
+#include 
+#endif
 
 #include 
 
@@ -17,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define RTE_ETH_PCAP_SNAPSHOT_LEN 65535
 #define RTE_ETH_PCAP_SNAPLEN ETHER_MAX_JUMBO_FRAME_LEN
@@ -29,6 +38,7 @@
 #define ETH_PCAP_RX_IFACE_IN_ARG "rx_iface_in"
 #define ETH_PCAP_TX_IFACE_ARG "tx_iface"
 #define ETH_PCAP_IFACE_ARG"iface"
+#define ETH_PCAP_PHY_MAC_ARG  "phy_mac"
 
 #define ETH_PCAP_ARG_MAXLEN 64
 
@@ -87,6 +97,7 @@ static const char *valid_arguments[] = {
ETH_PCAP_RX_IFACE_IN_ARG,
ETH_PCAP_TX_IFACE_ARG,
ETH_PCAP_IFACE_ARG,
+   ETH_PCAP_PHY_MAC_ARG,
NULL
 };
 
@@ -904,12 +915,79 @@ pmd_init_internals(struct rte_vdev_device *vdev,
return 0;
 }
 
+static void eth_pcap_update_mac(const char *if_name, struct rte_eth_dev 
**eth_dev,
+   const unsigned int numa_node)
+{
+   void *mac_addrs;
+   PMD_LOG(INFO, "Setting phy MAC for %s\n",
+   if_name);
+#ifndef __FreeBSD__
+   int if_fd = socket(AF_INET, SOCK_DGRAM, 0);
+   if (if_fd != -1) {
+   struct ifreq ifr;
+   strlcpy(ifr.ifr_name, if_name, sizeof(ifr.ifr_name));
+   if (!ioctl(if_fd, SIOCGIFHWADDR, &ifr)) {
+   mac_addrs = rte_zmalloc_socket(NULL, ETHER_ADDR_LEN,
+   0, numa_node);
+   if (mac_addrs) {
+   (*eth_dev)->data->mac_addrs = mac_addrs;
+   rte_memcpy((*eth_dev)->data->mac_addrs,
+   ifr.ifr_addr.sa_data,
+   ETHER_ADDR_LEN);
+   }
+   }
+   close(if_fd);
+   }
+#else
+   int mib[6];
+   size_t len = 0;
+   char *buf = NULL;
+
+   mib[0] = CTL_NET;
+   mib[1] = AF_ROUTE;
+   mib[2] = 0;
+   mib[3] = AF_LINK;
+   mib[4] = NET_RT_IFLIST;
+   mib[5] = if_nametoindex(if_name);
+
+   if (sysctl(mib, 6, NULL, &len, NULL, 0) < 0)
+   return;
+
+   if (len > 0) {
+   struct if_msghdr*ifm;
+   struct sockaddr_dl  *sdl;
+
+   buf = rte_zmalloc_socket(NULL, len,
+   0, numa_node);
+   if (buf) {
+   if (sysctl(mib, 6, buf, &len, NULL, 0) < 0) {
+   rte_free(buf);
+   return;
+   }
+
+   ifm = (struct if_msghdr *)buf;
+   sdl = (struct sockaddr_dl *)(ifm + 1);
+   mac_addrs = rte_zmalloc_socket(NULL, ETHER_ADDR_LEN,
+   0, numa_node);
+   if (mac_addrs) {
+   (*eth_dev)->data->mac_addrs = mac_addrs;
+   rte_memcpy((*eth_dev)->data->mac_addrs,
+   LLADDR(sdl),
+   ETHER_ADDR_LEN);
+   }
+   }
+   }
+   if (buf)
+   rte_free(buf);
+#endif
+}
+
 static int
 eth_from_pcaps_common(struct rte_vdev_device *vdev,
struct pmd_devargs *rx_queues, const unsigned int nb_rx_queues,
struct pmd_devargs *tx_queues, const unsigned int nb_tx_queues,
struct rte_kvargs *kvlist, struct pmd_internals **internals,
-   struct rte_eth_dev **eth_dev)
+   const int phy_mac, struct rte_eth_dev **eth_dev)
 {
struct rte_kvargs_pair *pair = NULL;
unsigned int k_idx;
@@ -955,6 +1033,9 @@ eth_from_pcaps_common(struct rte_vdev_device *vdev,
else
(*internals)->if_index = if_nametoindex(pa

[dpdk-dev] [PATCH 0/4] Address reader-writer concurrency in rte_hash

2018-09-06 Thread Honnappa Nagarahalli
Currently, reader-writer concurrency problems in rte_hash are
addressed using reader-writer locks. Use of reader-writer locks
results in the following issues:

1) In many of the use cases for the hash table, writer threads
   are running on the control plane. If the writer is preempted while
   holding the lock, it will block the readers for an extended period,
   resulting in packet drops. This problem seems to apply to platforms
   with transactional memory support as well, because of the algorithm
   used for rte_rwlock_write_lock_tm:

   static inline void
   rte_rwlock_write_lock_tm(rte_rwlock_t *rwl)
   {
if (likely(rte_try_tm(&rwl->cnt)))
return;
rte_rwlock_write_lock(rwl);
   }

   i.e. there is a possibility of using rte_rwlock_write_lock in
   failure cases.
2) The reader-writer lock based solution does not address the
   following issue.
   rte_hash_lookup_xxx APIs return the index of the element in
   the key store. The application (reader) can use that index to
   reference other data structures in its scope. Because of this,
   the index should not be freed till the application completes
   using the index.
3) Since writer blocks all the readers, the hash lookup
   rate comes down significantly when there is activity on the writer.
   This happens even for unrelated entries. Performance numbers
   given below clearly indicate this.

A lock-free solution is required to solve these problems. This patch
series adds the lock-free capabilities in the following steps:

1) Correct the alignment for the key store entry to prepare for
   memory ordering.
2) Add memory ordering to prevent race conditions when a new key
   is added to the table.

3) Reader-writer concurrency issue, caused by moving the keys
   to their alternate locations during key insert, is solved
   by introducing an atomic global counter indicating a change
   in the table.

4) This solution also has to solve the issue of readers using
   a key store element even after the key is deleted from the
   control plane.
   To solve this issue, the hash_del_key_xxx APIs do not free
   the key store element. The key store element has to be freed
   using the newly introduced rte_hash_free_key_with_position API.
   It needs to be called once all the readers have stopped using
   the key store element. How this is determined is outside
   the scope of this patch (RCU is one such mechanism that the
   application can use); see the sketch after this list.

5) Finally, a lock-free reader-writer concurrency flag is added
   to enable this feature at run time.
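
As a rough illustration of the intended delete flow (a sketch only;
wait_for_readers_to_quiesce() is a hypothetical application hook that
RCU or any other reclamation scheme could implement):

    #include <rte_hash.h>

    extern void wait_for_readers_to_quiesce(void); /* hypothetical */

    static void
    delete_entry(struct rte_hash *h, const void *key)
    {
            /* Removes the key from the table; with the lock-free
             * flag set, the key store slot is NOT recycled yet.
             */
            int32_t pos = rte_hash_del_key(h, key);

            if (pos < 0)
                    return; /* key not found */

            /* Wait until no reader can still be using 'pos'. */
            wait_for_readers_to_quiesce();

            /* Now the slot can be returned to the free list. */
            rte_hash_free_key_with_position(h, pos);
    }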

Performance numbers:
Scenario: Equal number of writer/reader threads for concurrent
  writers and readers. For the readers-only test, the
  entries are added upfront.

Current code:
Cores   Lookup    Lookup with add
2       474       246
4       935       579
6       1387      1048
8       1766      1480
10      2119      1951
12      2546      2441

With this patch:
Cores   Lookup    Lookup with add
2       291       211
4       297       196
6       304       198
8       309       202
10      315       205
12      319       209

Honnappa Nagarahalli (4):
  hash: correct key store element alignment
  hash: add memory ordering to avoid race conditions
  hash: fix rw concurrency while moving keys
  hash: enable lock-free reader-writer concurrency

 lib/librte_hash/rte_cuckoo_hash.c| 445 +--
 lib/librte_hash/rte_cuckoo_hash.h|   6 +-
 lib/librte_hash/rte_hash.h   |  63 -
 lib/librte_hash/rte_hash_version.map |   7 +
 4 files changed, 393 insertions(+), 128 deletions(-)

-- 
2.7.4



[dpdk-dev] [PATCH 1/4] hash: correct key store element alignment

2018-09-06 Thread Honnappa Nagarahalli
Correct the key store array element alignment. This is required to
make 'pdata' in 'struct rte_hash_key' align on the correct boundary.
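
As a worked example (sizes assumed for illustration): with an 8-byte
union at the head of 'struct rte_hash_key', KEY_ALIGNMENT of 16 and
key_len = 17, the unaligned entry size would be 8 + 17 = 25 bytes, so
consecutive entries in the key store array would leave 'pdata'
misaligned; RTE_ALIGN(25, 16) = 32 keeps every entry, and hence
'pdata', on a 16-byte boundary.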

Signed-off-by: Honnappa Nagarahalli 
Reviewed-by: Gavin Hu 
Reviewed-by: Ola Liljedahl 
Reviewed-by: Steve Capper 
---
 lib/librte_hash/rte_cuckoo_hash.c | 4 +++-
 lib/librte_hash/rte_cuckoo_hash.h | 2 +-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.c 
b/lib/librte_hash/rte_cuckoo_hash.c
index f7b86c8..33acfc9 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -189,7 +189,9 @@ rte_hash_create(const struct rte_hash_parameters *params)
goto err_unlock;
}
 
-   const uint32_t key_entry_size = sizeof(struct rte_hash_key) + 
params->key_len;
+   const uint32_t key_entry_size =
+   RTE_ALIGN(sizeof(struct rte_hash_key) + params->key_len,
+ KEY_ALIGNMENT);
const uint64_t key_tbl_size = (uint64_t) key_entry_size * num_key_slots;
 
k = rte_zmalloc_socket(NULL, key_tbl_size,
diff --git a/lib/librte_hash/rte_cuckoo_hash.h 
b/lib/librte_hash/rte_cuckoo_hash.h
index b43f467..b0c7ef9 100644
--- a/lib/librte_hash/rte_cuckoo_hash.h
+++ b/lib/librte_hash/rte_cuckoo_hash.h
@@ -125,7 +125,7 @@ struct rte_hash_key {
};
/* Variable key size */
char key[0];
-} __attribute__((aligned(KEY_ALIGNMENT)));
+};
 
 /* All different signature compare functions */
 enum rte_hash_sig_compare_function {
-- 
2.7.4



[dpdk-dev] [PATCH 2/4] hash: add memory ordering to avoid race conditions

2018-09-06 Thread Honnappa Nagarahalli
The only race condition that can occur is using the key store element
before the key write is completed. Hence, while inserting the element
the release memory order is used. Any other race condition is caught
by the key comparison. Memory orderings are added only where needed.
For example, reads in the writer's context do not need memory ordering
as there is a single writer.

key_idx in the bucket entry and pdata in the key store element are
used for synchronisation. key_idx is used to release an inserted
entry in the bucket to the reader. Use of pdata for synchronisation
is required due to the update of an existing entry, where-in only
the pdata is updated without updating key_idx.
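
A minimal sketch of the reader side that these release stores pair
with (simplified; 'h', 'bkt', 'key_store' and EMPTY_SLOT as in
rte_cuckoo_hash.c):

    /* Acquire the bucket entry; pairs with the writer's release
     * store to key_idx.
     */
    uint32_t key_idx = __atomic_load_n(&bkt->key_idx[i],
                                       __ATOMIC_ACQUIRE);
    if (key_idx != EMPTY_SLOT) {
            struct rte_hash_key *k = (struct rte_hash_key *)
                    ((char *)h->key_store +
                     key_idx * h->key_entry_size);
            if (rte_hash_cmp_eq(key, k->key, h) == 0)
                    /* Pairs with the release store to pdata on an
                     * in-place update of an existing entry.
                     */
                    *data = __atomic_load_n(&k->pdata,
                                            __ATOMIC_ACQUIRE);
    }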

Signed-off-by: Honnappa Nagarahalli 
Reviewed-by: Gavin Hu 
Reviewed-by: Ola Liljedahl 
Reviewed-by: Steve Capper 
---
 lib/librte_hash/rte_cuckoo_hash.c | 111 --
 1 file changed, 82 insertions(+), 29 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.c 
b/lib/librte_hash/rte_cuckoo_hash.c
index 33acfc9..2d89158 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -485,7 +485,9 @@ enqueue_slot_back(const struct rte_hash *h,
rte_ring_sp_enqueue(h->free_slots, slot_id);
 }
 
-/* Search a key from bucket and update its data */
+/* Search a key from bucket and update its data.
+ * Writer holds the lock before calling this.
+ */
 static inline int32_t
 search_and_update(const struct rte_hash *h, void *data, const void *key,
struct rte_hash_bucket *bkt, hash_sig_t sig, hash_sig_t alt_hash)
@@ -499,8 +501,13 @@ search_and_update(const struct rte_hash *h, void *data, 
const void *key,
k = (struct rte_hash_key *) ((char *)keys +
bkt->key_idx[i] * h->key_entry_size);
if (rte_hash_cmp_eq(key, k->key, h) == 0) {
-   /* Update data */
-   k->pdata = data;
+   /* 'pdata' acts as the synchronization point
+* when an existing hash entry is updated.
+* Key is not updated in this case.
+*/
+   __atomic_store_n(&k->pdata,
+   data,
+   __ATOMIC_RELEASE);
/*
 * Return index where key is stored,
 * subtracting the first dummy index
@@ -554,7 +561,15 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
if (likely(prim_bkt->key_idx[i] == EMPTY_SLOT)) {
prim_bkt->sig_current[i] = sig;
prim_bkt->sig_alt[i] = alt_hash;
-   prim_bkt->key_idx[i] = new_idx;
+   /* Key can be of arbitrary length, so it is
+* not possible to store it atomically.
+* Hence the new key element's memory stores
+* (key as well as data) should be complete
+* before it is referenced.
+*/
+   __atomic_store_n(&prim_bkt->key_idx[i],
+new_idx,
+__ATOMIC_RELEASE);
break;
}
}
@@ -637,8 +652,10 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 prev_bkt->sig_current[prev_slot];
curr_bkt->sig_current[curr_slot] =
prev_bkt->sig_alt[prev_slot];
-   curr_bkt->key_idx[curr_slot] =
-   prev_bkt->key_idx[prev_slot];
+   /* Release the updated bucket entry */
+   __atomic_store_n(&curr_bkt->key_idx[curr_slot],
+   prev_bkt->key_idx[prev_slot],
+   __ATOMIC_RELEASE);
 
curr_slot = prev_slot;
curr_node = prev_node;
@@ -647,7 +664,10 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
 
curr_bkt->sig_current[curr_slot] = sig;
curr_bkt->sig_alt[curr_slot] = alt_hash;
-   curr_bkt->key_idx[curr_slot] = new_idx;
+   /* Release the new bucket entry */
+   __atomic_store_n(&curr_bkt->key_idx[curr_slot],
+new_idx,
+__ATOMIC_RELEASE);
 
__hash_rw_writer_unlock(h);
 
@@ -778,8 +798,15 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, 
const void *key,
new_idx = (uint32_t)((uintptr_t) slot_id);
/* Copy key */
rte_memcpy(new_k->key, key, h->key_len);
-   new_k->pdata = data;
-
+   /* Key can be of arbitrary length, so it is not possible to store
+* it atomically. Hence the new key element's memory stores
+* (key as well as data) should be complete 

[dpdk-dev] [PATCH 4/4] hash: enable lock-free reader-writer concurrency

2018-09-06 Thread Honnappa Nagarahalli
Add the flag to enable lock-free reader-writer concurrency at
run time. The rte_hash_del_xxx APIs do not free the key store
element when this flag is enabled. Hence a new API,
rte_hash_free_key_with_position, is added to free the key store
element.
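
For illustration, enabling the new mode at creation time could look
like this (parameter values are placeholders):

    #include <rte_hash.h>
    #include <rte_jhash.h>
    #include <rte_lcore.h>

    struct rte_hash_parameters params = {
            .name = "lf_hash",
            .entries = 1024,
            .key_len = sizeof(uint32_t),
            .hash_func = rte_jhash,
            .socket_id = rte_socket_id(),
            .extra_flag = RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY_LF,
    };
    struct rte_hash *h = rte_hash_create(&params);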

Signed-off-by: Honnappa Nagarahalli 
Reviewed-by: Gavin Hu 
Reviewed-by: Ola Liljedahl 
Reviewed-by: Steve Capper 
---
 lib/librte_hash/rte_cuckoo_hash.c| 105 ++-
 lib/librte_hash/rte_cuckoo_hash.h|   2 +
 lib/librte_hash/rte_hash.h   |  55 ++
 lib/librte_hash/rte_hash_version.map |   7 +++
 4 files changed, 142 insertions(+), 27 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.c 
b/lib/librte_hash/rte_cuckoo_hash.c
index 1e4a8d4..bf51a73 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -93,6 +93,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
unsigned i;
unsigned int hw_trans_mem_support = 0, multi_writer_support = 0;
unsigned int readwrite_concur_support = 0;
+   unsigned int readwrite_concur_lf_support = 0;
 
rte_hash_function default_hash_func = (rte_hash_function)rte_jhash;
 
@@ -124,6 +125,9 @@ rte_hash_create(const struct rte_hash_parameters *params)
multi_writer_support = 1;
}
 
+   if (params->extra_flag & RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY_LF)
+   readwrite_concur_lf_support = 1;
+
	/* Store all keys and leave the first entry as a dummy entry for lookup_bulk */
if (multi_writer_support)
/*
@@ -272,6 +276,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
h->hw_trans_mem_support = hw_trans_mem_support;
h->multi_writer_support = multi_writer_support;
h->readwrite_concur_support = readwrite_concur_support;
+   h->readwrite_concur_lf_support = readwrite_concur_lf_support;
 
 #if defined(RTE_ARCH_X86)
if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
@@ -647,19 +652,21 @@ rte_hash_cuckoo_move_insert_mw(struct rte_hash *h,
return -1;
}
 
-   /* Inform the previous move. The current move need
-* not be informed now as the current bucket entry
-* is present in both primary and secondary.
-* Since there is one writer, load acquires on
-* tbl_chng_cnt are not required.
-*/
-   __atomic_store_n(&h->tbl_chng_cnt,
-h->tbl_chng_cnt + 1,
-__ATOMIC_RELEASE);
-   /* The stores to sig_alt and sig_current should not
-* move above the store to tbl_chng_cnt.
-*/
-   __atomic_thread_fence(__ATOMIC_RELEASE);
+   if (h->readwrite_concur_lf_support) {
+   /* Inform the previous move. The current move need
+* not be informed now as the current bucket entry
+* is present in both primary and secondary.
+* Since there is one writer, load acquires on
+* tbl_chng_cnt are not required.
+*/
+   __atomic_store_n(&h->tbl_chng_cnt,
+h->tbl_chng_cnt + 1,
+__ATOMIC_RELEASE);
+   /* The stores to sig_alt and sig_current should not
+* move above the store to tbl_chng_cnt.
+*/
+   __atomic_thread_fence(__ATOMIC_RELEASE);
+   }
 
/* Need to swap current/alt sig to allow later
 * Cuckoo insert to move elements back to its
@@ -679,19 +686,21 @@ rte_hash_cuckoo_move_insert_mw(struct rte_hash *h,
curr_bkt = curr_node->bkt;
}
 
-   /* Inform the previous move. The current move need
-* not be informed now as the current bucket entry
-* is present in both primary and secondary.
-* Since there is one writer, load acquires on
-* tbl_chng_cnt are not required.
-*/
-   __atomic_store_n(&h->tbl_chng_cnt,
-h->tbl_chng_cnt + 1,
-__ATOMIC_RELEASE);
-   /* The stores to sig_alt and sig_current should not
-* move above the store to tbl_chng_cnt.
-*/
-   __atomic_thread_fence(__ATOMIC_RELEASE);
+   if (h->readwrite_concur_lf_support) {
+   /* Inform the previous move. The current move need
+* not be informed now as the current bucket entry
+* is present in both primary and secondary.
+* Since there is one writer, load acquires on
+* tbl_chng_cnt are not required.
+*/
+   __atomic_store_n(&h->tbl_chng_cnt,
+h->tbl_chng_cnt +

[dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys

2018-09-06 Thread Honnappa Nagarahalli
The reader-writer concurrency issue, caused by moving the keys
to their alternative locations during key insert, is solved
by introducing a global counter (tbl_chng_cnt) indicating a
change in the table.
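
In outline, a reader can then retry its bucket searches whenever the
counter changes underneath it (a sketch; search_bucket() is an
illustrative helper, not the exact function in the patch):

    uint32_t cnt_b, cnt_a;

    do {
            cnt_b = __atomic_load_n(&h->tbl_chng_cnt,
                                    __ATOMIC_ACQUIRE);
            /* Search the primary and then the secondary bucket. */
            if (search_bucket(h, prim_bkt, key, sig, data) == 0)
                    return 0;
            if (search_bucket(h, sec_bkt, key, alt_hash, data) == 0)
                    return 0;
            /* If a key moved between buckets meanwhile, search again. */
            cnt_a = __atomic_load_n(&h->tbl_chng_cnt,
                                    __ATOMIC_ACQUIRE);
    } while (cnt_b != cnt_a);

    return -ENOENT;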

Signed-off-by: Honnappa Nagarahalli 
Reviewed-by: Gavin Hu 
Reviewed-by: Ola Liljedahl 
Reviewed-by: Steve Capper 
---
 lib/librte_hash/rte_cuckoo_hash.c | 307 +-
 lib/librte_hash/rte_cuckoo_hash.h |   2 +
 lib/librte_hash/rte_hash.h|   8 +-
 3 files changed, 206 insertions(+), 111 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.c 
b/lib/librte_hash/rte_cuckoo_hash.c
index 2d89158..1e4a8d4 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -256,6 +256,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
 #endif
/* Setup hash context */
snprintf(h->name, sizeof(h->name), "%s", params->name);
+   h->tbl_chng_cnt = 0;
h->entries = params->entries;
h->key_len = params->key_len;
h->key_entry_size = key_entry_size;
@@ -588,7 +589,7 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
  * return 0 if succeeds.
  */
 static inline int
-rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
+rte_hash_cuckoo_move_insert_mw(struct rte_hash *h,
struct rte_hash_bucket *bkt,
struct rte_hash_bucket *alt_bkt,
const struct rte_hash_key *key, void *data,
@@ -639,11 +640,27 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
if (unlikely(&h->buckets[prev_alt_bkt_idx]
!= curr_bkt)) {
/* revert it to empty, otherwise duplicated keys */
-   curr_bkt->key_idx[curr_slot] = EMPTY_SLOT;
+   __atomic_store_n(&curr_bkt->key_idx[curr_slot],
+   EMPTY_SLOT,
+   __ATOMIC_RELEASE);
__hash_rw_writer_unlock(h);
return -1;
}
 
+   /* Inform the previous move. The current move need
+* not be informed now as the current bucket entry
+* is present in both primary and secondary.
+* Since there is one writer, load acquires on
+* tbl_chng_cnt are not required.
+*/
+   __atomic_store_n(&h->tbl_chng_cnt,
+h->tbl_chng_cnt + 1,
+__ATOMIC_RELEASE);
+   /* The stores to sig_alt and sig_current should not
+* move above the store to tbl_chng_cnt.
+*/
+   __atomic_thread_fence(__ATOMIC_RELEASE);
+
/* Need to swap current/alt sig to allow later
 * Cuckoo insert to move elements back to its
 * primary bucket if available
@@ -662,6 +679,20 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
curr_bkt = curr_node->bkt;
}
 
+   /* Inform the previous move. The current move need
+* not be informed now as the current bucket entry
+* is present in both primary and secondary.
+* Since there is one writer, load acquires on
+* tbl_chng_cnt are not required.
+*/
+   __atomic_store_n(&h->tbl_chng_cnt,
+h->tbl_chng_cnt + 1,
+__ATOMIC_RELEASE);
+   /* The stores to sig_alt and sig_current should not
+* move above the store to tbl_chng_cnt.
+*/
+   __atomic_thread_fence(__ATOMIC_RELEASE);
+
curr_bkt->sig_current[curr_slot] = sig;
curr_bkt->sig_alt[curr_slot] = alt_hash;
/* Release the new bucket entry */
@@ -680,7 +711,7 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
  * Cuckoo
  */
 static inline int
-rte_hash_cuckoo_make_space_mw(const struct rte_hash *h,
+rte_hash_cuckoo_make_space_mw(struct rte_hash *h,
struct rte_hash_bucket *bkt,
struct rte_hash_bucket *sec_bkt,
const struct rte_hash_key *key, void *data,
@@ -728,7 +759,7 @@ rte_hash_cuckoo_make_space_mw(const struct rte_hash *h,
 }
 
 static inline int32_t
-__rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
+__rte_hash_add_key_with_hash(struct rte_hash *h, const void *key,
hash_sig_t sig, void *data)
 {
hash_sig_t alt_hash;
@@ -844,7 +875,7 @@ __rte_hash_add_key_with_hash(const struct rte_hash *h, 
const void *key,
 }
 
 int32_t
-rte_hash_add_key_with_hash(const struct rte_hash *h,
+rte_hash_add_key_with_hash(struct rte_hash *h,
const void *key, hash_sig_t sig)
 {
RETURN_IF_TRUE(((h == NULL) || (key == NULL)), -EINVAL);
@@ -852,14 +883,14 @@ rte_hash_add_key_with_hash(const struct rte_hash *h,
 }
 
 int32_t
-rte_hash_add_key(const struct 

Re: [dpdk-dev] [PATCH v1 2/2] examples/vdpa: add a new sample for vdpa

2018-09-06 Thread Rami Rosen
Hi all,
First, thanks for the vdpa example patches.
Second, I am getting a compilation error under Ubuntu 18.04, with gcc
version 7.3.0 (Ubuntu 7.3.0-16ubuntu3)
...
  CC main.o
/work/src/dpdk/examples/vdpa/main.c: In function ‘main’:
/work/src/dpdk/examples/vdpa/main.c:321:5: error: ignoring return
value of ‘scanf’, declared with attribute warn_unused_result
[-Werror=unused-result]
 scanf("%c", &ch);
 ^~~~
cc1: all warnings being treated as errors
/work/src/dpdk/mk/internal/rte.compile-pre.mk:114: recipe for target
'main.o' failed
make[1]: *** [main.o] Error 1
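
FWIW, a minimal fix is to check scanf's return value, e.g.:

    if (scanf("%c", &ch) != 1)
            rte_exit(EXIT_FAILURE, "failed to read input\n");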

Also, it would be nice to have, as part of this patch series, the
relevant info added to MAINTAINERS, doc/guides/sample_app_ug/index.rst
and examples/Makefile, plus a new doc/guides/sample_app_ug/vdpa.rst,
like most patches for examples do.
See for example,

commit f5188211c721688bf8530d1648d623205246e1da
Author: Fan Zhang 
Date:   Thu Apr 5 17:01:36 2018 +0100
examples/vhost_crypto: add sample application

Regards,
Rami Rosen


Re: [dpdk-dev] IXGBE throughput loss with 4+ cores

2018-09-06 Thread Wiles, Keith


> On Sep 6, 2018, at 7:10 AM, Saber Rezvani  wrote:
> 
> 
> 
> On 08/29/2018 11:22 PM, Wiles, Keith wrote: 
> > 
> >> On Aug 29, 2018, at 12:19 PM, Saber Rezvani  wrote: 
> >> 
> >> 
> >> 
> >> On 08/29/2018 01:39 AM, Wiles, Keith wrote: 
>  On Aug 28, 2018, at 2:16 PM, Saber Rezvani  wrote: 
>  
>  
>  
>  On 08/28/2018 11:39 PM, Wiles, Keith wrote: 
> > Which version of Pktgen? I just pushed a patch in 3.5.3 to fix a 
> > performance problem. 
>  I use Pktgen verion 3.0.0, indeed it is O.k as far as I have one core. 
>  (10 Gb/s) but when I increase the number of core (one core per queue) 
>  then I loose some performance (roughly 8.5 Gb/s for 8-core). In my 
>  scenario Pktgen shows it is generating at line rate, but receiving 8.5 
>  Gb/s. 
>  Is it because of Pktgen??? 
> >>> Normally Pktgen can receive at line rate up to 10G 64 byte frames, which 
> >>> means Pktgen should not be the problem. You can verify that by looping 
> >>> the cable from one port to another on the pktgen machine to create a 
> >>> external loopback. Then send traffic what ever you can send from one port 
> >>> you should be able to receive those packets unless something is 
> >>> configured wrong. 
> >>> 
> >>> Please send me the command line for pktgen. 
> >>> 
> >>> 
> >>> In pktgen if you have this config -m “[1-4:5-8].0” then you have 4 cores 
> >>> sending traffic and 4 core receiving packets. 
> >>> 
> >>> In this case the TX cores will be sending the packets on all 4 lcores to 
> >>> the same port. On the rx side you have 4 cores polling 4 rx queues. The 
> >>> rx queues are controlled by RSS, which means the RX traffic 5 tuples hash 
> >>> must divide the inbound packets across all 4 queues to make sure each 
> >>> core is doing the same amount of work. If you are sending only a single 
> >>> packet on the Tx cores then only one rx queue be used. 
> >>> 
> >>> I hope that makes sense. 
> >> I think there is a misunderstanding of the problem. Indeed the problem is 
> >> not the Pktgen. 
> >> Here is my command --> ./app/app/x86_64-native-linuxapp-gcc/pktgen -c 
> >> ffc -n 4 -w 84:00.0 -w 84:00.1 --file-prefix pktgen_F2 --socket-mem 
> >> 1000,2000,1000,1000 -- -T -P -m "[18-19:20-21].0, [22:23].1" 
> >> 
> >> The problem is when I run the symmetric_mp example for 
> >> $numberOfProcesses=8 cores, then I have less throughput (roughly 8.4 
> >> Gb/s). but when I run it for $numberOfProcesses=3 cores throughput is 10G. 
> >> for i in `seq $numberOfProcesses`; 
> >> do 
> >>  some calculation goes here. 
> >> symmetric_mp -c $coremask -n 2 --proc-type=auto -w 0b:00.0 -w 0b:00.1 
> >> --file-prefix sm --socket-mem 4000,1000,1000,1000 -- -p 3 
> >> --num-procs=$numberOfProcesses --proc-id=$procid"; 
> >> . 
> >> done 
> > Most NICs have a limited amount of memory on the NIC and when you start to 
> > segment that memory because you are using more queues it can effect 
> > performance. 
> > 
> > In one of the NICs if you go over say 6 or 5 queues the memory per queue 
> > for Rx/Tx packets starts to become a bottle neck as you do not have enough 
> > memory in the Tx/Rx queues to hold enough packets. This can cause the NIC 
> > to drop Rx packets because the host can not pull the data from the NIC or 
> > Rx ring on the host fast enough. This seems to be the problem as the amount 
> > of time to process a packet on the host has not changed only the amount of 
> > buffer space in the NIC as you increase queues. 
> > 
> > I am not sure this is your issue, but I figured I would state this point. 
> What you said sounded logical, but is there away that I can be sure? I 
> mean are there some registers at NIC which show the number of packet 
> loss on NIC? or does DPDK have an API which shows the number of packet 
> loss at NIC level? 

Yes. If you look in the docs at Readthedocs.org/projects/dpdk you can find the
API; something like rte_eth_stats_get().
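
Something like this (a sketch; imissed counts packets dropped by the
HW because the host did not drain the Rx ring fast enough):

    #include <stdio.h>
    #include <inttypes.h>
    #include <rte_ethdev.h>

    struct rte_eth_stats stats;

    if (rte_eth_stats_get(port_id, &stats) == 0)
            printf("port %u: imissed=%" PRIu64 " ierrors=%" PRIu64
                   " rx_nombuf=%" PRIu64 "\n", port_id,
                   stats.imissed, stats.ierrors, stats.rx_nombuf);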

> > 
> >> I am trying find out what makes this loss! 
> >> 
> >> 
> >> On Aug 28, 2018, at 12:05 PM, Saber Rezvani  wrote: 
> >> 
> >> 
> >> 
> >> On 08/28/2018 08:31 PM, Stephen Hemminger wrote: 
> >>> On Tue, 28 Aug 2018 17:34:27 +0430 
> >>> Saber Rezvani  wrote: 
> >>> 
>  Hi, 
>  
>  
>  I have run multi_process/symmetric_mp example in DPDK example 
>  directory. 
>  For a one process its throughput is line rate but as I increase the 
>  number of cores I see decrease in throughput. For example, If the 
>  number 
>  of queues set to 4 and each queue assigns to a single core, then the 
>  throughput will be something about 9.4. if 8 queues, then throughput 
>  will be 8.5. 
>  
>  I have read the following, but it was not convincing. 
>  
>  http://mails.dpdk.org/archives/dev/2015-October/024960.html 
>  
>  
>  I am eagerly looking forwa

[dpdk-dev] [PATCH v5 00/11] implement packed virtqueues

2018-09-06 Thread Jens Freimann
This is a basic implementation of packed virtqueues as specified in the
Virtio 1.1 draft. A compiled version of the current draft is available
at https://github.com/oasis-tcs/virtio-docs.git (or as .pdf at
https://github.com/oasis-tcs/virtio-docs/blob/master/virtio-v1.1-packed-wd10.pdf).

It does not yet implement indirect descriptors or checksum offloading.

A packed virtqueue is different from a split virtqueue in that it
consists of only a single descriptor ring that replaces available and
used ring, index and descriptor buffer.

Each descriptor is readable and writable and has a flags field. These flags
mark whether a descriptor is available or used. To detect new available
descriptors even after the ring has wrapped, device and driver each have a
single-bit wrap counter that is flipped from 0 to 1 and vice versa every time
the last descriptor in the ring is used/made available.
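
As a sketch, the availability check the wrap counters make possible
looks roughly like this (mirroring the desc_is_used()/set_desc_avail()
helpers added later in this series):

    #define VRING_DESC_F_AVAIL(b) ((uint16_t)(b) << 7)
    #define VRING_DESC_F_USED(b)  ((uint16_t)(b) << 15)

    /* A descriptor is available when its AVAIL flag matches the
     * driver's wrap counter and its USED flag does not.
     */
    static inline int
    desc_is_avail(uint16_t flags, int avail_wrap_counter)
    {
            int avail = !!(flags & VRING_DESC_F_AVAIL(1));
            int used = !!(flags & VRING_DESC_F_USED(1));

            return avail == avail_wrap_counter && used != avail;
    }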

The idea behind this is to 1. improve performance by avoiding cache misses
and 2. be easier for devices to implement.

Regarding performance: with these patches I get 21.13 Mpps on my system
as compared to 18.8 Mpps with the virtio 1.0 code. Packet size was 64
bytes, 0.05% acceptable loss.  Test setup is described as in
http://dpdk.org/doc/guides/howto/pvp_reference_benchmark.html

Packet generator:
MoonGen
Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz
Intel X710 NIC
RHEL 7.4

Device under test:
Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz
Intel X710 NIC
RHEL 7.4

VM on DuT: RHEL7.4

I plan to do more performance tests with bigger frame sizes.


changes from v4->v5:
* fix VIRTQUEUE_DUMP macro
* fix wrap counter logic in transmit and receive functions  

changes from v3->v4:
* added helpers to increment index and set available/used flags
* driver keeps track of number of descriptors used
* change logic in set_rxtx_funcs()
* add patch for ctrl virtqueue with support for packed virtqueues
* rename virtio-1.1.h to virtio-packed.h
* fix wrong sizeof() in "vhost: vring address setup for packed queues"
* fix coding style of function definition in "net/virtio: add packed
  virtqueue helpers"
* fix padding in vring_size()
* move patches to enable packed virtqueues end of series
* v4 has two open problems: I'm sending it out anyway for feedback/help:
 * when VIRTIO_NET_F_MRG_RXBUF is enabled, only 128 packets are sent in
   the guest, i.e. when the ring is full for the first time. I suspect a
   bug in setting the avail/used flags

changes from v2->v3:
* implement event suppression
* add code do dump packed virtqueues
* don't use assert in vhost code
* rename virtio-user parameter to packed-vq
* support rxvf flush

changes from v1->v2:
* don't use VIRTQ_DESC_F_NEXT in used descriptors (Jason)
* no rte_panice() in guest triggerable code (Maxime)
* use unlikely when checking for vq (Maxime)
* rename everything from _1_1 to _packed  (Yuanhan)
* add two more patches to implement mergeable receive buffers


Jens Freimann (10):
  net/virtio: vring init for packed queues
  net/virtio: add virtio 1.1 defines
  net/virtio: add packed virtqueue helpers
  net/virtio: flush packed receive virtqueues
  net/virtio: dump packed virtqueue data
  net/virtio: implement transmit path for packed queues
  net/virtio: implement receive path for packed queues
  net/virtio: disable ctrl virtqueue for packed rings
  net/virtio: add support for mergeable buffers with packed virtqueues
  net/virtio: add support for event suppression

Yuanhan Liu (1):
  net/virtio-user: add option to use packed queues

 drivers/net/virtio/virtio_ethdev.c|  50 ++-
 drivers/net/virtio/virtio_ethdev.h|   4 +
 drivers/net/virtio/virtio_pci.h   |   8 +
 drivers/net/virtio/virtio_ring.h  |  85 -
 drivers/net/virtio/virtio_rxtx.c  | 360 +-
 .../net/virtio/virtio_user/virtio_user_dev.c  |  10 +-
 .../net/virtio/virtio_user/virtio_user_dev.h  |   2 +-
 drivers/net/virtio/virtio_user_ethdev.c   |  14 +-
 drivers/net/virtio/virtqueue.c|  17 +
 drivers/net/virtio/virtqueue.h| 113 +-
 10 files changed, 630 insertions(+), 33 deletions(-)

-- 
2.17.1



[dpdk-dev] [PATCH v5 02/11] net/virtio: add virtio 1.1 defines

2018-09-06 Thread Jens Freimann
Signed-off-by: Jens Freimann 
---
 drivers/net/virtio/virtio_ring.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/virtio/virtio_ring.h b/drivers/net/virtio/virtio_ring.h
index cea4d441e..e2c597434 100644
--- a/drivers/net/virtio/virtio_ring.h
+++ b/drivers/net/virtio/virtio_ring.h
@@ -15,7 +15,11 @@
 #define VRING_DESC_F_WRITE  2
 /* This means the buffer contains a list of buffer descriptors. */
 #define VRING_DESC_F_INDIRECT   4
+/* This flag means the descriptor was made available by the driver */
+#define VRING_DESC_F_AVAIL(b)   ((uint16_t)(b) << 7)
+/* This flag means the descriptor was used by the device */
+#define VRING_DESC_F_USED(b)    ((uint16_t)(b) << 15)
 
 /* The Host uses this in used->flags to advise the Guest: don't kick me
  * when you add a buffer.  It's unreliable, so it's simply an
  * optimization.  Guest will still kick if it's out of buffers. */
-- 
2.17.1



[dpdk-dev] [PATCH v5 01/11] net/virtio: vring init for packed queues

2018-09-06 Thread Jens Freimann
Add and initialize descriptor data structures.

Signed-off-by: Jens Freimann 
---
 drivers/net/virtio/virtio_ethdev.c | 22 ++--
 drivers/net/virtio/virtio_pci.h|  8 +
 drivers/net/virtio/virtio_ring.h   | 55 +++---
 drivers/net/virtio/virtqueue.h | 10 ++
 4 files changed, 80 insertions(+), 15 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index 614357da7..ad91f7f82 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -299,19 +299,21 @@ virtio_init_vring(struct virtqueue *vq)
 
PMD_INIT_FUNC_TRACE();
 
-   /*
-* Reinitialise since virtio port might have been stopped and restarted
-*/
memset(ring_mem, 0, vq->vq_ring_size);
-   vring_init(vr, size, ring_mem, VIRTIO_PCI_VRING_ALIGN);
-   vq->vq_used_cons_idx = 0;
-   vq->vq_desc_head_idx = 0;
-   vq->vq_avail_idx = 0;
-   vq->vq_desc_tail_idx = (uint16_t)(vq->vq_nentries - 1);
+   vring_init(vq->hw, vr, size, ring_mem, VIRTIO_PCI_VRING_ALIGN);
+
vq->vq_free_cnt = vq->vq_nentries;
memset(vq->vq_descx, 0, sizeof(struct vq_desc_extra) * vq->vq_nentries);
+   vq->vq_used_cons_idx = 0;
+   vq->vq_avail_idx = 0;
+   if (vtpci_packed_queue(vq->hw)) {
+   vring_desc_init_packed(vr, size);
+   } else {
+   vq->vq_desc_head_idx = 0;
+   vq->vq_desc_tail_idx = (uint16_t)(vq->vq_nentries - 1);
 
-   vring_desc_init(vr->desc, size);
+   vring_desc_init(vr->desc, size);
+   }
 
/*
 * Disable device(host) interrupting guest
@@ -386,7 +388,7 @@ virtio_init_queue(struct rte_eth_dev *dev, uint16_t 
vtpci_queue_idx)
/*
 * Reserve a memzone for vring elements
 */
-   size = vring_size(vq_size, VIRTIO_PCI_VRING_ALIGN);
+   size = vring_size(hw, vq_size, VIRTIO_PCI_VRING_ALIGN);
vq->vq_ring_size = RTE_ALIGN_CEIL(size, VIRTIO_PCI_VRING_ALIGN);
PMD_INIT_LOG(DEBUG, "vring_size: %d, rounded_vring_size: %d",
 size, vq->vq_ring_size);
diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
index 58fdd3d45..90204d281 100644
--- a/drivers/net/virtio/virtio_pci.h
+++ b/drivers/net/virtio/virtio_pci.h
@@ -113,6 +113,8 @@ struct virtnet_ctl;
 
 #define VIRTIO_F_VERSION_1 32
 #define VIRTIO_F_IOMMU_PLATFORM33
+#define VIRTIO_F_RING_PACKED   34
+#define VIRTIO_F_IN_ORDER  35
 
 /*
  * Some VirtIO feature bits (currently bits 28 through 31) are
@@ -314,6 +316,12 @@ vtpci_with_feature(struct virtio_hw *hw, uint64_t bit)
return (hw->guest_features & (1ULL << bit)) != 0;
 }
 
+static inline int
+vtpci_packed_queue(struct virtio_hw *hw)
+{
+   return vtpci_with_feature(hw, VIRTIO_F_RING_PACKED);
+}
+
 /*
  * Function declaration from virtio_pci.c
  */
diff --git a/drivers/net/virtio/virtio_ring.h b/drivers/net/virtio/virtio_ring.h
index 9e3c2a015..cea4d441e 100644
--- a/drivers/net/virtio/virtio_ring.h
+++ b/drivers/net/virtio/virtio_ring.h
@@ -54,11 +54,38 @@ struct vring_used {
struct vring_used_elem ring[0];
 };
 
+/* For support of packed virtqueues in Virtio 1.1 the format of descriptors
+ * looks like this.
+ */
+struct vring_desc_packed {
+   uint64_t addr;
+   uint32_t len;
+   uint16_t index;
+   uint16_t flags;
+};
+
+#define RING_EVENT_FLAGS_ENABLE 0x0
+#define RING_EVENT_FLAGS_DISABLE 0x1
+#define RING_EVENT_FLAGS_DESC 0x2
+struct vring_packed_desc_event {
+   uint16_t desc_event_off_wrap;
+   uint16_t desc_event_flags;
+};
+
 struct vring {
unsigned int num;
-   struct vring_desc  *desc;
-   struct vring_avail *avail;
-   struct vring_used  *used;
+   union {
+   struct vring_desc_packed *desc_packed;
+   struct vring_desc *desc;
+   };
+   union {
+   struct vring_avail *avail;
+   struct vring_packed_desc_event *driver_event;
+   };
+   union {
+   struct vring_used  *used;
+   struct vring_packed_desc_event *device_event;
+   };
 };
 
 /* The standard layout for the ring is a continuous chunk of memory which
@@ -95,10 +122,18 @@ struct vring {
 #define vring_avail_event(vr) (*(uint16_t *)&(vr)->used->ring[(vr)->num])
 
 static inline size_t
-vring_size(unsigned int num, unsigned long align)
+vring_size(struct virtio_hw *hw, unsigned int num, unsigned long align)
 {
size_t size;
 
+   if (vtpci_packed_queue(hw)) {
+   size = num * sizeof(struct vring_desc_packed);
+   size += sizeof(struct vring_packed_desc_event);
+   size = RTE_ALIGN_CEIL(size, align);
+   size += sizeof(struct vring_packed_desc_event);
+   return size;
+   }
+
size = num * sizeof(struct vring_desc);
size += sizeof(struct vring_avail) + (

[dpdk-dev] [PATCH v5 04/11] net/virtio: flush packed receive virtqueues

2018-09-06 Thread Jens Freimann
Flush used descriptors in the packed receive virtqueue. As descriptors
can be chained, we need to look at the stored number of used descriptors
to find out the length of the chain.

Signed-off-by: Jens Freimann 
---
 drivers/net/virtio/virtqueue.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/drivers/net/virtio/virtqueue.c b/drivers/net/virtio/virtqueue.c
index 56a77cc71..d0520dad1 100644
--- a/drivers/net/virtio/virtqueue.c
+++ b/drivers/net/virtio/virtqueue.c
@@ -58,12 +58,29 @@ virtqueue_detach_unused(struct virtqueue *vq)
 void
 virtqueue_rxvq_flush(struct virtqueue *vq)
 {
+   struct vring_desc_packed *descs = vq->vq_ring.desc_packed;
struct virtnet_rx *rxq = &vq->rxq;
struct virtio_hw *hw = vq->hw;
struct vring_used_elem *uep;
struct vq_desc_extra *dxp;
uint16_t used_idx, desc_idx;
uint16_t nb_used, i;
+   uint16_t size = vq->vq_nentries;
+
+   if (vtpci_packed_queue(vq->hw)) {
+   i = vq->vq_used_cons_idx;
+   while (desc_is_used(&descs[i], &vq->vq_ring) &&
+   i < vq->vq_nentries) {
+   dxp = &vq->vq_descx[i];
+   if (dxp->cookie != NULL)
+   rte_pktmbuf_free(dxp->cookie);
+   vq->vq_free_cnt += dxp->ndescs;
+   i = i + dxp->ndescs;
+   i = i >= size ? i - size : i;
+   dxp->ndescs = 0;
+   }
+   return;
+   }
 
nb_used = VIRTQUEUE_NUSED(vq);
 
-- 
2.17.1



[dpdk-dev] [PATCH v5 06/11] net/virtio-user: add option to use packed queues

2018-09-06 Thread Jens Freimann
From: Yuanhan Liu 

Add option to enable packed queue support for virtio-user
devices.
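
An illustrative invocation (socket path and queue size are
placeholders):

    testpmd ... --vdev \
        'net_virtio_user0,path=/tmp/vhost-user.sock,queue_size=256,packed_vq=1'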

Signed-off-by: Yuanhan Liu 
---
 drivers/net/virtio/virtio_user/virtio_user_dev.c | 10 --
 drivers/net/virtio/virtio_user/virtio_user_dev.h |  2 +-
 drivers/net/virtio/virtio_user_ethdev.c  | 14 +-
 3 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.c 
b/drivers/net/virtio/virtio_user/virtio_user_dev.c
index 7df600b02..9979bea0d 100644
--- a/drivers/net/virtio/virtio_user/virtio_user_dev.c
+++ b/drivers/net/virtio/virtio_user/virtio_user_dev.c
@@ -372,12 +372,13 @@ virtio_user_dev_setup(struct virtio_user_dev *dev)
 1ULL << VIRTIO_NET_F_GUEST_TSO4|   \
 1ULL << VIRTIO_NET_F_GUEST_TSO6|   \
 1ULL << VIRTIO_F_IN_ORDER  |   \
-1ULL << VIRTIO_F_VERSION_1)
+1ULL << VIRTIO_F_VERSION_1 |   \
+1ULL << VIRTIO_F_RING_PACKED)
 
 int
 virtio_user_dev_init(struct virtio_user_dev *dev, char *path, int queues,
 int cq, int queue_size, const char *mac, char **ifname,
-int mrg_rxbuf, int in_order)
+int mrg_rxbuf, int in_order, int packed_vq)
 {
pthread_mutex_init(&dev->mutex, NULL);
snprintf(dev->path, PATH_MAX, "%s", path);
@@ -432,6 +433,11 @@ virtio_user_dev_init(struct virtio_user_dev *dev, char 
*path, int queues,
dev->unsupported_features |= (1ull << VIRTIO_F_IN_ORDER);
}
 
+   if (packed_vq)
+   dev->device_features |= (1ull << VIRTIO_F_RING_PACKED);
+   else
+   dev->device_features &= ~(1ull << VIRTIO_F_RING_PACKED);
+
if (dev->mac_specified) {
dev->device_features |= (1ull << VIRTIO_NET_F_MAC);
} else {
diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.h 
b/drivers/net/virtio/virtio_user/virtio_user_dev.h
index d6e0e137b..7f46ba1d9 100644
--- a/drivers/net/virtio/virtio_user/virtio_user_dev.h
+++ b/drivers/net/virtio/virtio_user/virtio_user_dev.h
@@ -49,7 +49,7 @@ int virtio_user_start_device(struct virtio_user_dev *dev);
 int virtio_user_stop_device(struct virtio_user_dev *dev);
 int virtio_user_dev_init(struct virtio_user_dev *dev, char *path, int queues,
 int cq, int queue_size, const char *mac, char **ifname,
-int mrg_rxbuf, int in_order);
+int mrg_rxbuf, int in_order, int packed_vq);
 void virtio_user_dev_uninit(struct virtio_user_dev *dev);
 void virtio_user_handle_cq(struct virtio_user_dev *dev, uint16_t queue_idx);
 uint8_t virtio_user_handle_mq(struct virtio_user_dev *dev, uint16_t q_pairs);
diff --git a/drivers/net/virtio/virtio_user_ethdev.c 
b/drivers/net/virtio/virtio_user_ethdev.c
index 525d16cab..72ac86186 100644
--- a/drivers/net/virtio/virtio_user_ethdev.c
+++ b/drivers/net/virtio/virtio_user_ethdev.c
@@ -364,6 +364,8 @@ static const char *valid_args[] = {
VIRTIO_USER_ARG_MRG_RXBUF,
 #define VIRTIO_USER_ARG_IN_ORDER   "in_order"
VIRTIO_USER_ARG_IN_ORDER,
+#define VIRTIO_USER_ARG_PACKED_VQ "packed_vq"
+   VIRTIO_USER_ARG_PACKED_VQ,
NULL
 };
 
@@ -473,6 +475,7 @@ virtio_user_pmd_probe(struct rte_vdev_device *dev)
char *ifname = NULL;
char *mac_addr = NULL;
int ret = -1;
+   uint64_t packed_vq = 0;
 
kvlist = rte_kvargs_parse(rte_vdev_device_args(dev), valid_args);
if (!kvlist) {
@@ -556,6 +559,15 @@ virtio_user_pmd_probe(struct rte_vdev_device *dev)
cq = 1;
}
 
+   if (rte_kvargs_count(kvlist, VIRTIO_USER_ARG_PACKED_VQ) == 1) {
+   if (rte_kvargs_process(kvlist, VIRTIO_USER_ARG_PACKED_VQ,
+  &get_integer_arg, &packed_vq) < 0) {
+   PMD_INIT_LOG(ERR, "error to parse %s",
+VIRTIO_USER_ARG_PACKED_VQ);
+   goto end;
+   }
+   }
+
if (queues > 1 && cq == 0) {
PMD_INIT_LOG(ERR, "multi-q requires ctrl-q");
goto end;
@@ -603,7 +615,7 @@ virtio_user_pmd_probe(struct rte_vdev_device *dev)
vu_dev->is_server = false;
if (virtio_user_dev_init(hw->virtio_user_dev, path, queues, cq,
 queue_size, mac_addr, &ifname, mrg_rxbuf,
-in_order) < 0) {
+in_order, packed_vq) < 0) {
PMD_INIT_LOG(ERR, "virtio_user_dev_init fails");
virtio_user_eth_dev_free(eth_dev);
goto end;
-- 
2.17.1



[dpdk-dev] [PATCH v5 03/11] net/virtio: add packed virtqueue helpers

2018-09-06 Thread Jens Freimann
Add helper functions to set/clear and check descriptor flags.

Signed-off-by: Jens Freimann 
---
 drivers/net/virtio/virtio_ring.h | 26 ++
 drivers/net/virtio/virtqueue.h   | 19 +++
 2 files changed, 45 insertions(+)

diff --git a/drivers/net/virtio/virtio_ring.h b/drivers/net/virtio/virtio_ring.h
index e2c597434..f3b23f419 100644
--- a/drivers/net/virtio/virtio_ring.h
+++ b/drivers/net/virtio/virtio_ring.h
@@ -78,6 +78,8 @@ struct vring_packed_desc_event {
 
 struct vring {
unsigned int num;
+   unsigned int avail_wrap_counter;
+   unsigned int used_wrap_counter;
union {
struct vring_desc_packed *desc_packed;
struct vring_desc *desc;
@@ -92,6 +94,30 @@ struct vring {
};
 };
 
+static inline void
+_set_desc_avail(struct vring_desc_packed *desc, int wrap_counter)
+{
+   desc->flags |= VRING_DESC_F_AVAIL(wrap_counter) |
+  VRING_DESC_F_USED(!wrap_counter);
+}
+
+static inline void
+set_desc_avail(struct vring *vr, struct vring_desc_packed *desc)
+{
+   _set_desc_avail(desc, vr->avail_wrap_counter);
+}
+
+static inline int
+desc_is_used(struct vring_desc_packed *desc, struct vring *vr)
+{
+   uint16_t used, avail;
+
+   used = !!(desc->flags & VRING_DESC_F_USED(1));
+   avail = !!(desc->flags & VRING_DESC_F_AVAIL(1));
+
+   return used == avail && used == vr->used_wrap_counter;
+}
+
 /* The standard layout for the ring is a continuous chunk of memory which
  * looks like this.  We assume num is a power of 2.
  *
diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
index d2a0b651a..53fce61b4 100644
--- a/drivers/net/virtio/virtqueue.h
+++ b/drivers/net/virtio/virtqueue.h
@@ -245,6 +245,25 @@ struct virtio_tx_region {
   __attribute__((__aligned__(16)));
 };
 
+static inline uint16_t
+increment_pq_index(uint16_t idx, size_t ring_size)
+{
+   return ++idx >= ring_size ? 0 : idx;
+}
+
+static inline uint16_t
+update_pq_avail_index(struct virtqueue *vq)
+{
+   uint16_t idx;
+
+   idx = increment_pq_index(vq->vq_avail_idx, vq->vq_nentries);
+   if (idx == 0)
+   vq->vq_ring.avail_wrap_counter ^= 1;
+   vq->vq_avail_idx = idx;
+
+   return vq->vq_avail_idx;
+}
+
 static inline void
 vring_desc_init_packed(struct vring *vr, int n)
 {
-- 
2.17.1



[dpdk-dev] [PATCH v5 05/11] net/virtio: dump packed virtqueue data

2018-09-06 Thread Jens Freimann
Add support for dumping packed virtqueue data to the
VIRTQUEUE_DUMP() macro.

Signed-off-by: Jens Freimann 
---
 drivers/net/virtio/virtqueue.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
index 53fce61b4..531ba8c65 100644
--- a/drivers/net/virtio/virtqueue.h
+++ b/drivers/net/virtio/virtqueue.h
@@ -384,6 +384,12 @@ virtqueue_notify(struct virtqueue *vq)
uint16_t used_idx, nused; \
used_idx = (vq)->vq_ring.used->idx; \
nused = (uint16_t)(used_idx - (vq)->vq_used_cons_idx); \
+   if (vtpci_packed_queue((vq)->hw)) { \
+ PMD_INIT_LOG(DEBUG, \
+ "VQ: - size=%d; free=%d; last_used_idx=%d;", \
+ (vq)->vq_nentries, (vq)->vq_free_cnt, nused); \
+ break; \
+   } \
PMD_INIT_LOG(DEBUG, \
  "VQ: - size=%d; free=%d; used=%d; desc_head_idx=%d;" \
  " avail.idx=%d; used_cons_idx=%d; used.idx=%d;" \
-- 
2.17.1



[dpdk-dev] [PATCH v5 07/11] net/virtio: implement transmit path for packed queues

2018-09-06 Thread Jens Freimann
This implements the transmit path for devices with
support for packed virtqueues.

Add the feature bit and enable code to
add buffers to vring and mark descriptors as available.

Signed-off-by: Jens Freiman 
---
 drivers/net/virtio/virtio_ethdev.c |   8 +-
 drivers/net/virtio/virtio_ethdev.h |   2 +
 drivers/net/virtio/virtio_rxtx.c   | 113 -
 3 files changed, 121 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index ad91f7f82..d2c5755bb 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -384,6 +384,8 @@ virtio_init_queue(struct rte_eth_dev *dev, uint16_t 
vtpci_queue_idx)
vq->hw = hw;
vq->vq_queue_index = vtpci_queue_idx;
vq->vq_nentries = vq_size;
+   if (vtpci_packed_queue(hw))
+   vq->vq_ring.avail_wrap_counter = 1;
 
/*
 * Reserve a memzone for vring elements
@@ -1338,7 +1340,11 @@ set_rxtx_funcs(struct rte_eth_dev *eth_dev)
eth_dev->rx_pkt_burst = &virtio_recv_pkts;
}
 
-   if (hw->use_inorder_tx) {
+   if (vtpci_packed_queue(hw)) {
+   PMD_INIT_LOG(INFO, "virtio: using virtio 1.1 Tx path on port %u",
+   eth_dev->data->port_id);
+   eth_dev->tx_pkt_burst = virtio_xmit_pkts_packed;
+   } else if (hw->use_inorder_tx) {
PMD_INIT_LOG(INFO, "virtio: using inorder Tx path on port %u",
eth_dev->data->port_id);
eth_dev->tx_pkt_burst = virtio_xmit_pkts_inorder;
diff --git a/drivers/net/virtio/virtio_ethdev.h 
b/drivers/net/virtio/virtio_ethdev.h
index b726ad108..04161b461 100644
--- a/drivers/net/virtio/virtio_ethdev.h
+++ b/drivers/net/virtio/virtio_ethdev.h
@@ -79,6 +79,8 @@ uint16_t virtio_recv_mergeable_pkts_inorder(void *rx_queue,
 
 uint16_t virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
uint16_t nb_pkts);
+uint16_t virtio_xmit_pkts_packed(void *tx_queue, struct rte_mbuf **tx_pkts,
+   uint16_t nb_pkts);
 
 uint16_t virtio_xmit_pkts_inorder(void *tx_queue, struct rte_mbuf **tx_pkts,
uint16_t nb_pkts);
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index eb891433e..12787070e 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -38,6 +38,112 @@
 #define  VIRTIO_DUMP_PACKET(m, len) do { } while (0)
 #endif
 
+
+/* Cleanup from completed transmits. */
+static void
+virtio_xmit_cleanup_packed(struct virtqueue *vq)
+{
+   uint16_t idx;
+   uint16_t size = vq->vq_nentries;
+   struct vring_desc_packed *desc = vq->vq_ring.desc_packed;
+   struct vq_desc_extra *dxp;
+
+   idx = vq->vq_used_cons_idx;
+   while (desc_is_used(&desc[idx], &vq->vq_ring) &&
+  vq->vq_free_cnt < size) {
+   dxp = &vq->vq_descx[idx];
+   vq->vq_free_cnt += dxp->ndescs;
+   idx += dxp->ndescs;
+   idx = idx >= size ? idx - size : idx;
+   }
+}
+
+uint16_t
+virtio_xmit_pkts_packed(void *tx_queue, struct rte_mbuf **tx_pkts,
+uint16_t nb_pkts)
+{
+   struct virtnet_tx *txvq = tx_queue;
+   struct virtqueue *vq = txvq->vq;
+   uint16_t i;
+   struct vring_desc_packed *desc = vq->vq_ring.desc_packed;
+   uint16_t idx, prev;
+   struct vq_desc_extra *dxp;
+
+   if (unlikely(nb_pkts < 1))
+   return nb_pkts;
+
+   PMD_TX_LOG(DEBUG, "%d packets to xmit", nb_pkts);
+
+   if (likely(vq->vq_free_cnt < vq->vq_free_thresh))
+   virtio_xmit_cleanup_packed(vq);
+
+   for (i = 0; i < nb_pkts; i++) {
+   struct rte_mbuf *txm = tx_pkts[i];
+   struct virtio_tx_region *txr = txvq->virtio_net_hdr_mz->addr;
+   uint16_t head_idx;
+   int wrap_counter;
+   int descs_used;
+
+   if (unlikely(txm->nb_segs + 1 > vq->vq_free_cnt)) {
+   virtio_xmit_cleanup_packed(vq);
+
+   if (unlikely(txm->nb_segs + 1 > vq->vq_free_cnt)) {
+   PMD_TX_LOG(ERR,
+  "No free tx descriptors to transmit");
+   break;
+   }
+   }
+
+   txvq->stats.bytes += txm->pkt_len;
+
+   vq->vq_free_cnt -= txm->nb_segs + 1;
+
+   wrap_counter = vq->vq_ring.avail_wrap_counter;
+   idx = vq->vq_avail_idx; 
+   head_idx = idx;
+
+   dxp = &vq->vq_descx[idx];
+   if (dxp->cookie != NULL)
+   rte_pktmbuf_free(dxp->cookie);
+   dxp->cookie = txm;
+
+   desc[idx].addr  = txvq->virtio_net_hdr_mem +
+ RTE_PTR_DIFF(&txr[idx].tx_hdr, txr);
+   desc[idx].len   = vq->hw->vtnet_hdr_size;

[dpdk-dev] [PATCH v5 09/11] net/virtio: disable ctrl virtqueue for packed rings

2018-09-06 Thread Jens Freimann
Signed-off-by: Jens Freiman 
---
 drivers/net/virtio/virtio_ethdev.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index a2bb726ba..b02c65598 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -1161,6 +1161,15 @@ virtio_negotiate_features(struct virtio_hw *hw, uint64_t 
req_features)
req_features &= ~(1ULL << VIRTIO_NET_F_MTU);
}
 
+#ifdef RTE_LIBRTE_VIRTIO_PQ
+   if (req_features & (1ULL << VIRTIO_F_RING_PACKED)) {
+   req_features &= ~(1ull << VIRTIO_NET_F_CTRL_MAC_ADDR);
+   req_features &= ~(1ull << VIRTIO_NET_F_CTRL_VQ);
+   req_features &= ~(1ull << VIRTIO_NET_F_CTRL_RX);
+   req_features &= ~(1ull << VIRTIO_NET_F_CTRL_VLAN);
+   }
+#endif
+
/*
 * Negotiate features: Subset of device feature bits are written back
 * guest feature bits.
-- 
2.17.1



[dpdk-dev] [PATCH v5 11/11] net/virtio: add support for event suppression

2018-09-06 Thread Jens Freimann
Signed-off-by: Jens Freimann 
---
 drivers/net/virtio/virtio_ethdev.c |  2 +-
 drivers/net/virtio/virtio_rxtx.c   | 15 +-
 drivers/net/virtio/virtqueue.h | 77 --
 3 files changed, 89 insertions(+), 5 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index f2e515838..4249e52c7 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -730,7 +730,7 @@ virtio_dev_rx_queue_intr_enable(struct rte_eth_dev *dev, 
uint16_t queue_id)
struct virtnet_rx *rxvq = dev->data->rx_queues[queue_id];
struct virtqueue *vq = rxvq->vq;
 
-   virtqueue_enable_intr(vq);
+   virtqueue_enable_intr(vq, 0, 0);
return 0;
 }
 
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 577786b7e..5dee3f12b 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -137,6 +137,10 @@ virtio_xmit_pkts_packed(void *tx_queue, struct rte_mbuf 
**tx_pkts,
 
vq->vq_descx[head_idx].ndescs = descs_used;
idx = update_pq_avail_index(vq);
+   if (unlikely(virtqueue_kick_prepare_packed(vq))) {
+   virtqueue_notify(vq);
+   PMD_RX_LOG(DEBUG, "Notified");
+   }
}
 
txvq->stats.packets += i;
@@ -1193,6 +1197,10 @@ virtio_recv_pkts_packed(void *rx_queue, struct rte_mbuf 
**rx_pkts,
}
 
rxvq->stats.packets += nb_rx;
+   if (nb_rx > 0 && unlikely(virtqueue_kick_prepare_packed(vq))) {
+   virtqueue_notify(vq);
+   PMD_RX_LOG(DEBUG, "Notified");
+   }
 
return nb_rx;
 }
@@ -1648,8 +1656,13 @@ virtio_recv_mergeable_pkts(void *rx_queue,
 
rxvq->stats.packets += nb_rx;
 
-   if (vtpci_packed_queue(vq->hw))
+   if (vtpci_packed_queue(vq->hw)) {
+   if (unlikely(virtqueue_kick_prepare(vq))) {
+   virtqueue_notify(vq);
+   PMD_RX_LOG(DEBUG, "Notified");
+   }
return nb_rx;
+   }
 
/* Allocate new mbuf for the used descriptor */
while (likely(!virtqueue_full(vq))) {
diff --git a/drivers/net/virtio/virtqueue.h b/drivers/net/virtio/virtqueue.h
index 735066486..9d3e322a2 100644
--- a/drivers/net/virtio/virtqueue.h
+++ b/drivers/net/virtio/virtqueue.h
@@ -176,6 +176,8 @@ struct virtqueue {
uint16_t vq_free_cnt;  /**< num of desc available */
uint16_t vq_avail_idx; /**< sync until needed */
uint16_t vq_free_thresh; /**< free threshold */
+   uint16_t vq_signalled_avail;
+   int vq_signalled_avail_valid;
 
void *vq_ring_virt_mem;  /**< linear address of vring*/
unsigned int vq_ring_size;
@@ -292,16 +294,37 @@ vring_desc_init(struct vring_desc *dp, uint16_t n)
 static inline void
 virtqueue_disable_intr(struct virtqueue *vq)
 {
-   vq->vq_ring.avail->flags |= VRING_AVAIL_F_NO_INTERRUPT;
+   if (vtpci_packed_queue(vq->hw) && vtpci_with_feature(vq->hw,
+   VIRTIO_RING_F_EVENT_IDX))
+   vq->vq_ring.device_event->desc_event_flags =
+   RING_EVENT_FLAGS_DISABLE;
+   else
+   vq->vq_ring.avail->flags |= VRING_AVAIL_F_NO_INTERRUPT;
 }
 
 /**
  * Tell the backend to interrupt us.
  */
 static inline void
-virtqueue_enable_intr(struct virtqueue *vq)
+virtqueue_enable_intr(struct virtqueue *vq, uint16_t off, uint16_t 
wrap_counter)
 {
-   vq->vq_ring.avail->flags &= (~VRING_AVAIL_F_NO_INTERRUPT);
+   uint16_t *flags = &vq->vq_ring.device_event->desc_event_flags;
+   uint16_t *event_off_wrap =
+   &vq->vq_ring.device_event->desc_event_off_wrap;
+   if (vtpci_packed_queue(vq->hw)) {
+   *flags = 0;
+   *event_off_wrap = 0;
+   if (*event_off_wrap & RING_EVENT_FLAGS_DESC) {
+   *event_off_wrap = off & 0x7FFF;
+   *event_off_wrap |= wrap_counter << 15;
+   *flags |= RING_EVENT_FLAGS_DESC;
+   } else {
+   *event_off_wrap = 0;
+   }
+   *flags |= RING_EVENT_FLAGS_ENABLE;
+   } else {
+   vq->vq_ring.avail->flags &= (~VRING_AVAIL_F_NO_INTERRUPT);
+   }
 }
 
 /**
@@ -363,12 +386,60 @@ vq_update_avail_ring(struct virtqueue *vq, uint16_t 
desc_idx)
vq->vq_avail_idx++;
 }
 
+static int vhost_idx_diff(struct virtqueue *vq, uint16_t old, uint16_t new)
+{
+   if (new > old)
+   return new - old;
+   return  (new + vq->vq_nentries - old);
+}
+
+static int vring_packed_need_event(struct virtqueue *vq,
+   uint16_t event_off, uint16_t new,
+   uint16_t old)
+{
+   return (uint16_t)(vhost_idx_diff(vq, new, event_off) - 1) <
+   (uint16_t)vhost_idx_diff(vq, new, old);
+}
+
+
 static inline int
 virtqueue_kick_prepare(struct

[dpdk-dev] [PATCH v5 10/11] net/virtio: add support for mergeable buffers with packed virtqueues

2018-09-06 Thread Jens Freimann
Implement support for receiving merged buffers in virtio when packed
virtqueues are enabled.

Signed-off-by: Jens Freimann 
---
 drivers/net/virtio/virtio_ethdev.c |   4 --
 drivers/net/virtio/virtio_rxtx.c   | 103 +++--
 drivers/net/virtio/virtqueue.h |   1 +
 3 files changed, 98 insertions(+), 10 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index b02c65598..f2e515838 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -1331,10 +1331,6 @@ set_rxtx_funcs(struct rte_eth_dev *eth_dev)
 {
struct virtio_hw *hw = eth_dev->data->dev_private;
 
-   /*
-* workaround for packed vqs which don't support
-* mrg_rxbuf at this point
-*/
if (vtpci_packed_queue(hw)) {
eth_dev->rx_pkt_burst = &virtio_recv_pkts_packed;
} else if (hw->use_simple_rx) {
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 3f5fa7366..577786b7e 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -195,6 +195,79 @@ vq_ring_free_chain(struct virtqueue *vq, uint16_t desc_idx)
dp->next = VQ_RING_DESC_CHAIN_END;
 }
 
+static void
+virtio_refill_packed(struct virtqueue *vq, uint16_t used_idx,
+struct virtnet_rx *rxvq)
+{
+   struct vq_desc_extra *dxp;
+   struct vring_desc_packed *descs = vq->vq_ring.desc_packed;
+   struct vring_desc_packed *desc;
+   struct rte_mbuf *nmb;
+
+   nmb = rte_mbuf_raw_alloc(rxvq->mpool);
+   if (unlikely(nmb == NULL)) {
+   struct rte_eth_dev *dev
+   = &rte_eth_devices[rxvq->port_id];
+   dev->data->rx_mbuf_alloc_failed++;
+   return;
+   }
+
+   desc = &descs[used_idx];
+
+   dxp = &vq->vq_descx[used_idx];
+
+   dxp->cookie = nmb;
+   dxp->ndescs = 1;
+
+   desc->addr = VIRTIO_MBUF_ADDR(nmb, vq) +
+   RTE_PKTMBUF_HEADROOM - vq->hw->vtnet_hdr_size;
+   desc->len = nmb->buf_len - RTE_PKTMBUF_HEADROOM +
+   vq->hw->vtnet_hdr_size;
+   desc->flags |= VRING_DESC_F_WRITE;
+}
+
+static uint16_t
+virtqueue_dequeue_burst_rx_packed(struct virtqueue *vq,
+ struct rte_mbuf **rx_pkts,
+ uint32_t *len,
+ uint16_t num,
+ struct virtnet_rx *rx_queue)
+{
+   struct rte_mbuf *cookie;
+   uint16_t used_idx;
+   struct vring_desc_packed *desc;
+   uint16_t i;
+
+   for (i = 0; i < num; i++) {
+   used_idx = vq->vq_used_cons_idx;
+   desc = &vq->vq_ring.desc_packed[used_idx];
+   if (!desc_is_used(desc, &vq->vq_ring))
+   return i;
+   len[i] = desc->len;
+   cookie = (struct rte_mbuf *)vq->vq_descx[used_idx].cookie;
+
+   if (unlikely(cookie == NULL)) {
+   PMD_DRV_LOG(ERR, "vring descriptor with no mbuf cookie at %u",
+   vq->vq_used_cons_idx);
+   break;
+   }
+   rte_prefetch0(cookie);
+   rte_packet_prefetch(rte_pktmbuf_mtod(cookie, void *));
+   rx_pkts[i] = cookie;
+
+   virtio_refill_packed(vq, used_idx, rx_queue);
+
+   rte_smp_wmb();
+   if (vq->vq_used_cons_idx == 0)
+   vq->vq_ring.used_wrap_counter ^= 1;
+   set_desc_avail(&vq->vq_ring, desc);
+   vq->vq_used_cons_idx = increment_pq_index(vq->vq_used_cons_idx,
+ vq->vq_nentries);
+   }
+
+   return i;
+}
+
 static uint16_t
 virtqueue_dequeue_burst_rx(struct virtqueue *vq, struct rte_mbuf **rx_pkts,
   uint32_t *len, uint16_t num)
@@ -1436,12 +1509,16 @@ virtio_recv_mergeable_pkts(void *rx_queue,
uint16_t extra_idx;
uint32_t seg_res;
uint32_t hdr_size;
+   uint32_t rx_num = 0;
 
nb_rx = 0;
if (unlikely(hw->started == 0))
return nb_rx;
 
-   nb_used = VIRTQUEUE_NUSED(vq);
+   if (vtpci_packed_queue(vq->hw))
+   nb_used = VIRTIO_MBUF_BURST_SZ;
+   else
+   nb_used = VIRTQUEUE_NUSED(vq);
 
virtio_rmb();
 
@@ -1454,13 +1531,21 @@ virtio_recv_mergeable_pkts(void *rx_queue,
seg_res = 0;
hdr_size = hw->vtnet_hdr_size;
 
+   vq->vq_used_idx = vq->vq_used_cons_idx;
+
while (i < nb_used) {
struct virtio_net_hdr_mrg_rxbuf *header;
 
if (nb_rx == nb_pkts)
break;
 
-   num = virtqueue_dequeue_burst_rx(vq, rcv_pkts, len, 1);
+   if (vtpci_packed_queue(vq->hw))
+   num = virtqueue_dequeue_burst_rx_packed(vq, rcv_pkts,
+   

[dpdk-dev] [PATCH v5 08/11] net/virtio: implement receive path for packed queues

2018-09-06 Thread Jens Freimann
Implement the receive part.

Signed-off-by: Jens Freimann 
---
 drivers/net/virtio/virtio_ethdev.c |  15 +++-
 drivers/net/virtio/virtio_ethdev.h |   2 +
 drivers/net/virtio/virtio_rxtx.c   | 131 +
 3 files changed, 145 insertions(+), 3 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index d2c5755bb..a2bb726ba 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -384,8 +384,10 @@ virtio_init_queue(struct rte_eth_dev *dev, uint16_t 
vtpci_queue_idx)
vq->hw = hw;
vq->vq_queue_index = vtpci_queue_idx;
vq->vq_nentries = vq_size;
-   if (vtpci_packed_queue(hw))
+   if (vtpci_packed_queue(hw)) {
vq->vq_ring.avail_wrap_counter = 1;
+   vq->vq_ring.used_wrap_counter = 1;
+   }
 
/*
 * Reserve a memzone for vring elements
@@ -1320,7 +1322,13 @@ set_rxtx_funcs(struct rte_eth_dev *eth_dev)
 {
struct virtio_hw *hw = eth_dev->data->dev_private;
 
-   if (hw->use_simple_rx) {
+   /*
+* workaround for packed vqs which don't support
+* mrg_rxbuf at this point
+*/
+   if (vtpci_packed_queue(hw)) {
+   eth_dev->rx_pkt_burst = &virtio_recv_pkts_packed;
+   } else if (hw->use_simple_rx) {
PMD_INIT_LOG(INFO, "virtio: using simple Rx path on port %u",
eth_dev->data->port_id);
eth_dev->rx_pkt_burst = virtio_recv_pkts_vec;
@@ -1484,7 +1492,8 @@ virtio_init_device(struct rte_eth_dev *eth_dev, uint64_t 
req_features)
 
/* Setting up rx_header size for the device */
if (vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF) ||
-   vtpci_with_feature(hw, VIRTIO_F_VERSION_1))
+   vtpci_with_feature(hw, VIRTIO_F_VERSION_1) ||
+   vtpci_with_feature(hw, VIRTIO_F_RING_PACKED))
hw->vtnet_hdr_size = sizeof(struct virtio_net_hdr_mrg_rxbuf);
else
hw->vtnet_hdr_size = sizeof(struct virtio_net_hdr);
diff --git a/drivers/net/virtio/virtio_ethdev.h 
b/drivers/net/virtio/virtio_ethdev.h
index 04161b461..25eaff224 100644
--- a/drivers/net/virtio/virtio_ethdev.h
+++ b/drivers/net/virtio/virtio_ethdev.h
@@ -70,6 +70,8 @@ int virtio_dev_tx_queue_setup_finish(struct rte_eth_dev *dev,
 
 uint16_t virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
uint16_t nb_pkts);
+uint16_t virtio_recv_pkts_packed(void *rx_queue, struct rte_mbuf **rx_pkts,
+   uint16_t nb_pkts);
 
 uint16_t virtio_recv_mergeable_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
uint16_t nb_pkts);
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 12787070e..3f5fa7366 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -31,6 +31,7 @@
 #include "virtqueue.h"
 #include "virtio_rxtx.h"
 #include "virtio_rxtx_simple.h"
+#include "virtio_ring.h"
 
 #ifdef RTE_LIBRTE_VIRTIO_DEBUG_DUMP
 #define VIRTIO_DUMP_PACKET(m, len) rte_pktmbuf_dump(stdout, m, len)
@@ -710,6 +711,34 @@ virtio_dev_rx_queue_setup_finish(struct rte_eth_dev *dev, 
uint16_t queue_idx)
 
PMD_INIT_FUNC_TRACE();
 
+   if (vtpci_packed_queue(hw)) {
+   struct vring_desc_packed *desc;
+   struct vq_desc_extra *dxp;
+
+   for (desc_idx = 0; desc_idx < vq->vq_nentries;
+   desc_idx++) {
+   m = rte_mbuf_raw_alloc(rxvq->mpool);
+   if (unlikely(m == NULL))
+   return -ENOMEM;
+
+   dxp = &vq->vq_descx[desc_idx];
+   dxp->cookie = m;
+   dxp->ndescs = 1;
+
+   desc = &vq->vq_ring.desc_packed[desc_idx];
+   desc->addr = VIRTIO_MBUF_ADDR(m, vq) +
+   RTE_PKTMBUF_HEADROOM - hw->vtnet_hdr_size;
+   desc->len = m->buf_len - RTE_PKTMBUF_HEADROOM +
+   hw->vtnet_hdr_size;
+   desc->flags |= VRING_DESC_F_WRITE;
+   rte_smp_wmb();
+   set_desc_avail(&vq->vq_ring, desc);
+   }
+   vq->vq_ring.avail_wrap_counter ^= 1;
+   nbufs = desc_idx;
+   goto out;
+   }
+
/* Allocate blank mbufs for the each rx descriptor */
nbufs = 0;
 
@@ -773,6 +802,7 @@ virtio_dev_rx_queue_setup_finish(struct rte_eth_dev *dev, 
uint16_t queue_idx)
vq_update_avail_idx(vq);
}
 
+out:
PMD_INIT_LOG(DEBUG, "Allocated %d bufs", nbufs);
 
VIRTQUEUE_DUMP(vq);
@@ -993,6 +1023,107 @@ virtio_rx_offload(struct rte_mbuf *m, struct 
virtio_net_hdr *hdr)
return 0;
 }
 
+uint16_t
+virtio_recv_pkts_packed(void *rx_queue, struct rte_mbuf **rx_pkts,
+uint16_t nb_pkts)
+{
+   st

[dpdk-dev] [PATCH v1 1/5] test: fix bucket size in hash table perf test

2018-09-06 Thread Yipeng Wang
The bucket size was changed from 4 to 8 but the corresponding
perf test was not changed accordingly.

Fixes: 58017c98ed53 ("hash: add vectorized comparison")
Cc: sta...@dpdk.org

Signed-off-by: Yipeng Wang 
---
 test/test/test_hash_perf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/test/test/test_hash_perf.c b/test/test/test_hash_perf.c
index 33dcb9f..9ed7125 100644
--- a/test/test/test_hash_perf.c
+++ b/test/test/test_hash_perf.c
@@ -20,7 +20,7 @@
 #define MAX_ENTRIES (1 << 19)
 #define KEYS_TO_ADD (MAX_ENTRIES * 3 / 4) /* 75% table utilization */
#define NUM_LOOKUPS (KEYS_TO_ADD * 5) /* Loop among keys added, several times */
-#define BUCKET_SIZE 4
+#define BUCKET_SIZE 8
 #define NUM_BUCKETS (MAX_ENTRIES / BUCKET_SIZE)
 #define MAX_KEYSIZE 64
 #define NUM_KEYSIZES 10
-- 
2.7.4



[dpdk-dev] [PATCH v1 0/5] hash: add extendable bucket and partial-key hashing

2018-09-06 Thread Yipeng Wang
This patch set makes two major improvements to the current rte_hash library.

First, it adds the extendable bucket table feature: a new structure that
can accommodate keys that fail to be inserted into the main hash table
due to the unlikely event of excessive hash collisions. The hash table
buckets are extended with a linked list to host these keys. This new
design guarantees insertion of 100% of the keys for a given hash table
size with minimal overhead. A new flag value is added for the user to
indicate whether the extendable bucket feature should be enabled. The
linked-list buckets are a similar concept to the extendable bucket hash
table in the packet framework.
In detail, for insertion, the linked buckets are used to store keys that
fail to fit into the primary and the secondary bucket when the cuckoo
path cannot find an empty location within the maximum path length (a
small probability). For lookup, the key is checked first in the primary
bucket, then the secondary; if the secondary is extended, the linked
list is traversed for a possible match, as sketched below.
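
A compilable but simplified sketch of this lookup order; the bucket
layout and helper below are stand-ins, not the rte_hash internals:

#include <stdint.h>
#include <stddef.h>

#define ENTRIES 8	/* matches the current bucket size */

struct bkt {
	uint16_t sig[ENTRIES];
	int32_t key_idx[ENTRIES];	/* -1 marks an empty entry here */
	struct bkt *next;		/* extendable chain, NULL if none */
};

static int32_t
search_bucket(const struct bkt *b, uint16_t sig)
{
	int i;

	for (i = 0; i < ENTRIES; i++)
		if (b->key_idx[i] >= 0 && b->sig[i] == sig)
			return b->key_idx[i];	/* full key compare omitted */
	return -1;
}

static int32_t
lookup_sketch(const struct bkt *prim, const struct bkt *sec, uint16_t sig)
{
	const struct bkt *b;
	int32_t pos = search_bucket(prim, sig);

	if (pos < 0)
		pos = search_bucket(sec, sig);
	/* only the secondary bucket chains into extendable buckets */
	for (b = sec->next; pos < 0 && b != NULL; b = b->next)
		pos = search_bucket(b, sig);
	return pos;
}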

Second, the patch set changes the current hashing algorithm to
"partial-key hashing", a concept from Bin Fan, et al.'s paper
"MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter
Hashing". Instead of storing both a 32-bit signature and an alternative
signature in the bucket, we only store a small 16-bit signature and
calculate the alternative bucket index by XORing the signature with the
current bucket index (see the sketch below). This doubles the hash
table memory efficiency, since one bucket now occupies one cache line
instead of two in the original design.
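
Concretely, both bucket indexes fall out of a single 32-bit hash value.
A minimal sketch (the helper name is illustrative; the logic mirrors
the get_buckets_index() added in patch 5/5):

#include <stdint.h>

static inline void
pkh_buckets(uint32_t hash, uint32_t bucket_bitmask,
	    uint32_t *prim, uint32_t *sec, uint16_t *sig)
{
	*sig = hash >> 16;			/* 16-bit signature kept in the bucket */
	*prim = hash & bucket_bitmask;		/* primary bucket from the low bits */
	*sec = (*prim ^ *sig) & bucket_bitmask;	/* alternative bucket via XOR */
	/* Note the symmetry: XORing either bucket index with the stored
	 * signature recovers the other, so no second signature is needed. */
}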

Signed-off-by: Yipeng Wang 

Yipeng Wang (5):
  test: fix bucket size in hash table perf test
  test: more accurate hash table perf test output
  hash: add extendable bucket feature
  test: implement extendable bucket hash test
  hash: use partial-key hashing

 lib/librte_hash/rte_cuckoo_hash.c | 518 +++---
 lib/librte_hash/rte_cuckoo_hash.h |  11 +-
 lib/librte_hash/rte_hash.h|   3 +
 test/test/test_hash.c | 145 ++-
 test/test/test_hash_perf.c| 126 +++---
 5 files changed, 618 insertions(+), 185 deletions(-)

-- 
2.7.4



[dpdk-dev] [PATCH v1 4/5] test: implement extendable bucket hash test

2018-09-06 Thread Yipeng Wang
This commit changes the current rte_hash unit test to
test the extendable table feature and performance.

Signed-off-by: Yipeng Wang 
---
 test/test/test_hash.c  | 145 +++--
 test/test/test_hash_perf.c | 114 +--
 2 files changed, 225 insertions(+), 34 deletions(-)

diff --git a/test/test/test_hash.c b/test/test/test_hash.c
index b3db9fd..ca58755 100644
--- a/test/test/test_hash.c
+++ b/test/test/test_hash.c
@@ -660,6 +660,117 @@ static int test_full_bucket(void)
return 0;
 }
 
+/*
+ * Similar to the test above (full bucket test), but for extendable buckets.
+ */
+static int test_extendable_bucket(void)
+{
+   struct rte_hash_parameters params_pseudo_hash = {
+   .name = "test5",
+   .entries = 64,
+   .key_len = sizeof(struct flow_key), /* 13 */
+   .hash_func = pseudo_hash,
+   .hash_func_init_val = 0,
+   .socket_id = 0,
+   .extra_flag = RTE_HASH_EXTRA_FLAGS_EXT_TABLE
+   };
+   struct rte_hash *handle;
+   int pos[64];
+   int expected_pos[64];
+   unsigned int i;
+   struct flow_key rand_keys[64];
+
+   for (i = 0; i < 64; i++) {
+   rand_keys[i].port_dst = i;
+   rand_keys[i].port_src = i+1;
+   }
+
+   handle = rte_hash_create(¶ms_pseudo_hash);
+   RETURN_IF_ERROR(handle == NULL, "hash creation failed");
+
+   /* Fill bucket */
+   for (i = 0; i < 64; i++) {
+   pos[i] = rte_hash_add_key(handle, &rand_keys[i]);
+   print_key_info("Add", &rand_keys[i], pos[i]);
+   RETURN_IF_ERROR(pos[i] < 0,
+   "failed to add key (pos[%u]=%d)", i, pos[i]);
+   expected_pos[i] = pos[i];
+   }
+
+
+   /* Lookup */
+   for (i = 0; i < 64; i++) {
+   pos[i] = rte_hash_lookup(handle, &rand_keys[i]);
+   print_key_info("Lkp", &rand_keys[i], pos[i]);
+   RETURN_IF_ERROR(pos[i] != expected_pos[i],
+   "failed to find key (pos[%u]=%d)", i, pos[i]);
+   }
+
+   /* Add - update */
+   for (i = 0; i < 64; i++) {
+   pos[i] = rte_hash_add_key(handle, &rand_keys[i]);
+   print_key_info("Add", &rand_keys[i], pos[i]);
+   RETURN_IF_ERROR(pos[i] != expected_pos[i],
+   "failed to add key (pos[%u]=%d)", i, pos[i]);
+   }
+
+   /* Lookup */
+   for (i = 0; i < 64; i++) {
+   pos[i] = rte_hash_lookup(handle, &rand_keys[i]);
+   print_key_info("Lkp", &rand_keys[i], pos[i]);
+   RETURN_IF_ERROR(pos[i] != expected_pos[i],
+   "failed to find key (pos[%u]=%d)", i, pos[i]);
+   }
+
+   /* Delete 1 key, check other keys are still found */
+   pos[35] = rte_hash_del_key(handle, &rand_keys[35]);
+   print_key_info("Del", &rand_keys[35], pos[35]);
+   RETURN_IF_ERROR(pos[35] != expected_pos[35],
+   "failed to delete key (pos[1]=%d)", pos[35]);
+   pos[20] = rte_hash_lookup(handle, &rand_keys[20]);
+   print_key_info("Lkp", &rand_keys[20], pos[20]);
+   RETURN_IF_ERROR(pos[20] != expected_pos[20],
+   "failed lookup after deleting key from same bucket "
+   "(pos[20]=%d)", pos[20]);
+
+   /* Go back to previous state */
+   pos[35] = rte_hash_add_key(handle, &rand_keys[35]);
+   print_key_info("Add", &rand_keys[35], pos[35]);
+   expected_pos[35] = pos[35];
+   RETURN_IF_ERROR(pos[35] < 0, "failed to add key (pos[35]=%d)", pos[35]);
+
+   /* Delete */
+   for (i = 0; i < 64; i++) {
+   pos[i] = rte_hash_del_key(handle, &rand_keys[i]);
+   print_key_info("Del", &rand_keys[i], pos[i]);
+   RETURN_IF_ERROR(pos[i] != expected_pos[i],
+   "failed to delete key (pos[%u]=%d)", i, pos[i]);
+   }
+
+   /* Lookup */
+   for (i = 0; i < 64; i++) {
+   pos[i] = rte_hash_lookup(handle, &rand_keys[i]);
+   print_key_info("Lkp", &rand_keys[i], pos[i]);
+   RETURN_IF_ERROR(pos[i] != -ENOENT,
+   "fail: found non-existent key (pos[%u]=%d)", i, pos[i]);
+   }
+
+   /* Add again */
+   for (i = 0; i < 64; i++) {
+   pos[i] = rte_hash_add_key(handle, &rand_keys[i]);
+   print_key_info("Add", &rand_keys[i], pos[i]);
+   RETURN_IF_ERROR(pos[i] < 0,
+   "failed to add key (pos[%u]=%d)", i, pos[i]);
+   expected_pos[i] = pos[i];
+   }
+
+   rte_hash_free(handle);
+
+   /* Cover the NULL case. */
+   rte_hash_free(0);
+   return 0;
+}
+
 
/**/
 static int
 fbk_hash_unit_test(void)
@@ -1096,7 +1207,7 @@ test_hash_creation_with_good_parameters(void)
  *

[dpdk-dev] [PATCH v1 2/5] test: more accurate hash table perf test output

2018-09-06 Thread Yipeng Wang
Edit the printf messages emitted on error to be more
accurate and informative.

Signed-off-by: Yipeng Wang 
---
 test/test/test_hash_perf.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/test/test/test_hash_perf.c b/test/test/test_hash_perf.c
index 9ed7125..4d00c20 100644
--- a/test/test/test_hash_perf.c
+++ b/test/test/test_hash_perf.c
@@ -248,7 +248,7 @@ timed_adds(unsigned with_hash, unsigned with_data, unsigned 
table_index)
(const void *) keys[i],
signatures[i], data);
if (ret < 0) {
-   printf("Failed to add key number %u\n", ret);
+   printf("H+D: Failed to add key number %u\n", i);
return -1;
}
} else if (with_hash && !with_data) {
@@ -258,7 +258,7 @@ timed_adds(unsigned with_hash, unsigned with_data, unsigned 
table_index)
if (ret >= 0)
positions[i] = ret;
else {
-   printf("Failed to add key number %u\n", ret);
+   printf("H: Failed to add key number %u\n", i);
return -1;
}
} else if (!with_hash && with_data) {
@@ -266,7 +266,7 @@ timed_adds(unsigned with_hash, unsigned with_data, unsigned 
table_index)
(const void *) keys[i],
data);
if (ret < 0) {
-   printf("Failed to add key number %u\n", ret);
+   printf("D: Failed to add key number %u\n", i);
return -1;
}
} else {
@@ -274,7 +274,7 @@ timed_adds(unsigned with_hash, unsigned with_data, unsigned 
table_index)
if (ret >= 0)
positions[i] = ret;
else {
-   printf("Failed to add key number %u\n", ret);
+   printf("Failed to add key number %u\n", i);
return -1;
}
}
@@ -442,7 +442,7 @@ timed_deletes(unsigned with_hash, unsigned with_data, 
unsigned table_index)
if (ret >= 0)
positions[i] = ret;
else {
-   printf("Failed to add key number %u\n", ret);
+   printf("Failed to delete key number %u\n", i);
return -1;
}
}
-- 
2.7.4



[dpdk-dev] [PATCH v1 5/5] hash: use partial-key hashing

2018-09-06 Thread Yipeng Wang
This commit changes the hashing mechanism to "partial-key
hashing" to calculate the bucket index and signature of a key.

This is proposed in Bin Fan, et al.'s paper
"MemC3: Compact and Concurrent MemCache with Dumber Caching
and Smarter Hashing". Basically, the idea is to use "xor" to
derive the alternative bucket from the current bucket index and
the signature.

"Partial-key hashing" reduces the bucket memory requirement
from two cache lines to one, which improves memory efficiency
and thus lookup speed.

Signed-off-by: Yipeng Wang 
---
 lib/librte_hash/rte_cuckoo_hash.c | 225 ++
 lib/librte_hash/rte_cuckoo_hash.h |   6 +-
 2 files changed, 108 insertions(+), 123 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.c 
b/lib/librte_hash/rte_cuckoo_hash.c
index ff380bb..ace47ad 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -92,6 +92,26 @@ rte_hash_cmp_eq(const void *key1, const void *key2, const 
struct rte_hash *h)
return cmp_jump_table[h->cmp_jump_table_idx](key1, key2, 
h->key_len);
 }
 
+static inline void
+get_buckets_index(const struct rte_hash *h, const hash_sig_t hash,
+   uint32_t *prim_bkt, uint32_t *sec_bkt, uint16_t *sig)
+{
+   /*
+* We use the higher 16 bits of the hash as the signature value
+* stored in the table, and the lower bits for the primary bucket
+* location. Then we XOR the primary bucket location and the
+* signature to get the secondary bucket location. This is the
+* same as proposed in B. Fan, et al.'s paper
+* "Cuckoo Filter: Practically Better Than Bloom". The benefit of
+* using XOR is that one can derive the alternative bucket
+* location using only the current bucket location and the signature.
+*/
+   *sig = hash >> 16;
+
+   *prim_bkt = hash & h->bucket_bitmask;
+   *sec_bkt =  (*prim_bkt ^ *sig) & h->bucket_bitmask;
+}
+
 struct rte_hash *
 rte_hash_create(const struct rte_hash_parameters *params)
 {
@@ -329,9 +349,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
h->ext_table_support = ext_table_support;
 
 #if defined(RTE_ARCH_X86)
-   if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
-   h->sig_cmp_fn = RTE_HASH_COMPARE_AVX2;
-   else if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE2))
+   if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE2))
h->sig_cmp_fn = RTE_HASH_COMPARE_SSE;
else
 #endif
@@ -418,18 +436,6 @@ rte_hash_hash(const struct rte_hash *h, const void *key)
return h->hash_func(key, h->key_len, h->hash_func_init_val);
 }
 
-/* Calc the secondary hash value from the primary hash value of a given key */
-static inline hash_sig_t
-rte_hash_secondary_hash(const hash_sig_t primary_hash)
-{
-   static const unsigned all_bits_shift = 12;
-   static const unsigned alt_bits_xor = 0x5bd1e995;
-
-   uint32_t tag = primary_hash >> all_bits_shift;
-
-   return primary_hash ^ ((tag + 1) * alt_bits_xor);
-}
-
 int32_t
 rte_hash_count(const struct rte_hash *h)
 {
@@ -561,14 +567,13 @@ enqueue_slot_back(const struct rte_hash *h,
 /* Search a key from bucket and update its data */
 static inline int32_t
 search_and_update(const struct rte_hash *h, void *data, const void *key,
-   struct rte_hash_bucket *bkt, hash_sig_t sig, hash_sig_t alt_hash)
+   struct rte_hash_bucket *bkt, uint16_t sig)
 {
int i;
struct rte_hash_key *k, *keys = h->key_store;
 
for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
-   if (bkt->sig_current[i] == sig &&
-   bkt->sig_alt[i] == alt_hash) {
+   if (bkt->sig_current[i] == sig) {
k = (struct rte_hash_key *) ((char *)keys +
bkt->key_idx[i] * h->key_entry_size);
if (rte_hash_cmp_eq(key, k->key, h) == 0) {
@@ -595,7 +600,7 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
struct rte_hash_bucket *prim_bkt,
struct rte_hash_bucket *sec_bkt,
const struct rte_hash_key *key, void *data,
-   hash_sig_t sig, hash_sig_t alt_hash, uint32_t new_idx,
+   uint16_t sig, uint32_t new_idx,
int32_t *ret_val)
 {
unsigned int i;
@@ -606,7 +611,7 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
/* Check if key was inserted after last check but before this
 * protected region in case of inserting duplicated keys.
 */
-   ret = search_and_update(h, data, key, prim_bkt, sig, alt_hash);
+   ret = search_and_update(h, data, key, prim_bkt, sig);
if (ret != -1) {
__hash_rw_writer_unlock(h);
*ret_val = ret;
@@ -614,7 +619,7 @@ rte_hash_cuckoo_insert_mw(const struct rte_hash *h,
}
 
FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
-   ret = se

[dpdk-dev] [PATCH v1 3/5] hash: add extendable bucket feature

2018-09-06 Thread Yipeng Wang
In use cases where hash table capacity needs to be guaranteed,
the extendable bucket feature can be used to hold extra keys
in linked lists when conflicts happen. This is a similar
concept to the extendable bucket hash table in the packet
framework.

This commit adds the extendable bucket feature. The user can
turn it on or off through the extra flag field at table
creation time.

An extendable bucket table is composed of buckets that can be
linked into the main table as a linked list (see the sketch
below). When the feature is enabled, table utilization can
always achieve 100%. Although keys ending up in the extendable
buckets may have a longer lookup time, they should be rare
thanks to the cuckoo algorithm.
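
A compilable toy sketch of the chaining step; the real patch dequeues a
spare bucket from a pre-allocated rte_ring, for which a trivial stack
stands in here:

#include <stddef.h>

struct ext_bkt {
	struct ext_bkt *next;	/* key/signature arrays omitted */
};

/* toy free store standing in for the patch's ring of spare buckets */
static struct ext_bkt pool[4];
static int spare_cnt = 4;

static int
extend_bucket(struct ext_bkt *sec)
{
	struct ext_bkt *last = sec;
	struct ext_bkt *spare;

	if (spare_cnt == 0)
		return -1;		/* all spares in use: table truly full */
	spare = &pool[--spare_cnt];
	spare->next = NULL;

	while (last->next != NULL)	/* walk to the tail of the chain */
		last = last->next;
	last->next = spare;		/* the new key can now go into *spare */
	return 0;
}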

Signed-off-by: Yipeng Wang 
---
 lib/librte_hash/rte_cuckoo_hash.c | 331 +-
 lib/librte_hash/rte_cuckoo_hash.h |   5 +
 lib/librte_hash/rte_hash.h|   3 +
 3 files changed, 298 insertions(+), 41 deletions(-)

diff --git a/lib/librte_hash/rte_cuckoo_hash.c 
b/lib/librte_hash/rte_cuckoo_hash.c
index f7b86c8..ff380bb 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -31,6 +31,10 @@
 #include "rte_hash.h"
 #include "rte_cuckoo_hash.h"
 
+#define FOR_EACH_BUCKET(CURRENT_BKT, START_BUCKET)\
+   for (CURRENT_BKT = START_BUCKET;  \
+   CURRENT_BKT != NULL;  \
+   CURRENT_BKT = CURRENT_BKT->next)
 
 TAILQ_HEAD(rte_hash_list, rte_tailq_entry);
 
@@ -63,6 +67,16 @@ rte_hash_find_existing(const char *name)
return h;
 }
 
+static inline struct rte_hash_bucket *
+rte_hash_get_last_bkt(struct rte_hash_bucket *lst_bkt)
+{
+   while (1) {
+   if (lst_bkt->next == NULL)
+   return lst_bkt;
+   lst_bkt = lst_bkt->next;
+   }
+}
+
 void rte_hash_set_cmp_func(struct rte_hash *h, rte_hash_cmp_eq_t func)
 {
h->cmp_jump_table_idx = KEY_CUSTOM;
@@ -85,13 +99,17 @@ rte_hash_create(const struct rte_hash_parameters *params)
struct rte_tailq_entry *te = NULL;
struct rte_hash_list *hash_list;
struct rte_ring *r = NULL;
+   struct rte_ring *r_ext = NULL;
char hash_name[RTE_HASH_NAMESIZE];
void *k = NULL;
void *buckets = NULL;
+   void *buckets_ext = NULL;
char ring_name[RTE_RING_NAMESIZE];
+   char ext_ring_name[RTE_RING_NAMESIZE];
unsigned num_key_slots;
unsigned i;
unsigned int hw_trans_mem_support = 0, multi_writer_support = 0;
+   unsigned int ext_table_support = 0;
unsigned int readwrite_concur_support = 0;
 
rte_hash_function default_hash_func = (rte_hash_function)rte_jhash;
@@ -124,6 +142,9 @@ rte_hash_create(const struct rte_hash_parameters *params)
multi_writer_support = 1;
}
 
+   if (params->extra_flag & RTE_HASH_EXTRA_FLAGS_EXT_TABLE)
+   ext_table_support = 1;
+
	/* Store all keys and leave the first entry as a dummy entry for lookup_bulk */
if (multi_writer_support)
/*
@@ -145,6 +166,24 @@ rte_hash_create(const struct rte_hash_parameters *params)
goto err;
}
 
+   const uint32_t num_buckets = rte_align32pow2(params->entries) /
+   RTE_HASH_BUCKET_ENTRIES;
+
+   snprintf(ext_ring_name, sizeof(ext_ring_name), "HT_EXT_%s",
+   params->name);
+   /* Create ring for extendable buckets. */
+   if (ext_table_support) {
+   r_ext = rte_ring_create(ext_ring_name,
+   rte_align32pow2(num_buckets + 1),
+   params->socket_id, 0);
+
+   if (r_ext == NULL) {
+   RTE_LOG(ERR, HASH, "ext buckets memory allocation "
+   "failed\n");
+   goto err;
+   }
+   }
+
snprintf(hash_name, sizeof(hash_name), "HT_%s", params->name);
 
rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
@@ -177,18 +216,34 @@ rte_hash_create(const struct rte_hash_parameters *params)
goto err_unlock;
}
 
-   const uint32_t num_buckets = rte_align32pow2(params->entries)
-   / RTE_HASH_BUCKET_ENTRIES;
-
buckets = rte_zmalloc_socket(NULL,
num_buckets * sizeof(struct rte_hash_bucket),
RTE_CACHE_LINE_SIZE, params->socket_id);
 
if (buckets == NULL) {
-   RTE_LOG(ERR, HASH, "memory allocation failed\n");
+   RTE_LOG(ERR, HASH, "buckets memory allocation failed\n");
goto err_unlock;
}
 
+   /* Allocate same number of extendable buckets */
+   if (ext_table_support) {
+   buckets_ext = rte_zmalloc_socket(NU

[dpdk-dev] [PATCH] vhost: fix crash on unregistering in client mode

2018-09-06 Thread Qiang Zhou
When rte_vhost_driver_unregister() deletes the connection fd,
the fd lock prevents the vsocket from being freed. But when
vhost_user_msg_handler() returns an error, the connection is
removed from the vsocket's conn_list, after which the fd lock
no longer protects the vsocket. The vsocket can then be freed
in rte_vhost_driver_unregister() while vhost_user_read_cb()
is still trying to reconnect, causing a crash.

To fix this, remove the connection from the vsocket's
conn_list only after setting up the reconnect.

Cc: sta...@dpdk.org

Signed-off-by: Qiang Zhou 
---
 lib/librte_vhost/socket.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
index d63031747..43da1c51b 100644
--- a/lib/librte_vhost/socket.c
+++ b/lib/librte_vhost/socket.c
@@ -293,16 +293,16 @@ vhost_user_read_cb(int connfd, void *dat, int *remove)
if (vsocket->notify_ops->destroy_connection)
vsocket->notify_ops->destroy_connection(conn->vid);
 
+   if (vsocket->reconnect) {
+   create_unix_socket(vsocket);
+   vhost_user_start_client(vsocket);
+   }
+
pthread_mutex_lock(&vsocket->conn_mutex);
TAILQ_REMOVE(&vsocket->conn_list, conn, next);
pthread_mutex_unlock(&vsocket->conn_mutex);
 
free(conn);
-
-   if (vsocket->reconnect) {
-   create_unix_socket(vsocket);
-   vhost_user_start_client(vsocket);
-   }
}
 }
 
-- 
2.14.3 (Apple Git-98)



Re: [dpdk-dev] [PATCH v1 2/2] examples/vdpa: add a new sample for vdpa

2018-09-06 Thread Ye Xiaolong
Hi, Rosen.

Thanks a lot for your comments.

On 09/06, Rami Rosen wrote:
>Hi all,
>First, thanks for the vdpa example patches.
>Second, I am getting a compilation error under Ubuntu 18.04, with gcc
>version 7.3.0 (Ubuntu 7.3.0-16ubuntu3)
>...
>  CC main.o
>/work/src/dpdk/examples/vdpa/main.c: In function ‘main’:
>/work/src/dpdk/examples/vdpa/main.c:321:5: error: ignoring return
>value of ‘scanf’, declared with attribute warn_unused_result
>[-Werror=unused-result]
> scanf("%c", &ch);
> ^~~~
>cc1: all warnings being treated as errors
>/work/src/dpdk/mk/internal/rte.compile-pre.mk:114: recipe for target
>'main.o' failed
>make[1]: *** [main.o] Error 1
>

I'll look into it and solve it in v2.
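
One minimal fix would be to check the return value, e.g. (a sketch):

	if (scanf("%c", &ch) != 1)
		rte_exit(EXIT_FAILURE, "failed to read input\n");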

>Also, it would be nice to have as part of this patch series adding the
>relevant info in
>MAINTAINERS, doc/guides/sample_app_ug/index.rst , examples/Makefile
>and adding a doc/guides/sample_app_ug/vdpa.rst, like most  patches for
>examples do.
>See for example,
>
>commit f5188211c721688bf8530d1648d623205246e1da
>Author: Fan Zhang 
>Date:   Thu Apr 5 17:01:36 2018 +0100
>examples/vhost_crypto: add sample application
>

Got it, I'll add the necessary documentation accordingly.

Thanks,
Xiaolong

>Regards,
>Rami Rosen


[dpdk-dev] [PATCH] examples/ipsec-secgw: fix wrong session size

2018-09-06 Thread Anoob Joseph
Crypto devices which support lookaside protocol expose a
security session size in addition to the crypto private
symmetric session data size. Applications using the security
capabilities need to account for both sizes when sizing the
session mempool.

Fixes: ec17993a145a ("examples/ipsec-secgw: support security offload")

Signed-off-by: Anoob Joseph 
Signed-off-by: Archana Muniganti 
---
 examples/ipsec-secgw/ipsec-secgw.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/examples/ipsec-secgw/ipsec-secgw.c 
b/examples/ipsec-secgw/ipsec-secgw.c
index b45b87b..47ac26a 100644
--- a/examples/ipsec-secgw/ipsec-secgw.c
+++ b/examples/ipsec-secgw/ipsec-secgw.c
@@ -1392,9 +1392,27 @@ cryptodevs_init(void)
 
uint32_t max_sess_sz = 0, sess_sz;
for (cdev_id = 0; cdev_id < rte_cryptodev_count(); cdev_id++) {
+   void *sec_ctx;
+
+   /* Get crypto priv session size */
sess_sz = rte_cryptodev_sym_get_private_session_size(cdev_id);
if (sess_sz > max_sess_sz)
max_sess_sz = sess_sz;
+
+   /*
+* If crypto device is security capable, need to check the
+* size of security session as well.
+*/
+
+   /* Get security context of the crypto device */
+   sec_ctx = rte_cryptodev_get_sec_ctx(cdev_id);
+   if (sec_ctx == NULL)
+   continue;
+
+   /* Get size of security session */
+   sess_sz = rte_security_session_get_size(sec_ctx);
+   if (sess_sz > max_sess_sz)
+   max_sess_sz = sess_sz;
}
RTE_ETH_FOREACH_DEV(port_id) {
void *sec_ctx;
-- 
2.7.4