[dpdk-dev] dpdk-i40e could not receive vlan packet whose ip_len was bigger than 1496

2019-07-09 Thread tb.bingel
Hi all,


I found that the DPDK i40e driver could not receive VLAN packets whose ip_len was bigger than 1496.


I ran the same test case with an i350 (igb) NIC and saw NO such issue.


Is this a bug in DPDK? Thanks!


Test Environment:
* NIC : Ethernet controller: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP
* CPU : Intel(R) Xeon(R) Silver 4116 CPU @ 2.10GHz
* OS : Ubuntu 16.04 
* DPDK : 19.05


How to reproduce:

Step 1:  start kni example
$ sudo ./build/kni -l 0-5 -n 4 -- -P -p 0x1 -m --config="(0,1,3)" 
EAL: Detected 48 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Probing VFIO support...
EAL: PCI device :19:00.0 on NUMA socket 0
EAL:   probe driver: 8086:1521 net_e1000_igb
EAL: PCI device :19:00.1 on NUMA socket 0
EAL:   probe driver: 8086:1521 net_e1000_igb
EAL: PCI device :19:00.2 on NUMA socket 0
EAL:   probe driver: 8086:1521 net_e1000_igb
EAL: PCI device :19:00.3 on NUMA socket 0
EAL:   probe driver: 8086:1521 net_e1000_igb
EAL: PCI device :3b:00.0 on NUMA socket 0
EAL:   probe driver: 8086:1583 net_i40e
EAL: PCI device :3b:00.1 on NUMA socket 0
EAL:   probe driver: 8086:1583 net_i40e
APP: Initialising port 0 ...
 
Checking link status
..done
Port0 Link Up - speed 4Mbps - full-duplex
APP: 
APP: KNI Running
APP: kill -SIGUSR1 90114
APP: Show KNI Statistics.
APP: kill -SIGUSR2 90114
APP: Zero KNI Statistics.
APP: 
APP: Lcore 1 is reading from port 0
APP: Lcore 2 has nothing to do
APP: Lcore 3 is writing to port 0
APP: Lcore 4 has nothing to do
APP: Lcore 0 has nothing to do
APP: Lcore 5 has nothing to do
APP: Configure network interface of 0 up
APP: vEth0 NIC Link is Up 4 Mbps (AutoNeg) Full Duplex.



Step 2: configure KNI interface WITHOUT VLAN
$ sudo ifconfig vEth0 192.168.99.141/24


Step 3: configure KNI interface WITH VLAN ID

$ sudo vconfig add vEth0 589
Added VLAN with VID == 589 to IF -:vEth0:-
$ sudo ifconfig vEth0 192.168.100.141/24
 

Step 4: ping the KNI IP (no VLAN) from another PC
$ ping -c 1 -s 1472 -M do 192.168.99.141
PING 192.168.99.141 (192.168.99.141) 1472(1500) bytes of data.   <<< OK for IP packet length 1500
1480 bytes from 192.168.99.141: icmp_seq=1 ttl=64 time=7.53 ms


Step 5: ping the KNI VLAN IP from another PC
$ ping -c 1 -s 1468 -M do 192.168.100.141
PING 192.168.100.141 (192.168.100.141) 1468(1496) bytes of data.    <<< OK for IP packet length 1496
1476 bytes from 192.168.100.141: icmp_seq=1 ttl=64 time=4.66 ms



$ ping -c 1 -s 1469 -M do 192.168.100.141
PING 192.168.100.141 (192.168.100.141) 1469(1497) bytes of data.   <<< failed for IP packet length 1497
^C
--- 192.168.100.141 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
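
One thing worth checking here is the port's RX frame length limit: i40e caps
received frames at rxmode.max_rx_pkt_len, which defaults to the 1518-byte
ETHER_MAX_LEN and so leaves no room for the 4-byte VLAN tag; that would match
the 1496/1497 boundary seen above, while igb does not enforce the limit the
same way. A minimal sketch against the DPDK 19.05 ethdev API (not a confirmed
fix; the jumbo-frame offload flag is assumed to be needed for max_rx_pkt_len
to take effect on this PMD):

#include <rte_ethdev.h>
#include <rte_ether.h>

/* Sketch: enlarge the RX frame limit so a 1500-byte IP packet plus a
 * 4-byte VLAN tag still fits (ETHER_MAX_VLAN_FRAME_LEN is 1522). */
static int
port_allow_vlan_frames(uint16_t port_id)
{
	struct rte_eth_conf conf = {0};

	conf.rxmode.max_rx_pkt_len = ETHER_MAX_VLAN_FRAME_LEN;
	conf.rxmode.offloads |= DEV_RX_OFFLOAD_JUMBO_FRAME;

	return rte_eth_dev_configure(port_id, 1, 1, &conf);
}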




Re: [dpdk-dev] [RFC] ethdev: support input set change by RSS action

2019-07-09 Thread Jerin Jacob Kollanukkaran
> -Original Message-
> From: dev  On Behalf Of Zhang, Qi Z
> Sent: Tuesday, July 9, 2019 11:11 AM
> To: Adrien Mazarguil 
> Cc: Su, Simei ; Wu, Jingjing ;
> Xing, Beilei ; Yang, Qiming ;
> dev@dpdk.org
> Subject: Re: [dpdk-dev] [RFC] ethdev: support input set change by RSS action
> > Right, I had symmetric Toeplitz in mind and wondered what would happen
> > when users would not select the required fields. I guess the PMD would
> > have to reject unsupported combinations.
> >
> > > anything I missed, would you share more insight?
> >
> > No, that answers the question, thanks.
> >
> > Now what about my suggestion below? In short: extending ETH_RSS_*
> > assuming there's enough bits left in there, instead of adding a whole
> > new framework and breaking ABI in the process.
> 
> Since the hardware can support any combination of input set (not just 5
> tuples), we'd like to make it more generic.
> There will be some pain due to the ABI break, but we think it's worth it for
> the long term.

# IMO, the chosen method should be compatible with the normal RSS (without
rte_flow) case as well (currently struct rte_eth_conf.rx_adv_conf.rss_conf.rss_hf
is used in dev_configure()), since from the HW perspective it is the same HW
block that does both.
# From the Marvell HW point of view, we can support any combination of input
set, but it is more of a microcode matter (i.e. the list of supported protocols
first needs to be provided to the HW and then used later). So I think extending
the ETH_RSS_* flags would be the more portable approach __without losing__ any
functionality. Since around 40 bits are still left, I think more standard
protocols can fit in those 40 bits.
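
For reference, the "normal RSS (without rte_flow)" path referred to above is
the rss_hf bitmask passed at dev_configure() time; a minimal sketch using only
the existing ETH_RSS_* flags from rte_ethdev.h (nothing new assumed):

#include <rte_ethdev.h>

/* Sketch: select the RSS input set through rss_hf; each protocol/field
 * combination is one bit of the 64-bit mask, so new input sets would simply
 * claim some of the currently unused bits. */
static int
configure_rss(uint16_t port_id, uint16_t nb_rxq)
{
	struct rte_eth_conf conf = {0};

	conf.rxmode.mq_mode = ETH_MQ_RX_RSS;
	conf.rx_adv_conf.rss_conf.rss_hf = ETH_RSS_IP | ETH_RSS_UDP | ETH_RSS_TCP;

	return rte_eth_dev_configure(port_id, nb_rxq, 1, &conf);
}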






[dpdk-dev] [PATCH v4] net/memif: zero-copy slave

2019-07-09 Thread Jakub Grajciar
Zero-copy slave support for memif PMD.
Slave interface exposes DPDK memory to
master interface. Only single file segments
are supported (EAL option --single-file-segments).

Signed-off-by: Jakub Grajciar 
---
 doc/guides/nics/memif.rst |  29 ++
 drivers/net/memif/Makefile|   1 +
 drivers/net/memif/memif_socket.c  |  63 +--
 drivers/net/memif/meson.build |   1 +
 drivers/net/memif/rte_eth_memif.c | 451 +-
 drivers/net/memif/rte_eth_memif.h |   9 +-
 lib/librte_eal/common/eal_common_mcfg.c   |   7 +
 .../common/include/rte_eal_memconfig.h|  10 +
 lib/librte_eal/rte_eal_version.map|   1 +
 9 files changed, 503 insertions(+), 69 deletions(-)

V2:
- fix coding style

V3:
- fix compilation issues

V4:
- don't move existing code
- add new EAL API rte_mcfg_get_single_file_segments;
  mem_config is now private, so this API returns the
  single_file_segments parameter value
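
For context, a hedged sketch of how a PMD could use the new helper; the exact
signature is not shown in this posting, so the non-zero-when-enabled return
convention below is an assumption based on the changelog note above:

#include <rte_eal_memconfig.h>
#include <rte_log.h>

/* Sketch only: refuse zero-copy slave setup unless EAL was started with
 * --single-file-segments, using the helper added by this patch. */
static int
memif_check_zc_prereq(void)
{
	if (!rte_mcfg_get_single_file_segments()) {
		RTE_LOG(ERR, PMD,
			"zero-copy slave requires --single-file-segments\n");
		return -1;
	}
	return 0;
}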

diff --git a/doc/guides/nics/memif.rst b/doc/guides/nics/memif.rst
index de2d481eb..46cadb13f 100644
--- a/doc/guides/nics/memif.rst
+++ b/doc/guides/nics/memif.rst
@@ -171,6 +171,35 @@ Files
 - net/memif/memif.h *- descriptor and ring definitions*
 - net/memif/rte_eth_memif.c *- eth_memif_rx() eth_memif_tx()*

+Zero-copy slave
+~~~
+
+**Shared memory format**
+
+Region 0 is created by memif driver and contains rings. Slave interface exposes DPDK memory (memseg).
+Instead of using memfd_create() to create new shared file, existing memsegs are used.
+Master interface functions the same as with zero-copy disabled.
+
+region 0:
+
++---+
+| Rings |
++---+---+
+| S2M rings | M2S rings |
++---+---+
+
+region n:
+
++-+
+| Buffers |
++-+
+|memseg   |
++-+
+
+Buffers are dequeued and enqueued as needed. Offset descriptor field is calculated at tx.
+Only single file segments mode (EAL option --single-file-segments) is supported, as calculating
+offset from multiple segments is too expensive.
+
 Example: testpmd
 
 In this example we run two instances of testpmd application and transmit 
packets over memif.
diff --git a/drivers/net/memif/Makefile b/drivers/net/memif/Makefile
index fdbdf3378..8f5c6dd0d 100644
--- a/drivers/net/memif/Makefile
+++ b/drivers/net/memif/Makefile
@@ -20,6 +20,7 @@ CFLAGS += -DALLOW_EXPERIMENTAL_API
 # - rte_mp_action_register
 # - rte_mp_reply
 # - rte_mp_request_sync
+# - rte_mcfg_get_single_file_segments
 LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool
 LDLIBS += -lrte_ethdev -lrte_kvargs
 LDLIBS += -lrte_hash
diff --git a/drivers/net/memif/memif_socket.c b/drivers/net/memif/memif_socket.c
index 01a935f87..2bd7f2927 100644
--- a/drivers/net/memif/memif_socket.c
+++ b/drivers/net/memif/memif_socket.c
@@ -176,8 +176,7 @@ memif_msg_receive_hello(struct rte_eth_dev *dev, 
memif_msg_t *msg)

strlcpy(pmd->remote_name, (char *)h->name, sizeof(pmd->remote_name));

-   MIF_LOG(DEBUG, "%s: Connecting to %s.",
-   rte_vdev_device_name(pmd->vdev), pmd->remote_name);
+   MIF_LOG(DEBUG, "Connecting to %s.", pmd->remote_name);

return 0;
 }
@@ -339,8 +338,7 @@ memif_msg_receive_connect(struct rte_eth_dev *dev, 
memif_msg_t *msg)

strlcpy(pmd->remote_if_name, (char *)c->if_name,
sizeof(pmd->remote_if_name));
-   MIF_LOG(INFO, "%s: Remote interface %s connected.",
-   rte_vdev_device_name(pmd->vdev), pmd->remote_if_name);
+   MIF_LOG(INFO, "Remote interface %s connected.", pmd->remote_if_name);

return 0;
 }
@@ -358,8 +356,7 @@ memif_msg_receive_connected(struct rte_eth_dev *dev, 
memif_msg_t *msg)

strlcpy(pmd->remote_if_name, (char *)c->if_name,
sizeof(pmd->remote_if_name));
-   MIF_LOG(INFO, "%s: Remote interface %s connected.",
-   rte_vdev_device_name(pmd->vdev), pmd->remote_if_name);
+   MIF_LOG(INFO, "Remote interface %s connected.", pmd->remote_if_name);

return 0;
 }
@@ -370,14 +367,13 @@ memif_msg_receive_disconnect(struct rte_eth_dev *dev, 
memif_msg_t *msg)
struct pmd_internals *pmd = dev->data->dev_private;
memif_msg_disconnect_t *d = &msg->disconnect;

-   memset(pmd->remote_disc_string, 0, ETH_MEMIF_DISC_STRING_SIZE);
+   memset(pmd->remote_disc_string, 0, sizeof(pmd->remote_disc_string));
strlcpy(pmd->remote_disc_string, (char *)d->string,
sizeof(pmd->remote_disc_string));

-   MIF_LOG(INFO, "%s: Disconnect received: %s",
-   rte_vdev_device_name(pmd->vdev), pmd->remote_disc_string);
+   MIF_LOG(INFO, "Disconnect received: %s", pmd->remote_disc_string);

-   memset(pmd->local_disc_string, 0, ETH_MEMIF_DISC_STRING_SIZE);
+   memset(pmd->local_disc_string, 0, 96);
memif_disconnect(dev);
return 0;
 }
@@ -473,7 +

Re: [dpdk-dev] [PATCH] ethdev: add flow tag

2019-07-09 Thread Adrien Mazarguil
On Fri, Jul 05, 2019 at 06:05:50PM +, Yongseok Koh wrote:
> > On Jul 5, 2019, at 6:54 AM, Adrien Mazarguil  
> > wrote:
> > 
> > On Thu, Jul 04, 2019 at 04:23:02PM -0700, Yongseok Koh wrote:
> >> A tag is transient data which can be used during flow matching. It can be
> >> used to store the match result from a previous table so that the same
> >> pattern need not be matched again in the next table. Even if the outer
> >> header is decapsulated by the previous match, the match result can be kept.
> >> 
> >> Some devices expose internal registers of their flow processing pipeline,
> >> and those registers are quite useful for stateful connection tracking as
> >> they keep the status of flow matching. Multiple tags are supported by
> >> specifying an index.
> >> 
> >> Example testpmd commands are:
> >> 
> >>  flow create 0 ingress pattern ... / end
> >>actions set_tag index 2 value 0xaa00bb mask 0x00ff /
> >>set_tag index 3 value 0x123456 mask 0xff /
> >>vxlan_decap / jump group 1 / end
> >> 
> >>  flow create 0 ingress pattern ... / end
> >>actions set_tag index 2 value 0xcc00 mask 0xff00 /
> >>set_tag index 3 value 0x123456 mask 0xff /
> >>vxlan_decap / jump group 1 / end
> >> 
> >>  flow create 0 ingress group 1
> >>pattern tag index is 2 value spec 0xaa00bb value mask 0x00ff /
> >>eth ... / end
> >>actions ... jump group 2 / end
> >> 
> >>  flow create 0 ingress group 1
> >>pattern tag index is 2 value spec 0xcc00 value mask 0xff00 /
> >>tag index is 3 value spec 0x123456 value mask 0xff /
> >>eth ... / end
> >>actions ... / end
> >> 
> >>  flow create 0 ingress group 2
> >>pattern tag index is 3 value spec 0x123456 value mask 0xff /
> >>eth ... / end
> >>actions ... / end
> >> 
> >> Signed-off-by: Yongseok Koh 
> > 
> > Hi Yongseok,
> > 
> > Only high level questions for now, while it unquestionably looks useful,
> > from a user standpoint exposing the separate index seems redundant and not
> > necessarily convenient. Using the following example to illustrate:
> > 
> > actions set_tag index 3 value 0x123456 mask 0xf
> > 
> > pattern tag index is 3 value spec 0x123456 value mask 0xff
> > 
> > I might be missing something, but why isn't this enough:
> > 
> > pattern tag index is 3 # match whatever is stored at index 3
> > 
> > Assuming it can work, then why bother with providing value spec/mask on
> > set_tag? A flow rule pattern matches something, sets some arbitrary tag to
> > be matched by a subsequent flow rule and that's it. It even seems like
> > relying on the index only on both occasions is enough for identification.
> > 
> > Same question for the opposite approach; relying on the value, never
> > mentioning the index.
> > 
> > I'm under the impression that the index is a hardware-specific constraint
> > that shouldn't be exposed (especially since it's an 8-bit field). If so, a
> > PMD could keep track of used indices without having them exposed through the
> > public API.
> 
> 
> Thank you for review, Adrien.
> Hope you are doing well. It's been long since we talked each other. :-)

Yeah clearly! Hope you're doing well too. I'm somewhat busy hence slow to
answer these days...

  hey!
  no private talks!

Back to the topic:

> Your approach will work too in general, but we have a request from a customer
> who wants to partition this limited tag storage. Assume the HW exposes 32-bit
> tags (these are 'registers' in the mlx5 HW pipeline); customers then want to
> store multiple pieces of data even within one 32-bit storage, for example a
> 16-bit VLAN tag, an 8-bit table id and an 8-bit flow id. As they want to split
> one 32-bit storage, I thought it better to provide a mask when setting/matching
> the value. Some customers even want to store multiple flags bit by bit, like
> ol_flags, and alter only some of the bits.
> 
> And the index is there to reference an entry of the tag array, as HW can
> provide more register space than 32 bits. For example, mlx5 HW would provide
> 4 x 32-bit storage which users can use as they see fit:
>   tag[0], tag[1], tag[2], tag[3]
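
To illustrate the partitioning described above (a sketch, not part of the
patch): packing a 16-bit VLAN tag, an 8-bit table id and an 8-bit flow id into
one 32-bit tag register, with the per-field masks a set_tag/tag item would use.
The field layout is purely illustrative.

#include <stdint.h>

#define TAG_VLAN_MASK   0xffff0000u	/* bits 31:16 */
#define TAG_TABLE_MASK  0x0000ff00u	/* bits 15:8  */
#define TAG_FLOW_MASK   0x000000ffu	/* bits 7:0   */

static inline uint32_t
tag_pack(uint16_t vlan, uint8_t table_id, uint8_t flow_id)
{
	return ((uint32_t)vlan << 16) | ((uint32_t)table_id << 8) | flow_id;
}

/* e.g. "set_tag index 2 value <tag_pack(...)> mask TAG_TABLE_MASK" would
 * update only the table-id byte, leaving the VLAN and flow-id fields intact. */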

OK, looks like I missed the point then. I initially took it for a funky
alternative to RTE_FLOW_ITEM_TYPE_META & RTE_FLOW_ACTION_TYPE_SET_META
(ingress extended [1]) but while it could be used like that, it's more of a
way to temporarily store and retrieve a small amount of data, correct?

Out of curiosity, are these registers independent from META and other
items/actions in mlx5, otherwise what happens if they are combined?

Are there other uses for these registers? Say, referencing their contents
from other places in a flow rule so they don't have to be hard-coded?

Right now I'm still uncomfortable with such a feature in the public API
because compared to META [1], this approach looks very hardware-specific and
seemingly difficult to map on different HW architectures.

However, the main prob

Re: [dpdk-dev] [EXT] Re: [PATCH] bus/pci: fix IOVA as VA mode selection

2019-07-09 Thread Bruce Richardson
On Mon, Jul 08, 2019 at 07:13:28PM +, Jerin Jacob Kollanukkaran wrote:
> See below,
> 
> Please send the email as text to avoid formatting issue.(No HTML)
> 
> From: David Marchand  
> Sent: Tuesday, July 9, 2019 12:09 AM
> To: Jerin Jacob Kollanukkaran 
> Cc: dev ; Thomas Monjalon ; Ben Walker 
> ; Burakov, Anatoly 
> Subject: [EXT] Re: [dpdk-dev] [PATCH] bus/pci: fix IOVA as VA mode selection
> 
> 
> 
> On Mon, Jul 8, 2019 at 4:25 PM  wrote:
> From: Jerin Jacob 
> 
> The existing logic fails to select IOVA mode VA
> if the driver requests to enable IOVA as VA.
> 
> IOVA as VA has stricter requirements than the other modes,
> so enable positive logic for IOVA as VA selection.
> 
> This patch also updates the default IOVA mode to PA
> for PCI devices, as they have to deal with DMA engines, unlike
> the virtual devices that may only need IOVA as DC.
> 
> We have three cases:
> - driver/hw supports IOVA as PA only
> 
> [Jerin] It is not driver cap, it is more of system cap(IOMMU vs non  IOMMU). 
> We are already addressing that case
> 

Not necessarily. It's possible to have hardware that does not use the IOMMU
on a platform. Therefore, you have more than two options to support.

/Bruce


[dpdk-dev] [PATCH v3] app/pdump: enforcing pdump to use sw mempool

2019-07-09 Thread Harman Kalra
A secondary process cannot access a HW mempool already
initialized by the primary, nor can it set up its own
HW mempool due to its own restrictions.

Since dpdk-pdump creates a mempool only for managing its local
mbufs, a SW mempool is sufficient for this purpose.

Signed-off-by: Harman Kalra 
---
 app/pdump/main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/app/pdump/main.c b/app/pdump/main.c
index 80dc924cf..cd0986aee 100644
--- a/app/pdump/main.c
+++ b/app/pdump/main.c
@@ -604,11 +604,11 @@ create_mp_ring_vdev(void)
mbuf_pool = rte_mempool_lookup(mempool_name);
if (mbuf_pool == NULL) {
/* create mempool */
-   mbuf_pool = rte_pktmbuf_pool_create(mempool_name,
+   mbuf_pool = rte_pktmbuf_pool_create_by_ops(mempool_name,
pt->total_num_mbufs,
MBUF_POOL_CACHE_SIZE, 0,
pt->mbuf_data_size,
-   rte_socket_id());
+   rte_socket_id(), "ring_mp_mc");
if (mbuf_pool == NULL) {
cleanup_rings();
rte_exit(EXIT_FAILURE,
-- 
2.18.0
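
For reference, a standalone usage sketch of the API the patch switches to,
rte_pktmbuf_pool_create_by_ops() with the software "ring_mp_mc" handler; the
pool sizes below are illustrative only:

#include <rte_lcore.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>

/* Sketch: create an mbuf pool explicitly backed by the SW ring mempool ops,
 * instead of whatever "best" ops the platform would otherwise select. */
static struct rte_mempool *
create_sw_mbuf_pool(const char *name)
{
	return rte_pktmbuf_pool_create_by_ops(name,
			8192,				/* number of mbufs */
			256,				/* per-lcore cache */
			0,				/* priv size */
			RTE_MBUF_DEFAULT_BUF_SIZE,
			rte_socket_id(),
			"ring_mp_mc");			/* SW mempool handler */
}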



[dpdk-dev] [PATCH v8 1/2] net/mlx5: support match GRE protocol on DR engine

2019-07-09 Thread Xiaoyu Min
The DR engine supports matching on the GRE protocol field without MPLS support,
so bypass the MPLS check when DR is enabled.

Signed-off-by: Xiaoyu Min 
Acked-by: Viacheslav Ovsiienko 
---
 drivers/net/mlx5/mlx5_flow.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 534cd9338e..626d0f9352 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -1714,6 +1714,7 @@ mlx5_flow_validate_item_gre(const struct rte_flow_item 
*item,
 sizeof(struct rte_flow_item_gre), error);
if (ret < 0)
return ret;
+#ifndef HAVE_MLX5DV_DR
 #ifndef HAVE_IBV_DEVICE_MPLS_SUPPORT
if (spec && (spec->protocol & mask->protocol))
return rte_flow_error_set(error, ENOTSUP,
@@ -1721,6 +1722,7 @@ mlx5_flow_validate_item_gre(const struct rte_flow_item 
*item,
  "without MPLS support the"
  " specification cannot be used for"
  " filtering");
+#endif
 #endif
return 0;
 }
-- 
2.21.0



[dpdk-dev] [PATCH v8 2/2] net/mlx5: match GRE's key and present bits

2019-07-09 Thread Xiaoyu Min
Support matching on the present bits (C, K, S)
as well as the optional key field.

If the rte_flow_item_gre_key item is specified in the pattern,
the K present bit match will be set automatically.
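
For illustration, a hedged sketch of a flow rule using the new GRE key item
from the referenced ethdev RFC (the RTE_FLOW_ITEM_TYPE_GRE_KEY spec/mask is a
big-endian 32-bit key; the queue index and key value are arbitrary examples):

#include <rte_byteorder.h>
#include <rte_flow.h>

/* Sketch: match GRE packets carrying key 0x12345678 and steer them to RX
 * queue 0; per the commit message the K present bit match is implied. */
static struct rte_flow *
create_gre_key_flow(uint16_t port_id, struct rte_flow_error *err)
{
	struct rte_flow_attr attr = { .ingress = 1 };
	rte_be32_t key_spec = RTE_BE32(0x12345678);
	rte_be32_t key_mask = RTE_BE32(UINT32_MAX);
	struct rte_flow_item pattern[] = {
		{ .type = RTE_FLOW_ITEM_TYPE_ETH },
		{ .type = RTE_FLOW_ITEM_TYPE_IPV4 },
		{ .type = RTE_FLOW_ITEM_TYPE_GRE },
		{ .type = RTE_FLOW_ITEM_TYPE_GRE_KEY,
		  .spec = &key_spec, .mask = &key_mask },
		{ .type = RTE_FLOW_ITEM_TYPE_END },
	};
	struct rte_flow_action_queue queue = { .index = 0 };
	struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};

	return rte_flow_create(port_id, &attr, pattern, actions, err);
}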

Signed-off-by: Xiaoyu Min 
Acked-by: Viacheslav Ovsiienko 
---
 doc/guides/rel_notes/release_19_08.rst |  5 ++
 drivers/net/mlx5/mlx5_flow.c   | 61 ++-
 drivers/net/mlx5/mlx5_flow.h   |  5 ++
 drivers/net/mlx5/mlx5_flow_dv.c| 84 ++
 drivers/net/mlx5/mlx5_prm.h|  6 +-
 5 files changed, 159 insertions(+), 2 deletions(-)

diff --git a/doc/guides/rel_notes/release_19_08.rst 
b/doc/guides/rel_notes/release_19_08.rst
index 27d5915bed..dd3f2147b5 100644
--- a/doc/guides/rel_notes/release_19_08.rst
+++ b/doc/guides/rel_notes/release_19_08.rst
@@ -191,6 +191,11 @@ New Features
   Added telemetry mode to l3fwd-power application to report
   application level busyness, empty and full polls of rte_eth_rx_burst().
 
+* **Updated Mellanox mlx5 driver.**
+
+   Updated Mellanox mlx5 driver with new features and improvements, including:
+
+   * Added support for matching on GRE's key and C,K,S present bits.
 
 Removed Items
 -
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 626d0f9352..c7e034c815 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -1666,6 +1666,61 @@ mlx5_flow_validate_item_vxlan_gpe(const struct 
rte_flow_item *item,
  " defined");
return 0;
 }
+/**
+ * Validate GRE Key item.
+ *
+ * @param[in] item
+ *   Item specification.
+ * @param[in] item_flags
+ *   Bit flags to mark detected items.
+ * @param[in] gre_item
+ *   Pointer to gre_item
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_flow_validate_item_gre_key(const struct rte_flow_item *item,
+   uint64_t item_flags,
+   const struct rte_flow_item *gre_item,
+   struct rte_flow_error *error)
+{
+   const rte_be32_t *mask = item->mask;
+   int ret = 0;
+   rte_be32_t gre_key_default_mask = RTE_BE32(UINT32_MAX);
+   const struct rte_flow_item_gre *gre_spec = gre_item->spec;
+   const struct rte_flow_item_gre *gre_mask = gre_item->mask;
+
+   if (item_flags & MLX5_FLOW_LAYER_GRE_KEY)
+   return rte_flow_error_set(error, ENOTSUP,
+ RTE_FLOW_ERROR_TYPE_ITEM, item,
+ "Multiple GRE key not support");
+   if (!(item_flags & MLX5_FLOW_LAYER_GRE))
+   return rte_flow_error_set(error, ENOTSUP,
+ RTE_FLOW_ERROR_TYPE_ITEM, item,
+ "No preceding GRE header");
+   if (item_flags & MLX5_FLOW_LAYER_INNER)
+   return rte_flow_error_set(error, ENOTSUP,
+ RTE_FLOW_ERROR_TYPE_ITEM, item,
+ "GRE key following a wrong item");
+   if (!gre_mask)
+   gre_mask = &rte_flow_item_gre_mask;
+   if (gre_spec && (gre_mask->c_rsvd0_ver & RTE_BE16(0x2000)) &&
+!(gre_spec->c_rsvd0_ver & RTE_BE16(0x2000)))
+   return rte_flow_error_set(error, EINVAL,
+ RTE_FLOW_ERROR_TYPE_ITEM, item,
+ "Key bit must be on");
+
+   if (!mask)
+   mask = &gre_key_default_mask;
+   ret = mlx5_flow_item_acceptable
+   (item, (const uint8_t *)mask,
+(const uint8_t *)&gre_key_default_mask,
+sizeof(rte_be32_t), error);
+   return ret;
+}
 
 /**
  * Validate GRE item.
@@ -1691,6 +1746,10 @@ mlx5_flow_validate_item_gre(const struct rte_flow_item 
*item,
const struct rte_flow_item_gre *spec __rte_unused = item->spec;
const struct rte_flow_item_gre *mask = item->mask;
int ret;
+   const struct rte_flow_item_gre nic_mask = {
+   .c_rsvd0_ver = RTE_BE16(0xB000),
+   .protocol = RTE_BE16(UINT16_MAX),
+   };
 
if (target_protocol != 0xff && target_protocol != IPPROTO_GRE)
return rte_flow_error_set(error, EINVAL,
@@ -1710,7 +1769,7 @@ mlx5_flow_validate_item_gre(const struct rte_flow_item 
*item,
mask = &rte_flow_item_gre_mask;
ret = mlx5_flow_item_acceptable
(item, (const uint8_t *)mask,
-(const uint8_t *)&rte_flow_item_gre_mask,
+(const uint8_t *)&nic_mask,
 sizeof(struct rte_flow_item_gre), error);
if (ret < 0)
return ret;
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 3d7fcf706e..6ccb8a7189 100644
--- a/drivers/net/mlx5/mlx5_flow.h

[dpdk-dev] [PATCH v8 0/2] match on GRE's key

2019-07-09 Thread Xiaoyu Min
This series patchs are based on RFC [1], which enable the matching on
GRE's key field.
And enabled MLX5 device supports on this.

[1] https://patches.dpdk.org/patch/53432/

---
v2:
  * remove struct rte_flow_item_gre_key in order to comply new convention
v3:
  * updated release note
  * fixed one bug
v4:
  * resend patchs in thread mode
v5:
  * report error when K is off but gre_key item present
  * removed ITEM_CRKSV, added c_bit, k_bit, s_bit in testpmd
  * document updated
v6:
  * one fix in pmd
v7:
  * addressed comments in TestPMD
v8:
  * rebased PMD patchs on v19.08-rc1
---
Xiaoyu Min (2):
  net/mlx5: support match GRE protocol on DR engine
  net/mlx5: match GRE's key and present bits

 doc/guides/rel_notes/release_19_08.rst |  5 ++
 drivers/net/mlx5/mlx5_flow.c   | 63 ++-
 drivers/net/mlx5/mlx5_flow.h   |  5 ++
 drivers/net/mlx5/mlx5_flow_dv.c| 84 ++
 drivers/net/mlx5/mlx5_prm.h|  6 +-
 5 files changed, 161 insertions(+), 2 deletions(-)

-- 
2.21.0



Re: [dpdk-dev] [EXT] Re: [PATCH] bus/pci: fix IOVA as VA mode selection

2019-07-09 Thread Jerin Jacob Kollanukkaran
> -Original Message-
> From: Bruce Richardson 
> Sent: Tuesday, July 9, 2019 2:10 PM
> To: Jerin Jacob Kollanukkaran 
> Cc: David Marchand ; dev ;
> Thomas Monjalon ; Ben Walker
> ; Burakov, Anatoly
> 
> Subject: Re: [dpdk-dev] [EXT] Re: [PATCH] bus/pci: fix IOVA as VA mode
> selection
> 
> On Mon, Jul 08, 2019 at 07:13:28PM +, Jerin Jacob Kollanukkaran wrote:
> > See below,
> >
> > Please send the email as text to avoid formatting issue.(No HTML)
> >
> > From: David Marchand 
> > Sent: Tuesday, July 9, 2019 12:09 AM
> > To: Jerin Jacob Kollanukkaran 
> > Cc: dev ; Thomas Monjalon ;
> Ben
> > Walker ; Burakov, Anatoly
> > 
> > Subject: [EXT] Re: [dpdk-dev] [PATCH] bus/pci: fix IOVA as VA mode
> > selection
> >
> > 
> >
> > On Mon, Jul 8, 2019 at 4:25 PM  wrote:
> > From: Jerin Jacob 
> >
> > Existing logic fails to select IOVA mode as VA if driver request to
> > enable IOVA as VA.
> >
> > IOVA as VA has more strict requirement than other modes, so enabling
> > positive logic for IOVA as VA selection.
> >
> > This patch also updates the default IOVA mode as PA for PCI devices as
> > it has to deal with DMA engines unlike the virtual devices that may
> > need only IOVA as DC.
> >
> > We have three cases:
> > - driver/hw supports IOVA as PA only
> >
> > [Jerin] It is not driver cap, it is more of system cap(IOMMU vs non
> > IOMMU). We are already addressing that case
> >
> 
> Not necessarily. It's possible to have hardware that does not use the IOMMU
> on a platform. Therefore, you have more than two options to support.

Any example such device?

> 
> /Bruce


Re: [dpdk-dev] [EXT] [PATCH v3 1/3] eal/arm64: add 128-bit atomic compare exchange

2019-07-09 Thread Phil Yang (Arm Technology China)
> -Original Message-
> From: Pavan Nikhilesh Bhagavatula 
> Sent: Friday, July 5, 2019 12:37 PM
> To: Honnappa Nagarahalli ;
> jer...@marvell.com; Phil Yang (Arm Technology China)
> ; dev@dpdk.org
> Cc: tho...@monjalon.net; hemant.agra...@nxp.com; Gavin Hu (Arm
> Technology China) ; nd ;
> gage.e...@intel.com; nd 
> Subject: RE: [EXT] [PATCH v3 1/3] eal/arm64: add 128-bit atomic compare
> exchange
> 
> 



> >> > > +#ifdef __ARM_FEATURE_ATOMICS
> >> > > +#define __ATOMIC128_CAS_OP(cas_op_name, op_string)
> >> \
> >> > > +static inline rte_int128_t
> >> > >   \
> >> > > +cas_op_name(rte_int128_t *dst, rte_int128_t old,  
> >> > >   \
> >> > > +  rte_int128_t updated)   
> >> > > \
> >> > > +{ 
> >> > >   \
> >> > > +  /* caspX instructions register pair must start from even-
> >numbered
> >> > > +   * register at operand 1.
> >> > > +   * So, specify registers for local variables here.
> >> > > +   */ 
> >> > > \
> >> > > +  register uint64_t x0 __asm("x0") = (uint64_t)old.val[0];
> \
> >> >
> >> > I understand CASP limitation on register has to be even and odd.
> >> > Is there anyway to remove explicit x0 register allocation and choose
> >> > compiler to decide the register. Some reason with optimize(03) gcc
> >> > makes correctly but not clang.
> >> >
> >> > Hardcoding to specific register makes compiler to not optimize the
> >> > stuff, especially if it is inline function.
> >>
> >> It look like the limitation fixed recently in gcc.
> >> https://patches.linaro.org/patch/147991/
> >>
> >> Not sure about old gcc and clang. ARM compiler experts may know
> >the exact
> >> status
> >>
> >We could use syntax as follows, an example is in [1]
> >static inline rte_int128_t
> >__rte_casp(rte_int128_t *dst, rte_int128_t old, rte_int128_t updated,
> >int mo)
> >{
> > __asm__ volatile("caspl %0, %H0, %1, %H1, [%2]"
> >  : "+r" (old)
> >  : "r" (updated), "r" (dst)
> >  : "memory");
> > return old;
> >}
> 
> We have used this format for mempool/octeontx2 but clang wasn't too
> happy.
> 
> dpdk/drivers/mempool/octeontx2/otx2_mempool_ops.c:151:15: error:
> value size does not match register size specified by the constraint and
> modifier [-Werror,-Wasm-operand-widths]
> [t0] "=&r" (t0), [t1] "=&r" (t1), [t2] "=&r" (t2),
> ^
> dpdk/drivers/mempool/octeontx2/otx2_mempool_ops.c:82:9: note: use
> constraint modifier "w"
> "casp %[t0], %H[t0], %[wdata], %H[wdata], [%[loc]]\n"
> 
> Had to change it to hand coded asm
> 
> http://patches.dpdk.org/patch/56110/

Hi Jerin,

The update from the compiler team is 'the LSE CASP fix has not been backported
to older GCC branches'.
So, currently, this seems to be the only approach that works for all versions
of GCC and Clang.
I think we can add another optimization patch for this once all the compilers
are ready.

Thanks,
Phil
> 
> >
> >[1] https://godbolt.org/z/EUJnuG
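
For comparison, a hedged sketch of the compiler-scheduled alternative discussed
above: letting GCC/Clang pick the register pair via the __atomic builtins on an
__int128 rather than hand-coded asm. Whether this stays inline (CASP/LDXP-STXP)
or falls back to libatomic depends on the compiler version and -march flags, so
this is only an assumption-laden illustration, not the approach the patch took:

#include <stdbool.h>

/* Sketch: 128-bit compare-and-swap with relaxed ordering; on failure
 * *expected is updated with the value currently stored at *dst. */
static inline bool
cas128_relaxed(__int128 *dst, __int128 *expected, __int128 desired)
{
	return __atomic_compare_exchange_n(dst, expected, desired,
					   /* weak */ false,
					   __ATOMIC_RELAXED, __ATOMIC_RELAXED);
}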


Re: [dpdk-dev] [PATCH] examples: modify error message for ip pipeline

2019-07-09 Thread Dumitrescu, Cristian



> -Original Message-
> From: Babu Radhakrishnan, AgalyaX
> Sent: Friday, June 14, 2019 3:06 PM
> To: dev@dpdk.org
> Cc: Pattan, Reshma ; Dumitrescu, Cristian
> ; Babu Radhakrishnan, AgalyaX
> 
> Subject: [PATCH] examples: modify error message for ip pipeline
> 
> From: Agalya Babu RadhaKrishnan 
> 
> Added help command in error message for ip pipeline commands.
> 
> Signed-off-by: Agalya Babu RadhaKrishnan
> 
> ---
>  examples/ip_pipeline/cli.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/examples/ip_pipeline/cli.c b/examples/ip_pipeline/cli.c
> index 309b2936e..8a651bbbc 100644
> --- a/examples/ip_pipeline/cli.c
> +++ b/examples/ip_pipeline/cli.c
> @@ -30,12 +30,12 @@
> 
>  #define MSG_OUT_OF_MEMORY   "Not enough memory.\n"
>  #define MSG_CMD_UNKNOWN "Unknown command \"%s\".\n"
> -#define MSG_CMD_UNIMPLEM"Command \"%s\" not implemented.\n"
> -#define MSG_ARG_NOT_ENOUGH  "Not enough arguments for command
> \"%s\".\n"
> -#define MSG_ARG_TOO_MANY"Too many arguments for command
> \"%s\".\n"
> -#define MSG_ARG_MISMATCH"Wrong number of arguments for
> command \"%s\".\n"
> -#define MSG_ARG_NOT_FOUND   "Argument \"%s\" not found.\n"
> -#define MSG_ARG_INVALID "Invalid value for argument \"%s\".\n"
> +#define MSG_CMD_UNIMPLEM"Command \"%s\" not implemented. Try
> help \n"
> +#define MSG_ARG_NOT_ENOUGH  "Not enough arguments for command
> \"%s\". Try help \n"
> +#define MSG_ARG_TOO_MANY"Too many arguments for command
> \"%s\". Try help \n"
> +#define MSG_ARG_MISMATCH"Wrong number of arguments for
> command \"%s\". Try help \n"
> +#define MSG_ARG_NOT_FOUND   "Argument \"%s\" not found. Try help
> \n"
> +#define MSG_ARG_INVALID "Invalid value for argument \"%s\". Try help
> \n"
>  #define MSG_FILE_ERR"Error in file \"%s\" at line %u.\n"
>  #define MSG_FILE_NOT_ENOUGH "Not enough rules in file \"%s\".\n"
>  #define MSG_CMD_FAIL"Command \"%s\" failed.\n"
> --
> 2.14.1

NAK

I don't see the value in adding "try help command" to all the error messages;
the user is already aware that there is a help command available, and some
people might actually find this repetition annoying.



Re: [dpdk-dev] [EXT] Re: [PATCH] bus/pci: fix IOVA as VA mode selection

2019-07-09 Thread Bruce Richardson
On Tue, Jul 09, 2019 at 09:05:07AM +, Jerin Jacob Kollanukkaran wrote:
> > -Original Message-
> > From: Bruce Richardson 
> > Sent: Tuesday, July 9, 2019 2:10 PM
> > To: Jerin Jacob Kollanukkaran 
> > Cc: David Marchand ; dev ;
> > Thomas Monjalon ; Ben Walker
> > ; Burakov, Anatoly
> > 
> > Subject: Re: [dpdk-dev] [EXT] Re: [PATCH] bus/pci: fix IOVA as VA mode
> > selection
> > 
> > On Mon, Jul 08, 2019 at 07:13:28PM +, Jerin Jacob Kollanukkaran wrote:
> > > See below,
> > >
> > > Please send the email as text to avoid formatting issue.(No HTML)
> > >
> > > From: David Marchand 
> > > Sent: Tuesday, July 9, 2019 12:09 AM
> > > To: Jerin Jacob Kollanukkaran 
> > > Cc: dev ; Thomas Monjalon ;
> > Ben
> > > Walker ; Burakov, Anatoly
> > > 
> > > Subject: [EXT] Re: [dpdk-dev] [PATCH] bus/pci: fix IOVA as VA mode
> > > selection
> > >
> > > 
> > >
> > > On Mon, Jul 8, 2019 at 4:25 PM  wrote:
> > > From: Jerin Jacob 
> > >
> > > Existing logic fails to select IOVA mode as VA if driver request to
> > > enable IOVA as VA.
> > >
> > > IOVA as VA has more strict requirement than other modes, so enabling
> > > positive logic for IOVA as VA selection.
> > >
> > > This patch also updates the default IOVA mode as PA for PCI devices as
> > > it has to deal with DMA engines unlike the virtual devices that may
> > > need only IOVA as DC.
> > >
> > > We have three cases:
> > > - driver/hw supports IOVA as PA only
> > >
> > > [Jerin] It is not driver cap, it is more of system cap(IOMMU vs non
> > > IOMMU). We are already addressing that case
> > >
> > 
> > Not necessarily. It's possible to have hardware that does not use the IOMMU
> > on a platform. Therefore, you have more than two options to support.
> 
> Any example such device?
> 

On further investigation, it appears I was wrong/misinformed. All devices
I'm aware of work fine with an IOMMU if one is on the platform. Please
ignore my previous assertion, and thanks for getting me to follow up on this!

/Bruce


Re: [dpdk-dev] [EXT] Re: [PATCH] bus/pci: fix IOVA as VA mode selection

2019-07-09 Thread Burakov, Anatoly

On 08-Jul-19 8:13 PM, Jerin Jacob Kollanukkaran wrote:

See below,

Please send the email as text to avoid formatting issue.(No HTML)

From: David Marchand 
Sent: Tuesday, July 9, 2019 12:09 AM
To: Jerin Jacob Kollanukkaran 
Cc: dev ; Thomas Monjalon ; Ben Walker 
; Burakov, Anatoly 
Subject: [EXT] Re: [dpdk-dev] [PATCH] bus/pci: fix IOVA as VA mode selection



On Mon, Jul 8, 2019 at 4:25 PM  wrote:
From: Jerin Jacob 

The existing logic fails to select IOVA mode VA
if the driver requests to enable IOVA as VA.

IOVA as VA has stricter requirements than the other modes,
so enable positive logic for IOVA as VA selection.

This patch also updates the default IOVA mode to PA
for PCI devices, as they have to deal with DMA engines, unlike
the virtual devices that may only need IOVA as DC.

We have three cases:
- driver/hw supports IOVA as PA only

[Jerin] It is not driver cap, it is more of system cap(IOMMU vs non  IOMMU). We 
are already addressing that case


I don't get how this works. How does "system capability" affect what the 
device itself supports? Are we to assume that *all* hardware supports
IOVA as VA by default? "System capability" is more of a bus issue than
an individual device issue, is it not?


--
Thanks,
Anatoly


Re: [dpdk-dev] [PATCH v8 2/2] net/mlx5: match GRE's key and present bits

2019-07-09 Thread Thomas Monjalon
09/07/2019 11:02, Xiaoyu Min:
> --- a/doc/guides/rel_notes/release_19_08.rst
> +++ b/doc/guides/rel_notes/release_19_08.rst
> @@ -191,6 +191,11 @@ New Features
>Added telemetry mode to l3fwd-power application to report
>application level busyness, empty and full polls of rte_eth_rx_burst().
>  
> +* **Updated Mellanox mlx5 driver.**
> +
> +   Updated Mellanox mlx5 driver with new features and improvements, 
> including:
> +
> +   * Added support for matching on GRE's key and C,K,S present bits.

Please move this entry with other mlx5 changes already in the release notes.




Re: [dpdk-dev] /run/dpdk cleanup

2019-07-09 Thread Burakov, Anatoly

On 08-Jul-19 4:00 PM, Shubhachint, Chaitanya wrote:

Hello,

I run my DPDK applications with the "file-prefix" option so I can run them
concurrently. I see that this creates a runtime directory under /var/run/dpdk.
This directory still persists after the application has terminated. The code
calls rte_eal_cleanup before exiting, but that doesn't seem to remove the
directory and its contents. In my setup I have to repeatedly bring my
applications up, run them for a short period and terminate them; each
application run picks a new file-prefix, thus creating a new directory under
/var/run/dpdk. After enough runs the tmpfs partition fills up and results in
rte_eal_init failure.
Is there a cleanup routine or sequence I need to implement to unlink this
directory? Your help is appreciated.

Chaitanya.



This is a natural consequence of how our multiprocessing works. Running 
a secondary process *after* primary has already exited is a supported 
use-case (meaning, primary won't clean up *everything* after itself - 
only things that are not shared), and secondary processes may be running 
as well (meaning, we cannot remove shared data because it may be used by 
other processes).


I don't think it's possible to do this from within DPDK. Of course, you 
could always use --in-memory mode if you don't use secondary processes.


--
Thanks,
Anatoly
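
A hedged sketch of the --in-memory suggestion: pass the flag at rte_eal_init()
time so EAL keeps everything in anonymous memory and no /var/run/dpdk/<prefix>
directory is left behind (at the cost of losing secondary-process support). The
argument strings are illustrative:

#include <rte_eal.h>

int
main(void)
{
	char *eal_args[] = { "app", "-l", "0-1", "--in-memory", NULL };

	/* 4 = number of arguments before the NULL terminator */
	if (rte_eal_init(4, eal_args) < 0)
		return -1;

	/* ... application work ... */

	rte_eal_cleanup();
	return 0;
}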


Re: [dpdk-dev] [PATCH v4 3/3] lib/lpm: use atomic store to avoid partial update

2019-07-09 Thread Ruifeng Wang (Arm Technology China)


> -Original Message-
> From: Honnappa Nagarahalli 
> Sent: Tuesday, July 9, 2019 12:43
> To: Ruifeng Wang (Arm Technology China) ;
> vladimir.medved...@intel.com; bruce.richard...@intel.com
> Cc: dev@dpdk.org; Gavin Hu (Arm Technology China) ;
> Honnappa Nagarahalli ; nd
> ; nd 
> Subject: RE: [PATCH v4 3/3] lib/lpm: use atomic store to avoid partial update
> 
> > >
> > > >
> > > > Compiler could generate non-atomic stores for whole table entry
> updating.
> > > > This may cause incorrect nexthop to be returned, if the byte with
> > > > valid flag is updated prior to the byte with next hot is updated.
> > >^^^
> > > Should be nexthop
> > >
> > > >
> > > > Changed to use atomic store to update whole table entry.
> > > >
> > > > Suggested-by: Medvedkin Vladimir 
> > > > Signed-off-by: Ruifeng Wang 
> > > > Reviewed-by: Gavin Hu 
> > > > ---
> > > > v4: initial version
> > > >
> > > >  lib/librte_lpm/rte_lpm.c | 34 --
> > > >  1 file changed, 24 insertions(+), 10 deletions(-)
> > > >
> > > > diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
> > > > index
> > > > baa6e7460..5d1dbd7e6 100644
> > > > --- a/lib/librte_lpm/rte_lpm.c
> > > > +++ b/lib/librte_lpm/rte_lpm.c
> > > > @@ -767,7 +767,9 @@ add_depth_small_v20(struct rte_lpm_v20 *lpm,
> > > > uint32_t ip, uint8_t depth,
> > > >  * Setting tbl8 entry in one go 
> > > > to avoid
> > > >  * race conditions
> > > >  */
> > > > -   lpm->tbl8[j] = new_tbl8_entry;
> > > > +   __atomic_store(&lpm->tbl8[j],
> > > > +   &new_tbl8_entry,
> > > > +   __ATOMIC_RELAXED);
> > > >
> > > > continue;
> > > > }
> > > > @@ -837,7 +839,9 @@ add_depth_small_v1604(struct rte_lpm *lpm,
> > > > uint32_t ip, uint8_t depth,
> > > >  * Setting tbl8 entry in one go 
> > > > to avoid
> > > >  * race conditions
> > > >  */
> > > > -   lpm->tbl8[j] = new_tbl8_entry;
> > > > +   __atomic_store(&lpm->tbl8[j],
> > > > +   &new_tbl8_entry,
> > > > +   __ATOMIC_RELAXED);
> > > >
> > > > continue;
> > > > }
> > > > @@ -965,7 +969,8 @@ add_depth_big_v20(struct rte_lpm_v20 *lpm,
> > > > uint32_t ip_masked, uint8_t depth,
> > > >  * Setting tbl8 entry in one go to 
> > > > avoid race
> > > >  * condition
> > > >  */
> > > > -   lpm->tbl8[i] = new_tbl8_entry;
> > > > +   __atomic_store(&lpm->tbl8[i],
> > > > &new_tbl8_entry,
> > > > +   __ATOMIC_RELAXED);
> > > >
> > > > continue;
> > > > }
> > > > @@ -1100,7 +1105,8 @@ add_depth_big_v1604(struct rte_lpm *lpm,
> > > > uint32_t ip_masked, uint8_t depth,
> > > >  * Setting tbl8 entry in one go to 
> > > > avoid race
> > > >  * condition
> > > >  */
> > > > -   lpm->tbl8[i] = new_tbl8_entry;
> > > > +   __atomic_store(&lpm->tbl8[i],
> > > > &new_tbl8_entry,
> > > > +   __ATOMIC_RELAXED);
> > > >
> > > > continue;
> > > > }
> > > > @@ -1393,7 +1399,9 @@ delete_depth_small_v20(struct rte_lpm_v20
> > > *lpm,
> > > > uint32_t ip_masked,
> > > >
> > > > RTE_LPM_TBL8_GROUP_NUM_ENTRIES); j++) {
> > > >
> > > > if (lpm->tbl8[j].depth <= depth)
> > > > -   lpm->tbl8[j] =
> > > > new_tbl8_entry;
> > > > +   __atomic_store(&lpm-
> > > >tbl8[j],
> > > > +   &new_tbl8_entry,
> > > > +   
> > > > __ATOMIC_RELAXED);
> > > > }
> > > > }
> > > > }
> > > > @@ -1490,7 +1498,9 @@ delete_depth_small_v1604(struct rte_lpm
> > > > *lpm, uint32_t ip_masked,
> > > >
> > > > RTE_LPM_TBL8_GROUP_NUM_ENTRIES); j++) {
> > > >
> > > >   

Re: [dpdk-dev] [PATCH v2 1/3] cryptodev: rework api of rsa algorithm

2019-07-09 Thread Kusztal, ArkadiuszX
To clarify bit more
With [AK2]

> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Kusztal, ArkadiuszX
> Sent: Monday, July 8, 2019 7:44 PM
> To: Shally Verma ; dev@dpdk.org
> Cc: akhil.go...@nxp.com; Trahe, Fiona ;
> shally.ve...@caviumnetworks.com
> Subject: Re: [dpdk-dev] [PATCH v2 1/3] cryptodev: rework api of rsa
> algorithm
> 
> Hi Shally,
> 
> With [AK]
> 
> > -Original Message-
> > From: Shally Verma [mailto:shal...@marvell.com]
> > Sent: Saturday, July 6, 2019 3:14 PM
> > To: Kusztal, ArkadiuszX ; dev@dpdk.org
> > Cc: akhil.go...@nxp.com; Trahe, Fiona ;
> > shally.ve...@caviumnetworks.com
> > Subject: RE: [PATCH v2 1/3] cryptodev: rework api of rsa algorithm
> >
> >
> >
> > > -Original Message-
> > > From: Kusztal, ArkadiuszX 
> > > Sent: Thursday, July 4, 2019 6:10 PM
> > > To: dev@dpdk.org
> > > Cc: akhil.go...@nxp.com; Trahe, Fiona ;
> > > shally.ve...@caviumnetworks.com; Shally Verma
> 
> > > Subject: [EXT] RE: [PATCH v2 1/3] cryptodev: rework api of rsa
> > > algorithm
> > >
> > > External Email
> > >
> > ..
> >
> > > > diff --git a/lib/librte_cryptodev/rte_crypto_asym.h
> > > > b/lib/librte_cryptodev/rte_crypto_asym.h
> > > > index 8672f21..486399c 100644
> > > > --- a/lib/librte_cryptodev/rte_crypto_asym.h
> > > > +++ b/lib/librte_cryptodev/rte_crypto_asym.h
> > > > @@ -111,23 +111,21 @@ enum rte_crypto_asym_op_type {
> > > >   */
> > > >  enum rte_crypto_rsa_padding_type {
> > > > RTE_CRYPTO_RSA_PADDING_NONE = 0,
> > > > -   /**< RSA no padding scheme */
> > > > -   RTE_CRYPTO_RSA_PKCS1_V1_5_BT0,
> > > > -   /**< RSA PKCS#1 V1.5 Block Type 0 padding scheme
> > > > -* as described in rfc2313
> > > > +   /**< RSA no padding scheme.
> > > > +* In this case user is responsible for provision and 
> > > > verification
> > > > +* of padding.
> > > >  */
> > > > -   RTE_CRYPTO_RSA_PKCS1_V1_5_BT1,
> > > > -   /**< RSA PKCS#1 V1.5 Block Type 01 padding scheme
> > > > -* as described in rfc2313
> > > > -*/
> > > > -   RTE_CRYPTO_RSA_PKCS1_V1_5_BT2,
> > > > -   /**< RSA PKCS#1 V1.5 Block Type 02 padding scheme
> > > > -* as described in rfc2313
> > > > +   RTE_CRYPTO_RSA_PADDING_PKCS1,
> > [Shally] My preference would still be to rename as PKCS1_V1.5 to align
> > more to standard
> [AK] - Agree.
> 
> >
> > > > +   /**< RSA PKCS#1 PKCS1-v1_5 padding scheme. For signatures block
> > > > type 01,
> > > > +* for encryption block type 02 are used.
> > > >  */
> > > > RTE_CRYPTO_RSA_PADDING_OAEP,
> > > > -   /**< RSA PKCS#1 OAEP padding scheme */
> > > > +   /**< RSA PKCS#1 OAEP padding scheme, can be used only for
> > > > encryption/
> > > > +* decryption.
> > > > +*/
> > > > RTE_CRYPTO_RSA_PADDING_PSS,
> > > > -   /**< RSA PKCS#1 PSS padding scheme */
> > > > +   /**< RSA PKCS#1 PSS padding scheme, can be used only for
> > > > signatures.
> > > > +*/
> > > > RTE_CRYPTO_RSA_PADDING_TYPE_LIST_END
> > > >  };
> > > >
> > ...
> >
> > > >  struct rte_crypto_rsa_xform {
> > > > rte_crypto_param n;
> > > > -   /**< n - Prime modulus
> > > > -* Prime modulus data of RSA operation in Octet-string network
> > > > +   /**< n - Modulus
> > > > +* Modulus data of RSA operation in Octet-string network
> > > >  * byte order format.
> > > >  */
> > > >
> > > > @@ -397,9 +395,36 @@ struct rte_crypto_rsa_op_param {
> > > > /**<
> > > >  * Pointer to data
> > > >  * - to be encrypted for RSA public encrypt.
> > > > -* - to be decrypted for RSA private decrypt.
> > > >  * - to be signed for RSA sign generation.
> > > >  * - to be authenticated for RSA sign verification.
> > > > +*
> > > > +* Octet-string network byte order format.
> > > > +*
> > > > +* This field is an input to RTE_CRYPTO_ASYM_OP_ENCRYPT
> > > > +* operation, and output to RTE_CRYPTO_ASYM_OP_DECRYPT
> > > > operation.
> > > > +*
> > > > +* When RTE_CRYPTO_ASYM_OP_DECRYPT op_type used length in
> > > > bytes
> > > > +* of this field needs to be greater or equal to the length of
> > > > +* corresponding RSA key in bytes.
> > [Shally] better rephrased " When an op_type ASYM_OP_DECRYPT used,
> > length of output buffer should be greater than  or equal to RSA key
> > modulus length
> [AK] - RSA key size is a RSA modulus size, but can be changed.
> >
> > > > +*
> > > > +* When padding field is set to RTE_CRYPTO_RSA_PADDING_NONE
> > > > +* returned data size will be equal to the size of RSA key
> > > > +* in bytes. All leading zeroes will be preserved.
> > > > +*/
> > [Shally] Is it in context of OP_TYPE_DECRYPT? Even if it is
> > PADDING_NONE, whether it is encrypted/decrypted, o/p length can vary
> > between 0 .. RSA modulus lengt

Re: [dpdk-dev] [PATCH v2 1/1] fbarray: get fbarrays from containerized secondary

2019-07-09 Thread Yasufumi Ogawa

Hi Anatoly,

On 2019/07/05 17:53, Burakov, Anatoly wrote:

On 16-Apr-19 4:43 AM, ogawa.yasuf...@lab.ntt.co.jp wrote:

From: Yasufumi Ogawa 

In secondary_msl_create_walk(), a file for the fbarrays is created with the
process PID to reserve a unique name among secondary processes. However, this
does not work if the secondary is run as an app container, because each
containerized secondary has PID 1. To reserve a unique name, use the hostname
instead of the PID if the value is 1.

Cc: sta...@dpdk.org

Signed-off-by: Yasufumi Ogawa 
---


I'm not too well versed in containers - is this hostname 1) always set, 
and 2) always unique?
For docker, 1) the hostname is always set. 2) The hostname is derived from the
short form of the container ID, so it might not be unique, although the
probability of a collision is very low.


I found that we can get the whole container ID from `/proc/self/cgroup`, as
discussed in [1]. I think using the hostname is a reasonable way when not
running many secondary processes. However, it might be better to use the
64-digit full container ID instead of the 12-digit short ID to ensure
uniqueness more strongly. What do you think?


[1] 
https://forums.docker.com/t/get-a-containers-full-id-from-inside-of-itself/37237





+    if (getpid() == 1) {
+    FILE *hn_fp;
+    hn_fp = fopen("/etc/hostname", "r");
+    if (hn_fp == NULL) {
+    RTE_LOG(ERR, EAL,
+    "Cannot open '/etc/hostname' for secondary\n");
+    return -1;
+    }
+
+    /* with docker, /etc/hostname just has one entry of hostname */
+    if (fscanf(hn_fp, "%s", proc_id) == EOF)
+    return -1;


Wouldn't an error in fscanf() leak the file handle? I think you need to 
fclose() before checking the result.

I would like to fix it.

Regards,
Yasufumi



+    fclose(hn_fp);
+    } else
+    sprintf(proc_id, "%d", (int)getpid());
+
+    snprintf(name, RTE_FBARRAY_NAME_LEN, "%s_%s",
+    primary_msl->memseg_arr.name, proc_id);
  ret = rte_fbarray_init(&local_msl->memseg_arr, name,
  primary_msl->memseg_arr.len,
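
A hedged sketch of the fix being discussed: close the hostname file before
acting on the fscanf() result so the handle is not leaked on error. The buffer
size and the %63s width are illustrative assumptions, not the final patch:

#include <stdio.h>
#include <unistd.h>

static int
get_proc_id(char proc_id[64])
{
	if (getpid() == 1) {
		FILE *hn_fp = fopen("/etc/hostname", "r");
		int ret;

		if (hn_fp == NULL)
			return -1;
		/* with docker, /etc/hostname holds a single hostname entry */
		ret = fscanf(hn_fp, "%63s", proc_id);
		fclose(hn_fp);	/* always close, even when fscanf() fails */
		if (ret != 1)
			return -1;
	} else {
		snprintf(proc_id, 64, "%d", (int)getpid());
	}
	return 0;
}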






Re: [dpdk-dev] [PATCH] vfio: retry creating sPAPR DMA window

2019-07-09 Thread Burakov, Anatoly

On 08-Jul-19 6:45 PM, Thomas Monjalon wrote:

08/07/2019 18:47, David Christensen:

Please, any review or ack for this patch?

07/06/2019 04:28, Takeshi Yoshimura:

sPAPR allows only page_shift from VFIO_IOMMU_SPAPR_TCE_GET_INFO ioctl.
However, Linux 4.17 or before returns incorrect page_shift for Power9.
I added the code for retrying creation of sPAPR DMA window.

Signed-off-by: Takeshi Yoshimura 


I've shared my comments with Takeshi and I don't think the patch is
ready in its current state.  It appears to modify x86 functionality and
I'm not comfortable with signing off on that since it's in an area I'm
not familiar with.


It is a function specific to sPAPR.
So there is no risk for x86 or Arm.
As you didn't share your comments publicly,
this patch has been merged in 19.08-rc1.



I have a suspicion that David has confused this patch with another one, 
which did indeed touch a lot of code (but still didn't touch x86 AFAIR).


--
Thanks,
Anatoly


Re: [dpdk-dev] [PATCH v2 1/1] fbarray: get fbarrays from containerized secondary

2019-07-09 Thread Burakov, Anatoly

On 09-Jul-19 11:22 AM, Yasufumi Ogawa wrote:

Hi Anatoly,

On 2019/07/05 17:53, Burakov, Anatoly wrote:

On 16-Apr-19 4:43 AM, ogawa.yasuf...@lab.ntt.co.jp wrote:

From: Yasufumi Ogawa 

In secondary_msl_create_walk(), it creates a file for fbarrays with its
PID for reserving unique name among secondary processes. However, it
does not work if secondary is run as app container because each of
containerized secondary has PID 1. To reserve unique name, use hostname
instead of PID if the value is 1.

Cc: sta...@dpdk.org

Signed-off-by: Yasufumi Ogawa 
---


I'm not too well versed in containers - is this hostname 1) always 
set, and 2) always unique?
For docker, 1) hostname is always set. 2) The hostname is decided as 
short form of container ID, so it might not be unique even though very 
low possibility.


I found that we can get whole container ID in `/proc/self/cgroup` as 
discussed [1]. I think using hostname is reasonable way without running 
many secondary processes. However, it might be better to use 64 digits 
full container ID instead of 12 digits short ID if ensure uniqueness 
strongly. What do yo think?


[1] 
https://forums.docker.com/t/get-a-containers-full-id-from-inside-of-itself/37237 



I think it's better to err on the side of caution and guarantee better 
uniqueness. This code will get into an LTS and will be used for years to 
come :)


--
Thanks,
Anatoly


Re: [dpdk-dev] [PATCH v2 1/1] fbarray: get fbarrays from containerized secondary

2019-07-09 Thread Burakov, Anatoly

On 09-Jul-19 11:24 AM, Burakov, Anatoly wrote:

On 09-Jul-19 11:22 AM, Yasufumi Ogawa wrote:

Hi Anatoly,

On 2019/07/05 17:53, Burakov, Anatoly wrote:

On 16-Apr-19 4:43 AM, ogawa.yasuf...@lab.ntt.co.jp wrote:

From: Yasufumi Ogawa 

In secondary_msl_create_walk(), it creates a file for fbarrays with its
PID for reserving unique name among secondary processes. However, it
does not work if secondary is run as app container because each of
containerized secondary has PID 1. To reserve unique name, use hostname
instead of PID if the value is 1.

Cc: sta...@dpdk.org

Signed-off-by: Yasufumi Ogawa 
---


I'm not too well versed in containers - is this hostname 1) always 
set, and 2) always unique?
For docker, 1) hostname is always set. 2) The hostname is decided as 
short form of container ID, so it might not be unique even though very 
low possibility.


I found that we can get whole container ID in `/proc/self/cgroup` as 
discussed [1]. I think using hostname is reasonable way without 
running many secondary processes. However, it might be better to use 
64 digits full container ID instead of 12 digits short ID if ensure 
uniqueness strongly. What do yo think?


[1] 
https://forums.docker.com/t/get-a-containers-full-id-from-inside-of-itself/37237 



I think it's better to err on the side of caution and guarantee better 
uniqueness. This code will get into an LTS and will be used for years to 
come :)




...however, i think a full 64-digit ID won't even fit into the fbarray 
filename, as i believe it's limited to something like 64 chars. Perhaps 
hostname would be enough after all... or we can increase fbarray name 
length - that would require ABI breakage but the ABI is already broken 
in this release, so it's OK i think.


--
Thanks,
Anatoly


Re: [dpdk-dev] [PATCH] net/softnic: fix pipeline time calculation

2019-07-09 Thread Dumitrescu, Cristian



> -Original Message-
> From: Wang, Xiao W
> Sent: Sunday, June 2, 2019 11:46 AM
> To: Singh, Jasvinder 
> Cc: dev@dpdk.org; Dumitrescu, Cristian ;
> sta...@dpdk.org
> Subject: RE: [PATCH] net/softnic: fix pipeline time calculation
> 
> 
> 
> > -Original Message-
> > From: Singh, Jasvinder
> > Sent: Friday, May 31, 2019 10:46 PM
> > To: Wang, Xiao W 
> > Cc: dev@dpdk.org; Dumitrescu, Cristian ;
> > sta...@dpdk.org
> > Subject: RE: [PATCH] net/softnic: fix pipeline time calculation
> >
> >
> >
> > > -Original Message-
> > > From: Wang, Xiao W
> > > Sent: Wednesday, May 15, 2019 2:59 PM
> > > To: Singh, Jasvinder 
> > > Cc: dev@dpdk.org; Dumitrescu, Cristian
> ;
> > > Wang, Xiao W ; sta...@dpdk.org
> > > Subject: [PATCH] net/softnic: fix pipeline time calculation
> > >
> > > When a new pipeline is added to a thread, the "time_next_min" value
> may
> > > need update, otherwise this pipeline won't get served timely.
> > >
> > > Fixes: 70709c78fda6 ("net/softnic: add command to enable/disable
> pipeline")
> > > Cc: sta...@dpdk.org
> > >
> > > Signed-off-by: Xiao Wang 
> > > ---
> > >  drivers/net/softnic/rte_eth_softnic_thread.c | 6 ++
> > >  1 file changed, 6 insertions(+)
> > >
> > > diff --git a/drivers/net/softnic/rte_eth_softnic_thread.c
> > > b/drivers/net/softnic/rte_eth_softnic_thread.c
> > > index 855408e98..2b482117d 100644
> > > --- a/drivers/net/softnic/rte_eth_softnic_thread.c
> > > +++ b/drivers/net/softnic/rte_eth_softnic_thread.c
> > > @@ -337,6 +337,9 @@ softnic_thread_pipeline_enable(struct
> > pmd_internals
> > > *softnic,
> > >   tdp->timer_period = (rte_get_tsc_hz() * p-
> >timer_period_ms)
> > /
> > > 1000;
> > >   tdp->time_next = rte_get_tsc_cycles() + tdp->timer_period;
> > >
> > > + if (tdp->time_next < td->time_next_min)
> > > + td->time_next_min = tdp->time_next;
> > > +
> > >   td->n_pipelines++;
> > >
> > >   /* Pipeline */
> > > @@ -522,6 +525,9 @@ thread_msg_handle_pipeline_enable(struct
> > > softnic_thread_data *t,
> > >   (rte_get_tsc_hz() * req->pipeline_enable.timer_period_ms)
> /
> > > 1000;
> > >   p->time_next = rte_get_tsc_cycles() + p->timer_period;
> > >
> > > + if (p->time_next < t->time_next_min)
> > > + t->time_next_min = p->time_next;
> > > +
> > >   t->n_pipelines++;
> > >
> > >   /* Response */
> > > --
> > > 2.15.1
> >
> >
> > Hi Wang,
> >
> > Timer values for pipelines and thread level message handlers are already
> > adjusted in runtime function rte_pmd_softnic_run_internal(). In runtime
> > function, the values of t->time_next_min is updated as well. IMO, above
> > changes not needed. Could you help with the case where timer
> adjustments in
> > runtime not working?
> 
> Hi Jasvinder,
> 
> The value of t->time_next_min is updated only when the pipeline and thread
> messages get handled, but not when a pipeline is added to that thread. E.g.
> when a thread's t->time_next_min is ~100ms away and a new pipeline is added to
> that thread with the timer_period_ms parameter set to 10ms, this pipeline's
> control message will not be served until ~100ms later.
> 
> BRs,
> Xiao
> 
> >
> > Thanks,
> > Jasvinder
> >

NAK

There are no bugs/issues fixed by this patch, but there are side effects 
introduced that maybe you did not anticipate.

- Yes, the first message handler for a newly added pipeline might be slightly 
delayed, but this is harmless.

- For a given thread, we periodically iterate through all pipelines the current 
thread is running and check if there are any pending messages for each pipeline 
(function rte_pmd_softnic_run_internal). We also update the deadline for the 
next message handling session for the thread (thread->time_next_min), which 
should only be changed by the thread (existing code); if this is changed by the 
pipeline message handler, then there is the risk that some pipelines will run 
their message handler again very soon after, as the deadline had been brought 
earlier. Makes sense?



Re: [dpdk-dev] [PATCH 1/3] net/bnx2x: fix read VF id

2019-07-09 Thread Jerin Jacob Kollanukkaran
> -Original Message-
> From: dev  On Behalf Of Rasesh Mody
> Sent: Thursday, July 4, 2019 5:13 AM
> To: dev@dpdk.org
> Cc: Rasesh Mody ; ferruh.yi...@intel.com; GR-
> Everest-DPDK-Dev ; sta...@dpdk.org
> Subject: [dpdk-dev] [PATCH 1/3] net/bnx2x: fix read VF id
> 
> The logic to read the vf_id used by the ACQUIRE/TEARDOWN_Q/RELEASE TLVs
> multiplexed the return value to convey both the vf_id value and the status of
> the read vf_id API. This leads to a segfault at dev_start() as resources are
> not properly cleaned up and re-allocated.
> 
> Fix the read vf_id API to differentiate between the vf_id value and the return
> status, and adjust the status checking accordingly.
> Added bnx2x_vf_teardown_queue() API and moved relevant code from
> bnx2x_vf_unload() to the new API.
> 
> Fixes: 540a211084a7 ("bnx2x: driver core")
> Cc: sta...@dpdk.org
> 
> Signed-off-by: Rasesh Mody 

Series applied to dpdk-next-net-mrvl/master. Thanks.


Re: [dpdk-dev] [PATCH v8 2/2] net/mlx5: match GRE's key and present bits

2019-07-09 Thread Jack Min
On Tue, 19-07-09, 11:54, Thomas Monjalon wrote:
> 09/07/2019 11:02, Xiaoyu Min:
> > --- a/doc/guides/rel_notes/release_19_08.rst
> > +++ b/doc/guides/rel_notes/release_19_08.rst
> > @@ -191,6 +191,11 @@ New Features
> >Added telemetry mode to l3fwd-power application to report
> >application level busyness, empty and full polls of rte_eth_rx_burst().
> >  
> > +* **Updated Mellanox mlx5 driver.**
> > +
> > +   Updated Mellanox mlx5 driver with new features and improvements, 
> > including:
> > +
> > +   * Added support for matching on GRE's key and C,K,S present bits.
> 
> Please move this entry with other mlx5 changes already in the release notes.
Oooops! Just did the rebase automatically...
I'll update it.

> 
> 


[dpdk-dev] [PATCH v9 1/2] net/mlx5: support match GRE protocol on DR engine

2019-07-09 Thread Xiaoyu Min
The DR engine supports matching on the GRE protocol field without MPLS support,
so bypass the MPLS check when DR is enabled.

Signed-off-by: Xiaoyu Min 
Acked-by: Viacheslav Ovsiienko 
---
 drivers/net/mlx5/mlx5_flow.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 534cd9338e..626d0f9352 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -1714,6 +1714,7 @@ mlx5_flow_validate_item_gre(const struct rte_flow_item 
*item,
 sizeof(struct rte_flow_item_gre), error);
if (ret < 0)
return ret;
+#ifndef HAVE_MLX5DV_DR
 #ifndef HAVE_IBV_DEVICE_MPLS_SUPPORT
if (spec && (spec->protocol & mask->protocol))
return rte_flow_error_set(error, ENOTSUP,
@@ -1721,6 +1722,7 @@ mlx5_flow_validate_item_gre(const struct rte_flow_item 
*item,
  "without MPLS support the"
  " specification cannot be used for"
  " filtering");
+#endif
 #endif
return 0;
 }
-- 
2.21.0



[dpdk-dev] [PATCH v9 0/2] match on GRE's key

2019-07-09 Thread Xiaoyu Min
This patch series is based on RFC [1], which enables matching on
GRE's key field,
and adds MLX5 device support for it.

[1] https://patches.dpdk.org/patch/53432/

---
v2:
  * remove struct rte_flow_item_gre_key in order to comply new convention
v3:
  * updated release note
  * fixed one bug
v4:
  * resend patches in thread mode
v5:
  * report error when K is off but gre_key item present
  * removed ITEM_CRKSV, added c_bit, k_bit, s_bit in testpmd
  * document updated
v6:
  * one fix in pmd
v7:
  * addressed comments in TestPMD
v8:
  * rebased PMD patches on v19.08-rc1
v9:
  * fix release note
---
Xiaoyu Min (2):
  net/mlx5: support match GRE protocol on DR engine
  net/mlx5: match GRE's key and present bits

 doc/guides/rel_notes/release_19_08.rst |  1 +
 drivers/net/mlx5/mlx5_flow.c   | 63 ++-
 drivers/net/mlx5/mlx5_flow.h   |  5 ++
 drivers/net/mlx5/mlx5_flow_dv.c| 84 ++
 drivers/net/mlx5/mlx5_prm.h|  6 +-
 5 files changed, 157 insertions(+), 2 deletions(-)

-- 
2.21.0



[dpdk-dev] [PATCH v9 2/2] net/mlx5: match GRE's key and present bits

2019-07-09 Thread Xiaoyu Min
Support matching on the GRE present bits (C, K, S)
as well as the optional key field.

If the rte_flow_item_gre_key item is specified in the pattern,
the K present bit match is set automatically.

Signed-off-by: Xiaoyu Min 
Acked-by: Viacheslav Ovsiienko 
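
For illustration, a hedged sketch of a pattern this enables (assuming rte_flow.h from
this release; the key value and payload protocol are arbitrary):

#include <rte_flow.h>

/* Sketch only: match GRE packets carrying key 0x12345678.
 * Specifying the GRE_KEY item implies matching on the K present bit. */
static const struct rte_flow_item_gre gre_spec = {
	.protocol = RTE_BE16(0x0800), /* illustrative: IPv4 over GRE */
};
static const rte_be32_t gre_key_spec = RTE_BE32(0x12345678);
static const rte_be32_t gre_key_mask = RTE_BE32(UINT32_MAX);

static const struct rte_flow_item pattern[] = {
	{ .type = RTE_FLOW_ITEM_TYPE_ETH },
	{ .type = RTE_FLOW_ITEM_TYPE_IPV4 },
	{ .type = RTE_FLOW_ITEM_TYPE_GRE, .spec = &gre_spec },
	{ .type = RTE_FLOW_ITEM_TYPE_GRE_KEY,
	  .spec = &gre_key_spec, .mask = &gre_key_mask },
	{ .type = RTE_FLOW_ITEM_TYPE_END },
};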
---
 doc/guides/rel_notes/release_19_08.rst |  1 +
 drivers/net/mlx5/mlx5_flow.c   | 61 ++-
 drivers/net/mlx5/mlx5_flow.h   |  5 ++
 drivers/net/mlx5/mlx5_flow_dv.c| 84 ++
 drivers/net/mlx5/mlx5_prm.h|  6 +-
 5 files changed, 155 insertions(+), 2 deletions(-)

diff --git a/doc/guides/rel_notes/release_19_08.rst 
b/doc/guides/rel_notes/release_19_08.rst
index 27d5915bed..ed58b463d0 100644
--- a/doc/guides/rel_notes/release_19_08.rst
+++ b/doc/guides/rel_notes/release_19_08.rst
@@ -112,6 +112,7 @@ New Features
   * Updated the packet header modification feature. Added support of TCP header
 sequence number and acknowledgment number modification.
   * Added support for match on ICMP/ICMP6 code and type.
+  * Added support for matching on GRE's key and C,K,S present bits.
 
 * **Updated Solarflare network PMD.**
 
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 626d0f9352..c7e034c815 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -1666,6 +1666,61 @@ mlx5_flow_validate_item_vxlan_gpe(const struct 
rte_flow_item *item,
  " defined");
return 0;
 }
+/**
+ * Validate GRE Key item.
+ *
+ * @param[in] item
+ *   Item specification.
+ * @param[in] item_flags
+ *   Bit flags to mark detected items.
+ * @param[in] gre_item
+ *   Pointer to gre_item
+ * @param[out] error
+ *   Pointer to error structure.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_flow_validate_item_gre_key(const struct rte_flow_item *item,
+   uint64_t item_flags,
+   const struct rte_flow_item *gre_item,
+   struct rte_flow_error *error)
+{
+   const rte_be32_t *mask = item->mask;
+   int ret = 0;
+   rte_be32_t gre_key_default_mask = RTE_BE32(UINT32_MAX);
+   const struct rte_flow_item_gre *gre_spec = gre_item->spec;
+   const struct rte_flow_item_gre *gre_mask = gre_item->mask;
+
+   if (item_flags & MLX5_FLOW_LAYER_GRE_KEY)
+   return rte_flow_error_set(error, ENOTSUP,
+ RTE_FLOW_ERROR_TYPE_ITEM, item,
+ "Multiple GRE key not support");
+   if (!(item_flags & MLX5_FLOW_LAYER_GRE))
+   return rte_flow_error_set(error, ENOTSUP,
+ RTE_FLOW_ERROR_TYPE_ITEM, item,
+ "No preceding GRE header");
+   if (item_flags & MLX5_FLOW_LAYER_INNER)
+   return rte_flow_error_set(error, ENOTSUP,
+ RTE_FLOW_ERROR_TYPE_ITEM, item,
+ "GRE key following a wrong item");
+   if (!gre_mask)
+   gre_mask = &rte_flow_item_gre_mask;
+   if (gre_spec && (gre_mask->c_rsvd0_ver & RTE_BE16(0x2000)) &&
+!(gre_spec->c_rsvd0_ver & RTE_BE16(0x2000)))
+   return rte_flow_error_set(error, EINVAL,
+ RTE_FLOW_ERROR_TYPE_ITEM, item,
+ "Key bit must be on");
+
+   if (!mask)
+   mask = &gre_key_default_mask;
+   ret = mlx5_flow_item_acceptable
+   (item, (const uint8_t *)mask,
+(const uint8_t *)&gre_key_default_mask,
+sizeof(rte_be32_t), error);
+   return ret;
+}
 
 /**
  * Validate GRE item.
@@ -1691,6 +1746,10 @@ mlx5_flow_validate_item_gre(const struct rte_flow_item 
*item,
const struct rte_flow_item_gre *spec __rte_unused = item->spec;
const struct rte_flow_item_gre *mask = item->mask;
int ret;
+   const struct rte_flow_item_gre nic_mask = {
+   .c_rsvd0_ver = RTE_BE16(0xB000),
+   .protocol = RTE_BE16(UINT16_MAX),
+   };
 
if (target_protocol != 0xff && target_protocol != IPPROTO_GRE)
return rte_flow_error_set(error, EINVAL,
@@ -1710,7 +1769,7 @@ mlx5_flow_validate_item_gre(const struct rte_flow_item 
*item,
mask = &rte_flow_item_gre_mask;
ret = mlx5_flow_item_acceptable
(item, (const uint8_t *)mask,
-(const uint8_t *)&rte_flow_item_gre_mask,
+(const uint8_t *)&nic_mask,
 sizeof(struct rte_flow_item_gre), error);
if (ret < 0)
return ret;
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 3d7fcf706e..6ccb8a7189 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -53,6 +53,7 @

Re: [dpdk-dev] [PATCH] net/octeontx2: add PF and VF action support

2019-07-09 Thread Jerin Jacob Kollanukkaran
> -Original Message-
> From: dev  On Behalf Of Jerin Jacob Kollanukkaran
> Sent: Monday, July 8, 2019 10:26 AM
> To: Kiran Kumar Kokkilagadda ; dev@dpdk.org
> Cc: Kiran Kumar Kokkilagadda 
> Subject: Re: [dpdk-dev] [PATCH] net/octeontx2: add PF and VF action
> support
> 
> > -Original Message-
> > From: dev  On Behalf Of
> kirankum...@marvell.com
> > Sent: Monday, July 8, 2019 9:06 AM
> > To: dev@dpdk.org
> > Cc: Kiran Kumar Kokkilagadda 
> > Subject: [dpdk-dev] [PATCH] net/octeontx2: add PF and VF action
> > support
> >
> > From: Kiran Kumar K 
> >
> > Adding PF and VF action support for octeontx2 Flow.
> > If RTE_FLOW_ACTION_TYPE_PF action is set from VF, then the packet will
> > be sent to the parent PF.
> > If RTE_FLOW_ACTION_TYPE_VF action is set and original is specified,
> > then the packet will be sent to the original VF, otherwise the packet
> > will be sent to the VF specified in the vf_id.
> >
> > Signed-off-by: Kiran Kumar K 
> 
> Acked-by: Jerin Jacob 

Applied to dpdk-next-net-mrvl/master. Thanks


Re: [dpdk-dev] [PATCH v2] devtools: better freebsd support

2019-07-09 Thread Musatescu, Flavia



On 05/07/2019 14:58, Olivier Matz wrote:

- As "readlink -e" and "readlink -m" do not exist on freebsd,
   use "readlink -f", it should not have any impact in these cases.
- "sed -ri" is invalid on freebsd and should be replaced by
   "sed -ri=''"
- Use gmake instead of make.

This fixes the following command:
   SYSDIR=/usr/src/sys ./devtools/test-build.sh \
 -j4 x86_64-native-freebsd-gcc

Signed-off-by: Olivier Matz 
---

v2:
- remove sed_ri() function and use 'sed -ri=""' as suggested by Bruce

  devtools/check-dup-includes.sh |  2 +-
  devtools/checkpatches.sh   |  8 ++--
  devtools/get-maintainer.sh |  2 +-
  devtools/load-devel-config |  4 +-
  devtools/test-build.sh | 94 ++
  devtools/validate-abi.sh   |  2 +-
  6 files changed, 58 insertions(+), 54 deletions(-)

diff --git a/devtools/check-dup-includes.sh b/devtools/check-dup-includes.sh
index e4c2748c6..591599949 100755
--- a/devtools/check-dup-includes.sh
+++ b/devtools/check-dup-includes.sh
@@ -5,7 +5,7 @@
  # Check C files in git repository for duplicated includes.
  # Usage: devtools/check-dup-includes.sh [directory]
  
-dir=${1:-$(dirname $(readlink -m $0))/..}

+dir=${1:-$(dirname $(readlink -f $0))/..}
  cd $dir
  
  # speed up by ignoring Unicode details

diff --git a/devtools/checkpatches.sh b/devtools/checkpatches.sh
index 560e6ce93..8e2beee16 100755
--- a/devtools/checkpatches.sh
+++ b/devtools/checkpatches.sh
@@ -7,9 +7,9 @@
  # - DPDK_CHECKPATCH_CODESPELL
  # - DPDK_CHECKPATCH_LINE_LENGTH
  # - DPDK_CHECKPATCH_OPTIONS
-. $(dirname $(readlink -e $0))/load-devel-config
+. $(dirname $(readlink -f $0))/load-devel-config
  
-VALIDATE_NEW_API=$(dirname $(readlink -e $0))/check-symbol-change.sh

+VALIDATE_NEW_API=$(dirname $(readlink -f $0))/check-symbol-change.sh
  
  # Enable codespell by default. This can be overwritten from a config file.

  # Codespell can also be enabled by setting DPDK_CHECKPATCH_CODESPELL to a 
valid path
@@ -66,7 +66,7 @@ check_forbidden_additions() { # 
-v EXPRESSIONS="rte_panic\\\( rte_exit\\\(" \
-v RET_ON_FAIL=1 \
-v MESSAGE='Using rte_panic/rte_exit' \
-   -f $(dirname $(readlink -e $0))/check-forbidden-tokens.awk \
+   -f $(dirname $(readlink -f $0))/check-forbidden-tokens.awk \
"$1" || res=1
  
  	# svg figures must be included with wildcard extension

@@ -75,7 +75,7 @@ check_forbidden_additions() { # 
-v EXPRESSIONS='::[[:space:]]*[^[:space:]]*\\.svg' \
-v RET_ON_FAIL=1 \
-v MESSAGE='Using explicit .svg extension instead of .*' \
-   -f $(dirname $(readlink -e $0))/check-forbidden-tokens.awk \
+   -f $(dirname $(readlink -f $0))/check-forbidden-tokens.awk \
"$1" || res=1
  
  	return $res

diff --git a/devtools/get-maintainer.sh b/devtools/get-maintainer.sh
index b9160486a..85740f5af 100755
--- a/devtools/get-maintainer.sh
+++ b/devtools/get-maintainer.sh
@@ -5,7 +5,7 @@
  
  # Load config options:

  # - DPDK_GETMAINTAINER_PATH
-. $(dirname $(readlink -e $0))/load-devel-config
+. $(dirname $(readlink -f $0))/load-devel-config
  
  options="--no-git-fallback"

  options="$options --no-rolestats"
diff --git a/devtools/load-devel-config b/devtools/load-devel-config
index 4f43cb352..380c79db4 100644
--- a/devtools/load-devel-config
+++ b/devtools/load-devel-config
@@ -6,7 +6,7 @@ test ! -r /etc/dpdk/devel.config ||
  test ! -r ~/.config/dpdk/devel.config ||
  . ~/.config/dpdk/devel.config
  # from local file
-test ! -r $(dirname $(readlink -m $0))/../.develconfig ||
-. $(dirname $(readlink -m $0))/../.develconfig
+test ! -r $(dirname $(readlink -f $0))/../.develconfig ||
+. $(dirname $(readlink -f $0))/../.develconfig
  
  # The config files must export variables in the shell style

diff --git a/devtools/test-build.sh b/devtools/test-build.sh
index 9b50bf73d..22d409bb0 100755
--- a/devtools/test-build.sh
+++ b/devtools/test-build.sh
@@ -28,7 +28,7 @@ default_path=$PATH
  # - LIBSSO_SNOW3G_PATH
  # - LIBSSO_KASUMI_PATH
  # - LIBSSO_ZUC_PATH
-. $(dirname $(readlink -e $0))/load-devel-config
+. $(dirname $(readlink -f $0))/load-devel-config
  
  print_usage () {

echo "usage: $(basename $0) [-h] [-jX] [-s] [config1 [config2] ...]]"
@@ -57,6 +57,10 @@ print_help () {
END_OF_HELP
  }
  
+[ -z $MAKE ] && command -v gmake > /dev/null && MAKE=gmake

+[ -z $MAKE ] && command -v make > /dev/null && MAKE=make
+[ -z $MAKE ] && echo "Cannot find make or gmake" && exit 1
+
  J=$DPDK_MAKE_JOBS
  short=false
  unset verbose
@@ -90,7 +94,7 @@ trap "signal=INT ; trap - INT ; kill -INT $$" INT
  # notify result on exit
  trap on_exit EXIT
  
-cd $(dirname $(readlink -m $0))/..

+cd $(dirname $(readlink -f $0))/..
  
  reset_env ()

  {
@@ -127,83 +131,83 @@ config () #   
fi
if [ ! -e $1/.config ] || $reconfig ; then
   

[dpdk-dev] [PATCH] librte_flow_classify: fix out-of-bounds access

2019-07-09 Thread Bernard Iremonger
This patch fixes the out-of-bounds Coverity issue by removing the
offending line of code at line 107 in rte_flow_classify_parse.c,
which is never executed.

Coverity issue: 343454

Fixes: be41ac2a330f ("flow_classify: introduce flow classify library")
Cc: sta...@dpdk.org
Signed-off-by: Bernard Iremonger 
---
 lib/librte_flow_classify/rte_flow_classify_parse.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/lib/librte_flow_classify/rte_flow_classify_parse.c 
b/lib/librte_flow_classify/rte_flow_classify_parse.c
index f65ceaf..4653302 100644
--- a/lib/librte_flow_classify/rte_flow_classify_parse.c
+++ b/lib/librte_flow_classify/rte_flow_classify_parse.c
@@ -103,8 +103,6 @@ classify_pattern_skip_void_item(struct rte_flow_item *items,
pb = pe;
break;
}
-
-   pb = pe + 1;
}
/* Copy the END item. */
rte_memcpy(items, pe, sizeof(struct rte_flow_item));
-- 
2.7.4



Re: [dpdk-dev] [EXT] Re: [PATCH] bus/pci: fix IOVA as VA mode selection

2019-07-09 Thread Jerin Jacob Kollanukkaran
> -Original Message-
> From: Burakov, Anatoly 
> Sent: Tuesday, July 9, 2019 3:15 PM
> To: Jerin Jacob Kollanukkaran ; David Marchand
> 
> Cc: dev ; Thomas Monjalon ; Ben
> Walker 
> Subject: Re: [EXT] Re: [dpdk-dev] [PATCH] bus/pci: fix IOVA as VA mode
> selection
> 
> On 08-Jul-19 8:13 PM, Jerin Jacob Kollanukkaran wrote:
> > See below,
> >
> > Please send the email as text to avoid formatting issue.(No HTML)
> >
> > From: David Marchand 
> > Sent: Tuesday, July 9, 2019 12:09 AM
> > To: Jerin Jacob Kollanukkaran 
> > Cc: dev ; Thomas Monjalon ;
> Ben
> > Walker ; Burakov, Anatoly
> > 
> > Subject: [EXT] Re: [dpdk-dev] [PATCH] bus/pci: fix IOVA as VA mode
> > selection
> >
> > 
> >
> > On Mon, Jul 8, 2019 at 4:25 PM  wrote:
> > From: Jerin Jacob 
> >
> > Existing logic fails to select IOVA mode as VA if driver request to
> > enable IOVA as VA.
> >
> > IOVA as VA has more strict requirement than other modes, so enabling
> > positive logic for IOVA as VA selection.
> >
> > This patch also updates the default IOVA mode as PA for PCI devices as
> > it has to deal with DMA engines unlike the virtual devices that may
> > need only IOVA as DC.
> >
> > We have three cases:
> > - driver/hw supports IOVA as PA only
> >
> > [Jerin] It is not driver cap, it is more of system cap(IOMMU vs non
> > IOMMU). We are already addressing that case
> 
> I don't get how this works. How does "system capability" affect what the
> device itself supports? Are we to assume that *all* hardware support IOVA
> as VA by default? "System capability" is more of a bus issue than an 
> individual
> device issue, is it not?

What I meant is, supporting VA vs PA is a function of the IOMMU (not a device
attribute).
I.e. the device makes the bus master request; if an IOMMU is available and enabled in the
system,
the request goes through the IOMMU, which translates the IOVA to a physical address.

Another way to put it: is there any _PCIe_ device which needs/requires
RTE_PCI_DRV_NEED_IOVA_AS_PA in rte_pci_driver.drv_flags?





> 
> --
> Thanks,
> Anatoly


Re: [dpdk-dev] [EXT] [PATCH v3 1/3] eal/arm64: add 128-bit atomic compare exchange

2019-07-09 Thread Jerin Jacob Kollanukkaran
> -Original Message-
> From: Phil Yang (Arm Technology China) 
> Sent: Tuesday, July 9, 2019 2:58 PM
> To: Pavan Nikhilesh Bhagavatula ; Honnappa
> Nagarahalli ; Jerin Jacob Kollanukkaran
> ; dev@dpdk.org
> Cc: tho...@monjalon.net; hemant.agra...@nxp.com; Gavin Hu (Arm
> Technology China) ; nd ;
> gage.e...@intel.com; nd ; nd 
> Subject: RE: [EXT] [PATCH v3 1/3] eal/arm64: add 128-bit atomic compare
> exchange
> 
> > -Original Message-
> > From: Pavan Nikhilesh Bhagavatula 
> > Sent: Friday, July 5, 2019 12:37 PM
> > To: Honnappa Nagarahalli ;
> > jer...@marvell.com; Phil Yang (Arm Technology China)
> > ; dev@dpdk.org
> > Cc: tho...@monjalon.net; hemant.agra...@nxp.com; Gavin Hu (Arm
> > Technology China) ; nd ;
> > gage.e...@intel.com; nd 
> > Subject: RE: [EXT] [PATCH v3 1/3] eal/arm64: add 128-bit atomic
> > compare exchange
> >
> >
> 
> 
> 
> > >> > > +#ifdef __ARM_FEATURE_ATOMICS
> > >> > > +#define __ATOMIC128_CAS_OP(cas_op_name, op_string)
> > >> \
> > >> > > +static inline rte_int128_t  
> > >> > > \
> > >> > > +cas_op_name(rte_int128_t *dst, rte_int128_t old,
> > >> > > \
> > >> > > +rte_int128_t updated)   
> > >> > > \
> > >> > > +{   
> > >> > > \
> > >> > > +/* caspX instructions register pair must start from even-
> > >numbered
> > >> > > + * register at operand 1.
> > >> > > + * So, specify registers for local variables here.
> > >> > > + */ 
> > >> > > \
> > >> > > +register uint64_t x0 __asm("x0") = (uint64_t)old.val[0];
> > \
> > >> >
> > >> > I understand CASP limitation on register has to be even and odd.
> > >> > Is there anyway to remove explicit x0 register allocation and
> > >> > choose compiler to decide the register. Some reason with
> > >> > optimize(03) gcc makes correctly but not clang.
> > >> >
> > >> > Hardcoding to specific register makes compiler to not optimize
> > >> > the stuff, especially if it is inline function.
> > >>
> > >> It look like the limitation fixed recently in gcc.
> > >> https://patches.linaro.org/patch/147991/
> > >>
> > >> Not sure about old gcc and clang. ARM compiler experts may know
> > >the exact
> > >> status
> > >>
> > >We could use syntax as follows, an example is in [1] static inline
> > >rte_int128_t __rte_casp(rte_int128_t *dst, rte_int128_t old,
> > >rte_int128_t updated, int mo) {
> > >   __asm__ volatile("caspl %0, %H0, %1, %H1, [%2]"
> > >: "+r" (old)
> > >: "r" (updated), "r" (dst)
> > >: "memory");
> > >   return old;
> > >}
> >
> > We have used this format for mempool/octeontx2 but clang wasn't too
> > happy.
> >
> > dpdk/drivers/mempool/octeontx2/otx2_mempool_ops.c:151:15: error:
> > value size does not match register size specified by the constraint
> > and modifier [-Werror,-Wasm-operand-widths]
> > [t0] "=&r" (t0), [t1] "=&r" (t1), [t2] "=&r" (t2),
> > ^
> > dpdk/drivers/mempool/octeontx2/otx2_mempool_ops.c:82:9: note: use
> > constraint modifier "w"
> > "casp %[t0], %H[t0], %[wdata], %H[wdata], [%[loc]]\n"
> >
> > Had to change it to hand coded asm
> >
> > http://patches.dpdk.org/patch/56110/
> 
> Hi Jerin,
> 
> The update from the compiler team is 'the LSE CASP fix has not been
> backported to older GCC branches'.
> So, currently, this seems the only approach works for all versions of GCC and
> Clang.
> I think we can add another optimization patch for this once all the compilers
> were ready.

We are on the same page.


> 
> Thanks,
> Phil
> >
> > >
> > >[1] https://godbolt.org/z/EUJnuG


Re: [dpdk-dev] [EXT] Re: [PATCH] bus/pci: fix IOVA as VA mode selection

2019-07-09 Thread Burakov, Anatoly

On 09-Jul-19 12:13 PM, Jerin Jacob Kollanukkaran wrote:

-Original Message-
From: Burakov, Anatoly 
Sent: Tuesday, July 9, 2019 3:15 PM
To: Jerin Jacob Kollanukkaran ; David Marchand

Cc: dev ; Thomas Monjalon ; Ben
Walker 
Subject: Re: [EXT] Re: [dpdk-dev] [PATCH] bus/pci: fix IOVA as VA mode
selection

On 08-Jul-19 8:13 PM, Jerin Jacob Kollanukkaran wrote:

See below,

Please send the email as text to avoid formatting issue.(No HTML)

From: David Marchand 
Sent: Tuesday, July 9, 2019 12:09 AM
To: Jerin Jacob Kollanukkaran 
Cc: dev ; Thomas Monjalon ;

Ben

Walker ; Burakov, Anatoly

Subject: [EXT] Re: [dpdk-dev] [PATCH] bus/pci: fix IOVA as VA mode
selection



On Mon, Jul 8, 2019 at 4:25 PM  wrote:
From: Jerin Jacob 

Existing logic fails to select IOVA mode as VA if driver request to
enable IOVA as VA.

IOVA as VA has more strict requirement than other modes, so enabling
positive logic for IOVA as VA selection.

This patch also updates the default IOVA mode as PA for PCI devices as
it has to deal with DMA engines unlike the virtual devices that may
need only IOVA as DC.

We have three cases:
- driver/hw supports IOVA as PA only

[Jerin] It is not driver cap, it is more of system cap(IOMMU vs non
IOMMU). We are already addressing that case


I don't get how this works. How does "system capability" affect what the
device itself supports? Are we to assume that *all* hardware support IOVA
as VA by default? "System capability" is more of a bus issue than an individual
device issue, is it not?


What I meant is, supporting VA vs PA is function of IOMMU(not the device 
attribute).
Ie. Device makes the  bus master request, if IOMMU available and enabled in the 
SYSTEM ,
It goes over IOMMU  and translate the IOVA to physical address.

Another way to put is, Is there any _PCIe_ device which need/requires
RTE_PCI_DRV_NEED_IOVA_AS_PA in rte_pci_driver.drv_flags




Previously, as far as i can tell, the flag was used to indicate support 
for IOVA as VA mode, not *requirement* for IOVA as VA mode. For example, 
there are multiple patches [1][2][3][4] (i'm sure i can find more!) that 
added IOVA as VA support to various drivers, and they all worded it 
in this exact way - "support for IOVA as VA mode", not "require IOVA as 
VA mode". As far as i can tell, none of these drivers *require* IOVA as 
VA mode - they merely use this flag to indicate support for it.


Specifically, from my perspective, the "support for IOVA as VA mode" has 
in practice always indicated support for VFIO (or similar drivers) as 
far as the PCI bus is concerned. As in, the device *could* use IOVA as 
VA mode, but since it may be bound to igb_uio (which doesn't support 
IOVA as VA), the IOVA as VA mode may not be supported for a particular 
device. So, a particular device *cannot support* IOVA as VA if it's 
bound to igb_uio or uio_pci_generic (or VFIO in noiommu mode). This is 
not *just* a capability thing, but also kernel driver issue.


Now suddenly it turns out that someone somewhere "knew" that "IOVA as 
VA" flag in PCI drivers is supposed to indicate *requirement* and not 
support, and it appears that this knowledge was not communicated nor 
documented anywhere, and is now treated as common knowledge.


[1] http://patchwork.dpdk.org/patch/53206/
[2] http://patchwork.dpdk.org/patch/50274/
[3] http://patchwork.dpdk.org/patch/50991/
[4] http://patchwork.dpdk.org/patch/46134/

--
Thanks,
Anatoly


Re: [dpdk-dev] [EXT] Re: [PATCH] bus/pci: fix IOVA as VA mode selection

2019-07-09 Thread Jerin Jacob Kollanukkaran
> -Original Message-
> From: Burakov, Anatoly 
> Sent: Tuesday, July 9, 2019 5:10 PM
> To: Jerin Jacob Kollanukkaran ; David Marchand
> 
> Cc: dev ; Thomas Monjalon ; Ben
> Walker 
> Subject: Re: [EXT] Re: [dpdk-dev] [PATCH] bus/pci: fix IOVA as VA mode
> selection
> >>> 
> >>>
> >>> On Mon, Jul 8, 2019 at 4:25 PM  wrote:
> >>> From: Jerin Jacob 
> >>>
> >>> Existing logic fails to select IOVA mode as VA if driver request to
> >>> enable IOVA as VA.
> >>>
> >>> IOVA as VA has more strict requirement than other modes, so enabling
> >>> positive logic for IOVA as VA selection.
> >>>
> >>> This patch also updates the default IOVA mode as PA for PCI devices
> >>> as it has to deal with DMA engines unlike the virtual devices that
> >>> may need only IOVA as DC.
> >>>
> >>> We have three cases:
> >>> - driver/hw supports IOVA as PA only
> >>>
> >>> [Jerin] It is not driver cap, it is more of system cap(IOMMU vs non
> >>> IOMMU). We are already addressing that case
> >>
> >> I don't get how this works. How does "system capability" affect what
> >> the device itself supports? Are we to assume that *all* hardware
> >> support IOVA as VA by default? "System capability" is more of a bus
> >> issue than an individual device issue, is it not?
> >
> > What I meant is, supporting VA vs PA is function of IOMMU(not the device
> attribute).
> > Ie. Device makes the  bus master request, if IOMMU available and
> > enabled in the SYSTEM , It goes over IOMMU  and translate the IOVA to
> physical address.
> >
> > Another way to put is, Is there any _PCIe_ device which need/requires
> > RTE_PCI_DRV_NEED_IOVA_AS_PA in rte_pci_driver.drv_flags
> >
> >
> 
> Previously, as far as i can tell, the flag was used to indicate support for 
> IOVA
> as VA mode, not *requirement* for IOVA as VA mode. For example, there
> are multiple patches [1][2][3][4] (i'm sure i can find more!) that added IOVA
> as VA support to various drivers, and they all were worded it in this exact 
> way
> - "support for IOVA as VA mode", not "require IOVA as VA mode". As far as i
> can tell, none of these drivers *require* IOVA as VA mode - they merely use
> this flag to indicate support for it.

Some classes of devices NEED IOVA as VA for performance reasons,
especially devices that have HW mempool allocators. On those devices, if we don't
use IOVA as VA,
then upon getting a packet from the device we need to go through rte_mem_iova2virt() per
packet (see drivers/net/dpaa2), which is a real performance issue.
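
To illustrate the cost, a minimal sketch (the helper name is made up; only
rte_mem_iova2virt() is the real API, and this is not the dpaa2 driver code):

#include <rte_memory.h>

/* Sketch only: with IOVA as PA, every buffer address handed back by a
 * HW mempool/NIC has to be translated before the CPU can touch it;
 * with IOVA as VA this lookup disappears entirely. */
static inline void *
hw_buf_iova_to_va(rte_iova_t buf_iova)
{
	return rte_mem_iova2virt(buf_iova); /* per-packet lookup */
}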

> Specifically, from my perspective, the "support for IOVA as VA mode" has in
> practice always indicated support for VFIO (or similar drivers) as far as the 
> PCI
> bus is concerned. As in, the device *could* use IOVA as VA mode, but since it
> may be bound to igb_uio (which doesn't support IOVA as VA), the IOVA as
> VA mode may not be supported for a particular device. So, a particular device
> *cannot support* IOVA as VA if it's bound to igb_uio or uio_pci_generic (or
> VFIO in noiommu mode). This is not *just* a capability thing, but also kernel
> driver issue.

Yes. See below.

> 
> Now suddenly it turns out that someone somewhere "knew" that "IOVA as
> VA" flag in PCI drivers is supposed to indicate *requirement* and not
> support, and it appears that this knowledge was not communicated nor
> documented anywhere, and is now treated as common knowledge.

I think the confusion here is that I was under the impression that:
# If the device supports IOVA as VA and the system runs with an IOMMU, then
DPDK should run in IOVA as VA mode.
If the above statement is true, then we don't really need a new flag.
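
A hedged pseudocode sketch of that rule (illustrative only, not the actual EAL/bus
implementation):

/* Sketch only: the selection rule assumed above. */
enum rte_iova_mode
iova_mode_sketch(bool dev_supports_iova_va, bool iommu_enabled)
{
	if (dev_supports_iova_va && iommu_enabled)
		return RTE_IOVA_VA;	/* device and system both allow VA */
	return RTE_IOVA_PA;	/* otherwise fall back to physical addresses */
}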

A couple of points to make forward progress:
# If we think there is a use case where the device supports IOVA as VA
and the system runs in IOMMU mode, yet for some reason DPDK needs
to run in PA mode, then we need to create two flags:
RTE_PCI_DRV_IOVA_AS_VA - it can run in either mode
RTE_PCI_DRV_NEED_IOVA_AS_VA - it can run only with IOVA as VA
# With top of tree, DPDK currently never runs in IOVA as VA mode.
That is a separate problem to fix, and it affects all the devices
currently supporting RTE_PCI_DRV_IOVA_AS_VA. I.e. even though a
device supports RTE_PCI_DRV_IOVA_AS_VA, it does not run
with IOMMU protection, and/or root privilege is required to run DPDK.


> 
> [1] http://patchwork.dpdk.org/patch/53206/
> [2] http://patchwork.dpdk.org/patch/50274/
> [3] http://patchwork.dpdk.org/patch/50991/
> [4] http://patchwork.dpdk.org/patch/46134/
> 
> --
> Thanks,
> Anatoly


Re: [dpdk-dev] [PATCH v1 1/1] net/hinic: use mutex replace spin lock

2019-07-09 Thread Xuanziyang (William, Chip Application Design Logic and Hardware Development Dept IT_Products & Solutions)
> On Wed, 3 Jul 2019 23:35:42 +0800
> Ziyang Xuan  wrote:
> 
> >
> > +static inline int hinic_mutex_init(pthread_mutex_t *pthreadmutex,
> > +   const pthread_mutexattr_t *mattr) {
> > +   int err;
> > +
> > +   err = pthread_mutex_init(pthreadmutex, mattr);
> > +   if (unlikely(err))
> > +   PMD_DRV_LOG(ERR, "Fail to initialize mutex, error: %d", err);
> > +
> > +   return err;
> > +}
> > +
> > +static inline int hinic_mutex_destroy(pthread_mutex_t *pthreadmutex)
> > +{
> > +   int err;
> > +
> > +   err = pthread_mutex_destroy(pthreadmutex);
> > +   if (unlikely(err))
> > +   PMD_DRV_LOG(ERR, "Fail to destroy mutex, error: %d", err);
> > +
> > +   return err;
> > +}
> > +
> 
> I don't think the wrapper functions add much.
> pthread_mutex_init just sets internals of data structure and won't fail ever 
> if
> mutexattr_t is NULL.
> 
> Just use pthread_mutex_init/pthread_mutex_destroy directly and ignore
> errors.

I think pthread_mutex_init/pthread_mutex_destroy may fail under some situations.
https://pubs.opengroup.org/onlinepubs/7908799/xsh/pthread_mutex_init.html

I want some logs when they fail, so that problems can be located conveniently, and I would
also like to
detect their failures. So I encapsulated the functions, and I think this is better.


Re: [dpdk-dev] [PATCH v2 1/1] net/hinic: use mutex replace spin lock

2019-07-09 Thread Xuanziyang (William, Chip Application Design Logic and Hardware Development Dept IT_Products & Solutions)
> On 7/5/2019 7:47 AM, Ziyang Xuan wrote:
> > Using spin lock to protect critical resources of sending mgmt
> > messages. This will make high CPU usage for rte_delay_ms when sending
> > mgmt messages frequently. We can use mutex to protect the critical
> > resources and usleep to reduce CPU usage while keep functioning
> > properly.
> >
> > Signed-off-by: Ziyang Xuan 
> 
> <...>
> 
> > +static inline int hinic_mutex_init(pthread_mutex_t *pthreadmutex,
> > +   const pthread_mutexattr_t *mattr) {
> > +   int err;
> > +
> > +   err = pthread_mutex_init(pthreadmutex, mattr);
> > +   if (unlikely(err))
> > +   PMD_DRV_LOG(ERR, "Fail to initialize mutex, error: %d", err);
> > +
> > +   return err;
> > +}
> > +
> > +static inline int hinic_mutex_destroy(pthread_mutex_t *pthreadmutex)
> > +{
> > +   int err;
> > +
> > +   err = pthread_mutex_destroy(pthreadmutex);
> > +   if (unlikely(err))
> > +   PMD_DRV_LOG(ERR, "Fail to destroy mutex, error: %d", err);
> > +
> > +   return err;
> > +}
> > +
> 
> There was a comment from Stephen to use pthread APIs directly, can you
> please comment on that?

I have already replied to him.

> 
> 
> > @@ -713,7 +718,7 @@ int hinic_aeq_poll_msg(struct hinic_eq *eq, u32
> timeout, void *param)
> > }
> >
> > if (timeout != 0)
> > -   rte_delay_ms(1);
> > +   usleep(1000);
> 
> Why is this change required? Aren't these are same?

The function rte_delay_ms busy-waits, while usleep lets the thread be scheduled out.
We get high CPU usage when we use rte_delay_ms but not with usleep.
That is the purpose of this patch.

Thanks!
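
To illustrate the difference, a tiny hedged sketch (rte_delay_ms() busy-polls the TSC
by default, unless a sleep callback has been registered via
rte_delay_us_callback_register()):

#include <unistd.h>
#include <rte_cycles.h>

/* Illustrative only: the two wait styles discussed above. */
static void wait_1ms_busy(void)  { rte_delay_ms(1); } /* spins on the TSC, keeps the core at 100% */
static void wait_1ms_sleep(void) { usleep(1000); }    /* blocks in the kernel, core is free for other work */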


[dpdk-dev] [PATCH] doc: fix PDF build

2019-07-09 Thread Thomas Monjalon
The command "make doc-guides-pdf" is failing because
there are more than 1500 lines in the file MAINTAINERS
which is included in the contributing guide.

We are facing the issue mentioned in this comment:
https://github.com/sphinx-doc/sphinx/issues/3099#issuecomment-256440704

Anyway the file MAINTAINERS is mentioned several times in the guide.
So the "literalinclude" is removed from the guide to fix the build
of the PDF.

Signed-off-by: Thomas Monjalon 
---
 doc/guides/contributing/patches.rst | 9 -
 1 file changed, 9 deletions(-)

diff --git a/doc/guides/contributing/patches.rst 
b/doc/guides/contributing/patches.rst
index 3b2b174ad..f37fb5557 100644
--- a/doc/guides/contributing/patches.rst
+++ b/doc/guides/contributing/patches.rst
@@ -648,12 +648,3 @@ patch accepted. The general cycle for patch review and 
acceptance is:
  than rework of the original.
* Trivial patches may be merged sooner than described above at the tree 
committer's
  discretion.
-
-DPDK Maintainers
-
-
-The following are the DPDK maintainers as listed in the ``MAINTAINERS`` file
-in the DPDK root directory.
-
-.. literalinclude:: ../../../MAINTAINERS
-   :lines: 3-
-- 
2.21.0



Re: [dpdk-dev] [EXT] Re: [PATCH] bus/pci: fix IOVA as VA mode selection

2019-07-09 Thread Burakov, Anatoly

On 09-Jul-19 1:11 PM, Jerin Jacob Kollanukkaran wrote:

-Original Message-
From: Burakov, Anatoly 
Sent: Tuesday, July 9, 2019 5:10 PM
To: Jerin Jacob Kollanukkaran ; David Marchand

Cc: dev ; Thomas Monjalon ; Ben
Walker 
Subject: Re: [EXT] Re: [dpdk-dev] [PATCH] bus/pci: fix IOVA as VA mode
selection



On Mon, Jul 8, 2019 at 4:25 PM  wrote:
From: Jerin Jacob 

Existing logic fails to select IOVA mode as VA if driver request to
enable IOVA as VA.

IOVA as VA has more strict requirement than other modes, so enabling
positive logic for IOVA as VA selection.

This patch also updates the default IOVA mode as PA for PCI devices
as it has to deal with DMA engines unlike the virtual devices that
may need only IOVA as DC.

We have three cases:
- driver/hw supports IOVA as PA only

[Jerin] It is not driver cap, it is more of system cap(IOMMU vs non
IOMMU). We are already addressing that case


I don't get how this works. How does "system capability" affect what
the device itself supports? Are we to assume that *all* hardware
support IOVA as VA by default? "System capability" is more of a bus
issue than an individual device issue, is it not?


What I meant is, supporting VA vs PA is function of IOMMU(not the device

attribute).

Ie. Device makes the  bus master request, if IOMMU available and
enabled in the SYSTEM , It goes over IOMMU  and translate the IOVA to

physical address.


Another way to put is, Is there any _PCIe_ device which need/requires
RTE_PCI_DRV_NEED_IOVA_AS_PA in rte_pci_driver.drv_flags




Previously, as far as i can tell, the flag was used to indicate support for IOVA
as VA mode, not *requirement* for IOVA as VA mode. For example, there
are multiple patches [1][2][3][4] (i'm sure i can find more!) that added IOVA
as VA support to various drivers, and they all were worded it in this exact way
- "support for IOVA as VA mode", not "require IOVA as VA mode". As far as i
can tell, none of these drivers *require* IOVA as VA mode - they merely use
this flag to indicate support for it.


Some class of devices NEED IOVA as VA for performance reasons.
Specially the devices has HW mempool allocators. On those devices If we don’t 
use IOVA as VA,
Upon getting packet from device, It needs to go over rte_mem_iova2virt() per
packet see driver/net/dppa2. Which has real performance issue.


I wouldn't classify this as "needing" IOVA. "Need" implies it cannot 
work without it, whereas in this case it's more of a "highly 
recommended" rather than "need".




Now suddenly it turns out that someone somewhere "knew" that "IOVA as
VA" flag in PCI drivers is supposed to indicate *requirement* and not
support, and it appears that this knowledge was not communicated nor
documented anywhere, and is now treated as common knowledge.


I think, the confusion here is,  I was under impression that
# If device supports IOVA as VA and system runs with IOMMU then
the  dpdk should run in IOVA as VA mode.
If above statement true then we don’t really need a new flag.


Exactly. And the flag used to indicate that the device *supports* IOVA 
as VA, not that it *requires* it.




Couple of points to make forward progress:
# If we think, there is a use case where device is IOVA as VA
And system runs in IOMMU mode then for some reason DPDK needs
to run in PA mode. If so, we need to create two flags
RTE_PCI_DRV_IOVA_AS_VA - it can run either modes


There are use cases - KNI and igb_uio come to mind. Whether the IOMMU uses 
VA or PA is different from whether the IOMMU is in use - there is no law 
that states that, when using an IOMMU, IOVAs have to have a 1:1 mapping with 
VA. An IOMMU requirement does not necessarily imply IOVA as VA - it is 
perfectly legal to program the IOMMU to use IOVA as PA (which we currently 
do when we e.g. use VFIO for some devices and igb_uio for others).



RTE_PCI_DRV_NEED_IOVA_AS_VA - it can run only on IOVA as VA


If we're adding a flag, we might as well not create confusion and do 
it consistently. If IOVA as PA is supported, have a flag to indicate 
that. If IOVA as VA is supported, have a flag to indicate that. Absence 
of either flag implies inability to work in that mode. I don't see how 
this is less clear and self-documenting than having two IOVA as 
VA-related flags that have slightly different meanings and imply things 
not otherwise stated explicitly.



# With top of tree, Currently it never runs in IOVA as VA mode.
That’s a separate problem to fix. Which effect all the devices
Currently supporting RTE_PCI_DRV_IOVA_AS_VA. Ie even though
Device support RTE_PCI_DRV_IOVA_AS_VA, it is not running
With IOMMU protection and/or root privilege is required to run DPDK.




[1] http://patchwork.dpdk.org/patch/53206/
[2] http://patchwork.dpdk.org/patch/50274/
[3] http://patchwork.dpdk.org/patch/50991/
[4] http://patchwork.dpdk.org/patch/46134/

--
Thanks,
Anatoly



--
Thanks,
Anatoly


[dpdk-dev] [PATCH] doc: update release notes for OCTEON TX2 support

2019-07-09 Thread jerinj
From: Jerin Jacob 

Update release notes for various OCTEON TX2 drivers supported for 19.08.

Signed-off-by: Jerin Jacob 
---
 doc/guides/rel_notes/release_19_08.rst | 16 
 1 file changed, 16 insertions(+)

diff --git a/doc/guides/rel_notes/release_19_08.rst 
b/doc/guides/rel_notes/release_19_08.rst
index 27d5915be..a423f2e4d 100644
--- a/doc/guides/rel_notes/release_19_08.rst
+++ b/doc/guides/rel_notes/release_19_08.rst
@@ -191,6 +191,22 @@ New Features
   Added telemetry mode to l3fwd-power application to report
   application level busyness, empty and full polls of rte_eth_rx_burst().
 
+* **Added Marvell OCTEON TX2 drivers.**
+
+  Added the new ``ethdev``, ``eventdev``, ``mempool``, ``eventdev Rx adapter``,
+  ``eventdev Tx adapter``, ``eventdev Timer adapter`` and ``rawdev DMA``
+  drivers for various HW coprocessors available in ``OCTEON TX2`` SoC.
+
+  See :doc:`../platform/octeontx2` for OCTEON TX2 platform information.
+
+  See :doc:`../nics/octeontx2` for NIX ethdev driver information.
+
+  See :doc:`../mempool/octeontx2` for NPA mempool driver information.
+
+  See :doc:`../eventdevs/octeontx2` for SSO event device driver information.
+
+  See :doc:`../rawdevs/octeontx2_dma` for DMA driver information.
+
 
 Removed Items
 -
-- 
2.22.0



Re: [dpdk-dev] [EXT] [RFC PATCH] vfio: move eventfd/interrupt pairing at setup time

2019-07-09 Thread Shahed Shaikh
> -Original Message-
> From: dev  On Behalf Of David Marchand
> Sent: Tuesday, July 2, 2019 6:18 PM
> To: dev@dpdk.org
> Cc: anatoly.bura...@intel.com; alex.william...@redhat.com;
> maxime.coque...@redhat.com; tho...@monjalon.net;
> step...@networkplumber.org
> Subject: [EXT] [dpdk-dev] [RFC PATCH] vfio: move eventfd/interrupt pairing at
> setup time
> 
> 
> --
> Populating the eventfd in rte_intr_enable in each request to vfio triggers a
> reconfiguration of the interrupt handler on the kernel side.
> The problem is that rte_intr_enable is often used to re-enable masked 
> interrupts
> from drivers interrupt handlers.
> 
> This reconfiguration leaves a window during which a device could send an
> interrupt and then the kernel logs this unsolicited interrupt:
> [158764.159833] do_IRQ: 9.34 No irq handler for vector
> 
> VFIO api makes it possible to set the fd at setup time.
> Make use of this and then we only need to ask for masking/unmasking legacy
> interrupts and we have nothing to do for MSI/MSIX.
> 
> "rxtx" interrupts are left untouched but are most likely subject to the same 
> issue.
> 
> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1654824
> Signed-off-by: David Marchand 

Thanks David for this patch.
I have tested this patch with FastLinq adapters, on which the original issue was 
reported.
Tested with MSI-X and legacy interrupt modes.

Tested-by: Shahed Shaikh 
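
For readers following the thread, a minimal sketch of the approach using only the
standard VFIO uAPI (this is not the patch itself; error handling omitted):

#include <sys/ioctl.h>
#include <string.h>
#include <linux/vfio.h>

/* Sketch only: pair the eventfd with the IRQ once, at setup time. */
static int
vfio_irq_set_eventfd_sketch(int vfio_dev_fd, int irq_index, int efd)
{
	char buf[sizeof(struct vfio_irq_set) + sizeof(int)];
	struct vfio_irq_set *irq_set = (struct vfio_irq_set *)buf;

	memset(buf, 0, sizeof(buf));
	irq_set->argsz = sizeof(buf);
	irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
	irq_set->index = irq_index;
	irq_set->start = 0;
	irq_set->count = 1;
	memcpy(irq_set->data, &efd, sizeof(int));

	return ioctl(vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
}

/* After that, re-enabling a masked legacy interrupt only needs an unmask;
 * no eventfd reconfiguration, so no window without an IRQ handler. */
static int
vfio_irq_unmask_sketch(int vfio_dev_fd, int irq_index)
{
	struct vfio_irq_set irq_set = {
		.argsz = sizeof(irq_set),
		.flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_UNMASK,
		.index = irq_index,
		.start = 0,
		.count = 1,
	};

	return ioctl(vfio_dev_fd, VFIO_DEVICE_SET_IRQS, &irq_set);
}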

> ---
> Sending as a RFC patch since it is not trivial and can eat your babies.
> Could people that know well this parts have a look at this?
> Thanks!
> 
> ---
>  drivers/bus/pci/linux/pci_vfio.c  |  78 ++--
>  lib/librte_eal/linux/eal/eal_interrupts.c | 197 
> +++---
>  2 files changed, 86 insertions(+), 189 deletions(-)
> 
> diff --git a/drivers/bus/pci/linux/pci_vfio.c 
> b/drivers/bus/pci/linux/pci_vfio.c
> index ebf6ccd..e4c32c4 100644
> --- a/drivers/bus/pci/linux/pci_vfio.c
> +++ b/drivers/bus/pci/linux/pci_vfio.c
> @@ -186,8 +186,11 @@
>  static int
>  pci_vfio_setup_interrupts(struct rte_pci_device *dev, int vfio_dev_fd)  {
> - int i, ret, intr_idx;
> + char irq_set_buf[sizeof(struct vfio_irq_set) + sizeof(int)];
> + struct vfio_irq_set *irq_set;
>   enum rte_intr_mode intr_mode;
> + int i, ret, intr_idx;
> + int fd;
> 
>   /* default to invalid index */
>   intr_idx = VFIO_PCI_NUM_IRQS;
> @@ -219,7 +222,6 @@
>   /* start from MSI-X interrupt type */
>   for (i = VFIO_PCI_MSIX_IRQ_INDEX; i >= 0; i--) {
>   struct vfio_irq_info irq = { .argsz = sizeof(irq) };
> - int fd = -1;
> 
>   /* skip interrupt modes we don't want */
>   if (intr_mode != RTE_INTR_MODE_NONE && @@ -235,51
> +237,51 @@
>   return -1;
>   }
> 
> + /* found a usable interrupt mode */
> + if ((irq.flags & VFIO_IRQ_INFO_EVENTFD) != 0)
> + break;
> +
>   /* if this vector cannot be used with eventfd, fail if we 
> explicitly
>* specified interrupt type, otherwise continue */
> - if ((irq.flags & VFIO_IRQ_INFO_EVENTFD) == 0) {
> - if (intr_mode != RTE_INTR_MODE_NONE) {
> - RTE_LOG(ERR, EAL,
> - "  interrupt vector does not
> support eventfd!\n");
> - return -1;
> - } else
> - continue;
> - }
> -
> - /* set up an eventfd for interrupts */
> - fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
> - if (fd < 0) {
> - RTE_LOG(ERR, EAL, "  cannot set up eventfd, "
> - "error %i (%s)\n", errno,
> strerror(errno));
> + if (intr_mode != RTE_INTR_MODE_NONE) {
> + RTE_LOG(ERR, EAL, "  interrupt vector does not support
> eventfd!\n");
>   return -1;
>   }
> + }
> 
> - dev->intr_handle.fd = fd;
> - dev->intr_handle.vfio_dev_fd = vfio_dev_fd;
> + if (i < 0)
> + return -1;
> 
> - switch (i) {
> - case VFIO_PCI_MSIX_IRQ_INDEX:
> - intr_mode = RTE_INTR_MODE_MSIX;
> - dev->intr_handle.type =
> RTE_INTR_HANDLE_VFIO_MSIX;
> - break;
> - case VFIO_PCI_MSI_IRQ_INDEX:
> - intr_mode = RTE_INTR_MODE_MSI;
> - dev->intr_handle.type = RTE_INTR_HANDLE_VFIO_MSI;
> - break;
> - case VFIO_PCI_INTX_IRQ_INDEX:
> - intr_mode = RTE_INTR_MODE_LEGACY;
> - dev->intr_handle.type =
> RTE_INTR_HANDLE_VFIO_LEGACY;
> - break;
> - default:
> - RTE_LOG(ERR, EAL, "

Re: [dpdk-dev] [EXT] Re: [PATCH] bus/pci: fix IOVA as VA mode selection

2019-07-09 Thread Burakov, Anatoly

On 09-Jul-19 2:30 PM, Burakov, Anatoly wrote:

On 09-Jul-19 1:11 PM, Jerin Jacob Kollanukkaran wrote:

-Original Message-
From: Burakov, Anatoly 
Sent: Tuesday, July 9, 2019 5:10 PM
To: Jerin Jacob Kollanukkaran ; David Marchand

Cc: dev ; Thomas Monjalon ; Ben
Walker 
Subject: Re: [EXT] Re: [dpdk-dev] [PATCH] bus/pci: fix IOVA as VA mode
selection



On Mon, Jul 8, 2019 at 4:25 PM  wrote:
From: Jerin Jacob 

Existing logic fails to select IOVA mode as VA if driver request to
enable IOVA as VA.

IOVA as VA has more strict requirement than other modes, so enabling
positive logic for IOVA as VA selection.

This patch also updates the default IOVA mode as PA for PCI devices
as it has to deal with DMA engines unlike the virtual devices that
may need only IOVA as DC.

We have three cases:
- driver/hw supports IOVA as PA only

[Jerin] It is not driver cap, it is more of system cap(IOMMU vs non
IOMMU). We are already addressing that case


I don't get how this works. How does "system capability" affect what
the device itself supports? Are we to assume that *all* hardware
support IOVA as VA by default? "System capability" is more of a bus
issue than an individual device issue, is it not?


What I meant is, supporting VA vs PA is function of IOMMU(not the 
device

attribute).

Ie. Device makes the  bus master request, if IOMMU available and
enabled in the SYSTEM , It goes over IOMMU  and translate the IOVA to

physical address.


Another way to put is, Is there any _PCIe_ device which need/requires
RTE_PCI_DRV_NEED_IOVA_AS_PA in rte_pci_driver.drv_flags




Previously, as far as i can tell, the flag was used to indicate 
support for IOVA

as VA mode, not *requirement* for IOVA as VA mode. For example, there
are multiple patches [1][2][3][4] (i'm sure i can find more!) that 
added IOVA
as VA support to various drivers, and they all were worded it in this 
exact way
- "support for IOVA as VA mode", not "require IOVA as VA mode". As 
far as i
can tell, none of these drivers *require* IOVA as VA mode - they 
merely use

this flag to indicate support for it.


Some class of devices NEED IOVA as VA for performance reasons.
Specially the devices has HW mempool allocators. On those devices If 
we don’t use IOVA as VA,
Upon getting packet from device, It needs to go over 
rte_mem_iova2virt() per

packet see driver/net/dppa2. Which has real performance issue.


I wouldn't classify this as "needing" IOVA. "Need" implies it cannot 
work without it, whereas in this case it's more of a "highly 
recommended" rather than "need".




Now suddenly it turns out that someone somewhere "knew" that "IOVA as
VA" flag in PCI drivers is supposed to indicate *requirement* and not
support, and it appears that this knowledge was not communicated nor
documented anywhere, and is now treated as common knowledge.


I think, the confusion here is,  I was under impression that
# If device supports IOVA as VA and system runs with IOMMU then
the  dpdk should run in IOVA as VA mode.
If above statement true then we don’t really need a new flag.


Exactly. And the flag used to indicate that the device *supports* IOVA 
as VA, not that it *requires* it.


...unless the driver itself is written in such a way as to simply not 
support VA to PA lookups - in that case, the above suggested way of 
simply not indicating IOVA as PA support would fix the issue in that it 
will require the device to either work in IOVA as VA mode, or fail to 
initialize. Current semantics of only having one flag do not distinguish 
between "can do both PA and VA" and "can only do VA" - hence the 
suggestion of adding an additional flag indicating IOVA as PA support.


--
Thanks,
Anatoly


Re: [dpdk-dev] [PATCH] net/softnic: fix pipeline time calculation

2019-07-09 Thread Wang, Xiao W



> -Original Message-
> From: Dumitrescu, Cristian
> Sent: Tuesday, July 9, 2019 6:32 PM
> To: Wang, Xiao W ; Singh, Jasvinder
> 
> Cc: dev@dpdk.org; sta...@dpdk.org
> Subject: RE: [PATCH] net/softnic: fix pipeline time calculation
> 
> 
> 
> > -Original Message-
> > From: Wang, Xiao W
> > Sent: Sunday, June 2, 2019 11:46 AM
> > To: Singh, Jasvinder 
> > Cc: dev@dpdk.org; Dumitrescu, Cristian ;
> > sta...@dpdk.org
> > Subject: RE: [PATCH] net/softnic: fix pipeline time calculation
> >
> >
> >
> > > -Original Message-
> > > From: Singh, Jasvinder
> > > Sent: Friday, May 31, 2019 10:46 PM
> > > To: Wang, Xiao W 
> > > Cc: dev@dpdk.org; Dumitrescu, Cristian ;
> > > sta...@dpdk.org
> > > Subject: RE: [PATCH] net/softnic: fix pipeline time calculation
> > >
> > >
> > >
> > > > -Original Message-
> > > > From: Wang, Xiao W
> > > > Sent: Wednesday, May 15, 2019 2:59 PM
> > > > To: Singh, Jasvinder 
> > > > Cc: dev@dpdk.org; Dumitrescu, Cristian
> > ;
> > > > Wang, Xiao W ; sta...@dpdk.org
> > > > Subject: [PATCH] net/softnic: fix pipeline time calculation
> > > >
> > > > When a new pipeline is added to a thread, the "time_next_min" value
> > may
> > > > need update, otherwise this pipeline won't get served timely.
> > > >
> > > > Fixes: 70709c78fda6 ("net/softnic: add command to enable/disable
> > pipeline")
> > > > Cc: sta...@dpdk.org
> > > >
> > > > Signed-off-by: Xiao Wang 
> > > > ---
> > > >  drivers/net/softnic/rte_eth_softnic_thread.c | 6 ++
> > > >  1 file changed, 6 insertions(+)
> > > >
> > > > diff --git a/drivers/net/softnic/rte_eth_softnic_thread.c
> > > > b/drivers/net/softnic/rte_eth_softnic_thread.c
> > > > index 855408e98..2b482117d 100644
> > > > --- a/drivers/net/softnic/rte_eth_softnic_thread.c
> > > > +++ b/drivers/net/softnic/rte_eth_softnic_thread.c
> > > > @@ -337,6 +337,9 @@ softnic_thread_pipeline_enable(struct
> > > pmd_internals
> > > > *softnic,
> > > > tdp->timer_period = (rte_get_tsc_hz() * p-
> > >timer_period_ms)
> > > /
> > > > 1000;
> > > > tdp->time_next = rte_get_tsc_cycles() + 
> > > > tdp->timer_period;
> > > >
> > > > +   if (tdp->time_next < td->time_next_min)
> > > > +   td->time_next_min = tdp->time_next;
> > > > +
> > > > td->n_pipelines++;
> > > >
> > > > /* Pipeline */
> > > > @@ -522,6 +525,9 @@ thread_msg_handle_pipeline_enable(struct
> > > > softnic_thread_data *t,
> > > > (rte_get_tsc_hz() * 
> > > > req->pipeline_enable.timer_period_ms)
> > /
> > > > 1000;
> > > > p->time_next = rte_get_tsc_cycles() + p->timer_period;
> > > >
> > > > +   if (p->time_next < t->time_next_min)
> > > > +   t->time_next_min = p->time_next;
> > > > +
> > > > t->n_pipelines++;
> > > >
> > > > /* Response */
> > > > --
> > > > 2.15.1
> > >
> > >
> > > Hi Wang,
> > >
> > > Timer values for pipelines and thread level message handlers are already
> > > adjusted in runtime function rte_pmd_softnic_run_internal(). In runtime
> > > function, the values of t->time_next_min is updated as well. IMO, above
> > > changes not needed. Could you help with the case where timer
> > adjustments in
> > > runtime not working?
> >
> > Hi Jasvinder,
> >
> > the values of t->time_next_min is updated only when the pipeline message
> > and thread message get handled, but not when the pipeline is added to
> that
> > thread. E.g. when a thread t->time_next_min is ~100ms later, and a new
> > pipeline is added to that thread with timer_period_ms parameter set to
> > 10ms, then this pipeline's control message will not be served until 100ms
> > later.
> >
> > BRs,
> > Xiao
> >
> > >
> > > Thanks,
> > > Jasvinder
> > >
> 
> NAK
> 
> There are no bugs/issues fixed by this patch, but there are side effects
> introduced that maybe you did not anticipate.
> 
> - Yes, the first message handler for a newly added pipeline might be slightly
> delayed, but this is harmless.

As the above example shows, handling of the new pipeline may be delayed by ~90 ms,
and maybe the user wants this newly added pipeline to be served in real time; a 90 ms
delay may be too long
for that application.

> 
> - For a given thread, we periodically iterate through all pipelines the 
> current
> thread is running and check if there are any pending messages for each
> pipeline (function rte_pmd_softnic_run_internal). We also update the
> deadline for the next message handling session for the thread (thread-
> >time_next_min), which should only be changed by the thread (existing

Yeah, it's about (thread->time_next_min): if the current time passes it, then we
iteratively check each pipeline, and only when a pipeline's time_next has also passed
do we
handle that pipeline.

If the thread is running, the softnic_thread_data should only be changed by the
thread itself, and this patch doesn't break that.

> code); if this is changed by the pipeline message handler, then there is the
> risk that s

[dpdk-dev] [PATCH] fbarray: fix fbarray destruction

2019-07-09 Thread Anatoly Burakov
Currently, when an fbarray is destroyed, the fbarray structure is not
zeroed out, which leaves stale data behind and confuses
secondary process init in legacy mem mode. Fix it by always
memsetting the fbarray to zero when destroying it.

Fixes: 5b61c62cfd76 ("fbarray: add internal tailq for mapped areas")
Cc: sta...@dpdk.org

Signed-off-by: Anatoly Burakov 
---
 lib/librte_eal/common/eal_common_fbarray.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/lib/librte_eal/common/eal_common_fbarray.c 
b/lib/librte_eal/common/eal_common_fbarray.c
index b7ddb66e9..1312f936b 100644
--- a/lib/librte_eal/common/eal_common_fbarray.c
+++ b/lib/librte_eal/common/eal_common_fbarray.c
@@ -1055,6 +1055,9 @@ rte_fbarray_destroy(struct rte_fbarray *arr)
TAILQ_REMOVE(&mem_area_tailq, tmp, next);
free(tmp);
ret = 0;
+
+   /* reset the fbarray structure */
+   memset(arr, 0, sizeof(*arr));
 out:
rte_spinlock_unlock(&mem_area_lock);
return ret;
-- 
2.17.1


Re: [dpdk-dev] [EXT] Re: [PATCH] bus/pci: fix IOVA as VA mode selection

2019-07-09 Thread Jerin Jacob Kollanukkaran
> -Original Message-
> From: Burakov, Anatoly 
> Sent: Tuesday, July 9, 2019 7:00 PM
> To: Jerin Jacob Kollanukkaran ; David Marchand
> 
> Cc: dev ; Thomas Monjalon ; Ben
> Walker 
> Subject: Re: [EXT] Re: [dpdk-dev] [PATCH] bus/pci: fix IOVA as VA mode
> selection
> 
> On 09-Jul-19 1:11 PM, Jerin Jacob Kollanukkaran wrote:
> >> -Original Message-
> >> From: Burakov, Anatoly 
> >> Sent: Tuesday, July 9, 2019 5:10 PM
> >> To: Jerin Jacob Kollanukkaran ; David Marchand
> >> 
> >> Cc: dev ; Thomas Monjalon ;
> Ben
> >> Walker 
> >> Subject: Re: [EXT] Re: [dpdk-dev] [PATCH] bus/pci: fix IOVA as VA
> >> mode selection
> > 
> >
> > On Mon, Jul 8, 2019 at 4:25 PM  wrote:
> > From: Jerin Jacob 
> >
> > Existing logic fails to select IOVA mode as VA if driver request
> > to enable IOVA as VA.
> >
> > IOVA as VA has more strict requirement than other modes, so
> > enabling positive logic for IOVA as VA selection.
> >
> > This patch also updates the default IOVA mode as PA for PCI
> > devices as it has to deal with DMA engines unlike the virtual
> > devices that may need only IOVA as DC.
> >
> > We have three cases:
> > - driver/hw supports IOVA as PA only
> >
> > [Jerin] It is not driver cap, it is more of system cap(IOMMU vs
> > non IOMMU). We are already addressing that case
> 
>  I don't get how this works. How does "system capability" affect
>  what the device itself supports? Are we to assume that *all*
>  hardware support IOVA as VA by default? "System capability" is more
>  of a bus issue than an individual device issue, is it not?
> >>>
> >>> What I meant is, supporting VA vs PA is function of IOMMU(not the
> >>> device
> >> attribute).
> >>> Ie. Device makes the  bus master request, if IOMMU available and
> >>> enabled in the SYSTEM , It goes over IOMMU  and translate the IOVA
> >>> to
> >> physical address.
> >>>
> >>> Another way to put is, Is there any _PCIe_ device which
> >>> need/requires RTE_PCI_DRV_NEED_IOVA_AS_PA in
> >>> rte_pci_driver.drv_flags
> >>>
> >>>
> >>
> >> Previously, as far as i can tell, the flag was used to indicate
> >> support for IOVA as VA mode, not *requirement* for IOVA as VA mode.
> >> For example, there are multiple patches [1][2][3][4] (i'm sure i can
> >> find more!) that added IOVA as VA support to various drivers, and
> >> they all were worded it in this exact way
> >> - "support for IOVA as VA mode", not "require IOVA as VA mode". As
> >> far as i can tell, none of these drivers *require* IOVA as VA mode -
> >> they merely use this flag to indicate support for it.
> >
> > Some class of devices NEED IOVA as VA for performance reasons.
> > Specially the devices has HW mempool allocators. On those devices If
> > we don’t use IOVA as VA, Upon getting packet from device, It needs to
> > go over rte_mem_iova2virt() per packet see driver/net/dppa2. Which has
> real performance issue.
> 
> I wouldn't classify this as "needing" IOVA. "Need" implies it cannot work
> without it, whereas in this case it's more of a "highly recommended" rather
> than "need".

It is "need" as performance is horrible without it as is per packet SW 
translation.
A "need" for DPDK performance perspective.

> 
> >>
> >> Now suddenly it turns out that someone somewhere "knew" that "IOVA
> as
> >> VA" flag in PCI drivers is supposed to indicate *requirement* and not
> >> support, and it appears that this knowledge was not communicated nor
> >> documented anywhere, and is now treated as common knowledge.
> >
> > I think, the confusion here is,  I was under impression that # If
> > device supports IOVA as VA and system runs with IOMMU then the  dpdk
> > should run in IOVA as VA mode.
> > If above statement true then we don’t really need a new flag.
> 
> Exactly. And the flag used to indicate that the device *supports* IOVA as VA,
> not that it *requires* it.
> 
> >
> > Couple of points to make forward progress:
> > # If we think, there is a use case where device is IOVA as VA And
> > system runs in IOMMU mode then for some reason DPDK needs to run in
> PA
> > mode. If so, we need to create two flags RTE_PCI_DRV_IOVA_AS_VA - it
> > can run either modes
> 
> There are use cases - KNI and igb_uio come to mind. Whether IOMMU uses
> VA or PA is a different from whether IOMMU is in use - there is no law that
> states that, when using IOMMU, IOVA have to have 1:1 mapping with VA.
> IOMMU requirement does not necessarily imply IOVA as VA - it is perfectly
> legal to program IOMMU to use IOVA as PA (which we currently do when we
> e.g. use VFIO for some devices and igb_uio for others).

For KNI, we already submitted a patch to support IOVA as VA.
I don't know about igb_uio; if it uses IOVA as PA, we might as well disable the IOMMU.
Does igb_uio enable the IOMMU? I don't see any reference:
grep -ri "iommu" kernel/linux/igb_uio/


Re: [dpdk-dev] [PATCH] net/softnic: fix pipeline time calculation

2019-07-09 Thread Dumitrescu, Cristian



> -Original Message-
> From: Wang, Xiao W
> Sent: Tuesday, July 9, 2019 3:00 PM
> To: Dumitrescu, Cristian ; Singh, Jasvinder
> 
> Cc: dev@dpdk.org; sta...@dpdk.org
> Subject: RE: [PATCH] net/softnic: fix pipeline time calculation
> 
> 
> 
> > -Original Message-
> > From: Dumitrescu, Cristian
> > Sent: Tuesday, July 9, 2019 6:32 PM
> > To: Wang, Xiao W ; Singh, Jasvinder
> > 
> > Cc: dev@dpdk.org; sta...@dpdk.org
> > Subject: RE: [PATCH] net/softnic: fix pipeline time calculation
> >
> >
> >
> > > -Original Message-
> > > From: Wang, Xiao W
> > > Sent: Sunday, June 2, 2019 11:46 AM
> > > To: Singh, Jasvinder 
> > > Cc: dev@dpdk.org; Dumitrescu, Cristian
> ;
> > > sta...@dpdk.org
> > > Subject: RE: [PATCH] net/softnic: fix pipeline time calculation
> > >
> > >
> > >
> > > > -Original Message-
> > > > From: Singh, Jasvinder
> > > > Sent: Friday, May 31, 2019 10:46 PM
> > > > To: Wang, Xiao W 
> > > > Cc: dev@dpdk.org; Dumitrescu, Cristian
> ;
> > > > sta...@dpdk.org
> > > > Subject: RE: [PATCH] net/softnic: fix pipeline time calculation
> > > >
> > > >
> > > >
> > > > > -Original Message-
> > > > > From: Wang, Xiao W
> > > > > Sent: Wednesday, May 15, 2019 2:59 PM
> > > > > To: Singh, Jasvinder 
> > > > > Cc: dev@dpdk.org; Dumitrescu, Cristian
> > > ;
> > > > > Wang, Xiao W ; sta...@dpdk.org
> > > > > Subject: [PATCH] net/softnic: fix pipeline time calculation
> > > > >
> > > > > When a new pipeline is added to a thread, the "time_next_min"
> value
> > > may
> > > > > need update, otherwise this pipeline won't get served timely.
> > > > >
> > > > > Fixes: 70709c78fda6 ("net/softnic: add command to enable/disable
> > > pipeline")
> > > > > Cc: sta...@dpdk.org
> > > > >
> > > > > Signed-off-by: Xiao Wang 
> > > > > ---
> > > > >  drivers/net/softnic/rte_eth_softnic_thread.c | 6 ++
> > > > >  1 file changed, 6 insertions(+)
> > > > >
> > > > > diff --git a/drivers/net/softnic/rte_eth_softnic_thread.c
> > > > > b/drivers/net/softnic/rte_eth_softnic_thread.c
> > > > > index 855408e98..2b482117d 100644
> > > > > --- a/drivers/net/softnic/rte_eth_softnic_thread.c
> > > > > +++ b/drivers/net/softnic/rte_eth_softnic_thread.c
> > > > > @@ -337,6 +337,9 @@ softnic_thread_pipeline_enable(struct
> > > > pmd_internals
> > > > > *softnic,
> > > > >   tdp->timer_period = (rte_get_tsc_hz() * p-
> > > >timer_period_ms)
> > > > /
> > > > > 1000;
> > > > >   tdp->time_next = rte_get_tsc_cycles() + tdp-
> >timer_period;
> > > > >
> > > > > + if (tdp->time_next < td->time_next_min)
> > > > > + td->time_next_min = tdp->time_next;
> > > > > +
> > > > >   td->n_pipelines++;
> > > > >
> > > > >   /* Pipeline */
> > > > > @@ -522,6 +525,9 @@ thread_msg_handle_pipeline_enable(struct
> > > > > softnic_thread_data *t,
> > > > >   (rte_get_tsc_hz() * req-
> >pipeline_enable.timer_period_ms)
> > > /
> > > > > 1000;
> > > > >   p->time_next = rte_get_tsc_cycles() + p->timer_period;
> > > > >
> > > > > + if (p->time_next < t->time_next_min)
> > > > > + t->time_next_min = p->time_next;
> > > > > +
> > > > >   t->n_pipelines++;
> > > > >
> > > > >   /* Response */
> > > > > --
> > > > > 2.15.1
> > > >
> > > >
> > > > Hi Wang,
> > > >
> > > > Timer values for pipelines and thread level message handlers are
> already
> > > > adjusted in runtime function rte_pmd_softnic_run_internal(). In
> runtime
> > > > function, the values of t->time_next_min is updated as well. IMO,
> above
> > > > changes not needed. Could you help with the case where timer
> > > adjustments in
> > > > runtime not working?
> > >
> > > Hi Jasvinder,
> > >
> > > the values of t->time_next_min is updated only when the pipeline
> message
> > > and thread message get handled, but not when the pipeline is added to
> > that
> > > thread. E.g. when a thread t->time_next_min is ~100ms later, and a new
> > > pipeline is added to that thread with timer_period_ms parameter set to
> > > 10ms, then this pipeline's control message will not be served until 100ms
> > > later.
> > >
> > > BRs,
> > > Xiao
> > >
> > > >
> > > > Thanks,
> > > > Jasvinder
> > > >
> >
> > NAK
> >
> > There are no bugs/issues fixed by this patch, but there are side effects
> > introduced that maybe you did not anticipate.
> >
> > - Yes, the first message handler for a newly added pipeline might be 
> > slightly
> > delayed, but this is harmless.
> 
> As the above example shows, the new pipeline handling may get ~90ms
> delayed, and
> maybe the user wants this newly added pipeline to be real-time and 90ms
> delay is too long
> for that application.
> 

The example you provide is probably not realistic, as typically the thread 
period should be one order of magnitude smaller than the pipeline period, while 
your example is the other way around.

If the pipeline period is smaller than the thread period, the pipeline period
is essentially upgraded to the thread period.

Re: [dpdk-dev] [EXT] Re: [PATCH] bus/pci: fix IOVA as VA mode selection

2019-07-09 Thread Jerin Jacob Kollanukkaran


> -Original Message-
> From: Burakov, Anatoly 
> Sent: Tuesday, July 9, 2019 7:21 PM
> To: Jerin Jacob Kollanukkaran ; David Marchand
> 
> Cc: dev ; Thomas Monjalon ; Ben
> Walker 
> Subject: Re: [dpdk-dev] [EXT] Re: [PATCH] bus/pci: fix IOVA as VA mode
> selection
> 
> On 09-Jul-19 2:30 PM, Burakov, Anatoly wrote:
> > On 09-Jul-19 1:11 PM, Jerin Jacob Kollanukkaran wrote:
> >>> -Original Message-
> >>> From: Burakov, Anatoly 
> >>> Sent: Tuesday, July 9, 2019 5:10 PM
> >>> To: Jerin Jacob Kollanukkaran ; David Marchand
> >>> 
> >>> Cc: dev ; Thomas Monjalon ;
> Ben
> >>> Walker 
> >>> Subject: Re: [EXT] Re: [dpdk-dev] [PATCH] bus/pci: fix IOVA as VA
> >>> mode selection
> >> 
> >>
> >> On Mon, Jul 8, 2019 at 4:25 PM  wrote:
> >> From: Jerin Jacob 
> >>
> >> Existing logic fails to select IOVA mode as VA if driver request
> >> to enable IOVA as VA.
> >>
> >> IOVA as VA has more strict requirement than other modes, so
> >> enabling positive logic for IOVA as VA selection.
> >>
> >> This patch also updates the default IOVA mode as PA for PCI
> >> devices as it has to deal with DMA engines unlike the virtual
> >> devices that may need only IOVA as DC.
> >>
> >> We have three cases:
> >> - driver/hw supports IOVA as PA only
> >>
> >> [Jerin] It is not driver cap, it is more of system cap(IOMMU vs
> >> non IOMMU). We are already addressing that case
> >
> > I don't get how this works. How does "system capability" affect
> > what the device itself supports? Are we to assume that *all*
> > hardware support IOVA as VA by default? "System capability" is
> > more of a bus issue than an individual device issue, is it not?
> 
>  What I meant is, supporting VA vs PA is function of IOMMU(not the
>  device
> >>> attribute).
>  Ie. Device makes the  bus master request, if IOMMU available and
>  enabled in the SYSTEM , It goes over IOMMU  and translate the IOVA
>  to
> >>> physical address.
> 
>  Another way to put is, Is there any _PCIe_ device which
>  need/requires RTE_PCI_DRV_NEED_IOVA_AS_PA in
>  rte_pci_driver.drv_flags
> 
> 
> >>>
> >>> Previously, as far as i can tell, the flag was used to indicate
> >>> support for IOVA as VA mode, not *requirement* for IOVA as VA mode.
> >>> For example, there are multiple patches [1][2][3][4] (i'm sure i can
> >>> find more!) that added IOVA as VA support to various drivers, and
> >>> they all were worded it in this exact way
> >>> - "support for IOVA as VA mode", not "require IOVA as VA mode". As
> >>> far as i can tell, none of these drivers *require* IOVA as VA mode -
> >>> they merely use this flag to indicate support for it.
> >>
> >> Some class of devices NEED IOVA as VA for performance reasons.
> >> Specially the devices has HW mempool allocators. On those devices If
> >> we don’t use IOVA as VA, Upon getting packet from device, It needs to
> >> go over
> >> rte_mem_iova2virt() per
> >> packet see driver/net/dppa2. Which has real performance issue.
> >
> > I wouldn't classify this as "needing" IOVA. "Need" implies it cannot
> > work without it, whereas in this case it's more of a "highly
> > recommended" rather than "need".
> >
> >>>
> >>> Now suddenly it turns out that someone somewhere "knew" that "IOVA
> >>> as VA" flag in PCI drivers is supposed to indicate *requirement* and
> >>> not support, and it appears that this knowledge was not communicated
> >>> nor documented anywhere, and is now treated as common knowledge.
> >>
> >> I think, the confusion here is,  I was under impression that # If
> >> device supports IOVA as VA and system runs with IOMMU then the  dpdk
> >> should run in IOVA as VA mode.
> >> If above statement true then we don’t really need a new flag.
> >
> > Exactly. And the flag used to indicate that the device *supports* IOVA
> > as VA, not that it *requires* it.
> 
> ...unless the driver itself is written in such a way as to simply not support 
> VA
> to PA lookups

Yes. 

> - in that case, the above suggested way of simply not indicating
> IOVA as PA support would fix the issue in that it will require the device to
> either work in IOVA as VA mode, or fail to initialize. Current semantics of 
> only
> having one flag do not distinguish between "can do both PA and VA" and
> "can only do VA" - hence the suggestion of adding an additional flag
> indicating IOVA as PA support.

Currently all devices can support "can do both PA and VA", but system limits
(vfio-noiommu, igb_uio or KNI) make it fall back to PA.

So the question becomes what we do with the new flag in pci_device_iova_mode().
In my view:

pci_device_iova_mode() can return RTE_IOVA_PA as the default for a PCI device.
If the PCIe device supports IOVA_AS_VA, pci_device_iova_mode() needs to return
RTE_IOVA_VA when the "SYSTEM" supports it.

In this context, What will
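For illustration only, a minimal sketch of the default-PA / upgrade-to-VA selection
described above; the helper name and parameters are hypothetical, not the actual
rte_bus_pci code:

/* Sketch: default to PA for PCI devices, upgrade to VA only when both
 * the driver and the system (IOMMU / kernel driver) allow it. */
#include <stdbool.h>

enum iova_mode_sketch { SKETCH_IOVA_PA, SKETCH_IOVA_VA };

static enum iova_mode_sketch
pci_iova_mode_sketch(bool drv_supports_va, bool system_supports_va)
{
	enum iova_mode_sketch mode = SKETCH_IOVA_PA;	/* default for PCI devices */

	if (drv_supports_va && system_supports_va)
		mode = SKETCH_IOVA_VA;			/* both sides agree on VA */

	return mode;
}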

Re: [dpdk-dev] [EXT] Re: [PATCH] bus/pci: fix IOVA as VA mode selection

2019-07-09 Thread Burakov, Anatoly

On 09-Jul-19 3:00 PM, Jerin Jacob Kollanukkaran wrote:

-Original Message-
From: Burakov, Anatoly 
Sent: Tuesday, July 9, 2019 7:00 PM
To: Jerin Jacob Kollanukkaran ; David Marchand

Cc: dev ; Thomas Monjalon ; Ben
Walker 
Subject: Re: [EXT] Re: [dpdk-dev] [PATCH] bus/pci: fix IOVA as VA mode
selection

On 09-Jul-19 1:11 PM, Jerin Jacob Kollanukkaran wrote:

-Original Message-
From: Burakov, Anatoly 
Sent: Tuesday, July 9, 2019 5:10 PM
To: Jerin Jacob Kollanukkaran ; David Marchand

Cc: dev ; Thomas Monjalon ;

Ben

Walker 
Subject: Re: [EXT] Re: [dpdk-dev] [PATCH] bus/pci: fix IOVA as VA
mode selection



On Mon, Jul 8, 2019 at 4:25 PM  wrote:
From: Jerin Jacob 

Existing logic fails to select IOVA mode as VA if driver request
to enable IOVA as VA.

IOVA as VA has more strict requirement than other modes, so
enabling positive logic for IOVA as VA selection.

This patch also updates the default IOVA mode as PA for PCI
devices as it has to deal with DMA engines unlike the virtual
devices that may need only IOVA as DC.

We have three cases:
- driver/hw supports IOVA as PA only

[Jerin] It is not driver cap, it is more of system cap(IOMMU vs
non IOMMU). We are already addressing that case


I don't get how this works. How does "system capability" affect
what the device itself supports? Are we to assume that *all*
hardware support IOVA as VA by default? "System capability" is more
of a bus issue than an individual device issue, is it not?


What I meant is, supporting VA vs PA is function of IOMMU(not the
device

attribute).

Ie. Device makes the  bus master request, if IOMMU available and
enabled in the SYSTEM , It goes over IOMMU  and translate the IOVA
to

physical address.


Another way to put is, Is there any _PCIe_ device which
need/requires RTE_PCI_DRV_NEED_IOVA_AS_PA in
rte_pci_driver.drv_flags




Previously, as far as i can tell, the flag was used to indicate
support for IOVA as VA mode, not *requirement* for IOVA as VA mode.
For example, there are multiple patches [1][2][3][4] (i'm sure i can
find more!) that added IOVA as VA support to various drivers, and
they all were worded it in this exact way
- "support for IOVA as VA mode", not "require IOVA as VA mode". As
far as i can tell, none of these drivers *require* IOVA as VA mode -
they merely use this flag to indicate support for it.


Some class of devices NEED IOVA as VA for performance reasons.
Specially the devices has HW mempool allocators. On those devices If
we don’t use IOVA as VA, Upon getting packet from device, It needs to
go over rte_mem_iova2virt() per packet see driver/net/dppa2. Which has

real performance issue.

I wouldn't classify this as "needing" IOVA. "Need" implies it cannot work
without it, whereas in this case it's more of a "highly recommended" rather
than "need".


It is a "need", as performance is horrible without it due to the per-packet SW
translation.
A "need" from a DPDK performance perspective.


Would the driver fail to initialize if it detects running as IOVA as PA?







Now suddenly it turns out that someone somewhere "knew" that "IOVA

as

VA" flag in PCI drivers is supposed to indicate *requirement* and not
support, and it appears that this knowledge was not communicated nor
documented anywhere, and is now treated as common knowledge.


I think, the confusion here is,  I was under impression that # If
device supports IOVA as VA and system runs with IOMMU then the  dpdk
should run in IOVA as VA mode.
If above statement true then we don’t really need a new flag.


Exactly. And the flag used to indicate that the device *supports* IOVA as VA,
not that it *requires* it.



Couple of points to make forward progress:
# If we think, there is a use case where device is IOVA as VA And
system runs in IOMMU mode then for some reason DPDK needs to run in

PA

mode. If so, we need to create two flags RTE_PCI_DRV_IOVA_AS_VA - it
can run either modes


There are use cases - KNI and igb_uio come to mind. Whether IOMMU uses
VA or PA is a different from whether IOMMU is in use - there is no law that
states that, when using IOMMU, IOVA have to have 1:1 mapping with VA.
IOMMU requirement does not necessarily imply IOVA as VA - it is perfectly
legal to program IOMMU to use IOVA as PA (which we currently do when we
e.g. use VFIO for some devices and igb_uio for others).


For KNI, we already submitted a patch to support IOVA as VA.


Yep, point being that it *didn't work* before, hence we may want to 
account for possible future use cases like this (however admittedly 
hacky they are). There are valid use cases to enforce IOVA as VA support 
only (such as for performance reasons, even though it would be 
technically possible to use IOVA as PA), and there could be valid use 
cases to enforce IOVA as PA support only (for example, i seem to 
remember that crypto/qat driver at one point didn't support VFIO driver, 
effectively rendering 

Re: [dpdk-dev] [PATCH] Revert "net/mlx: support IOVA VA mode"

2019-07-09 Thread Stephen Hemminger
On Fri,  7 Jun 2019 16:08:41 -0700
Stephen Hemminger  wrote:

> From: Stephen Hemminger 
> 
> This reverts commit 69c06d0e357ed0064b498d510d169603cf7308cd.
> That commit breaks support for netvsc PMD with MLX SRIOV
> on both Hyper-V and Azure.
> 
> Signed-off-by: Stephen Hemminger 

DPDK 19.08-rc is broken on Azure. This patch, which seems to have been
ignored, fixes it.


Re: [dpdk-dev] [PATCH] doc: update release notes for OCTEON TX2 support

2019-07-09 Thread Thomas Monjalon
09/07/2019 15:35, jer...@marvell.com:
> Update release notes for various OCTEON TX2 drivers supported for 19.08.

Sorry Jerin,
I really forgot to mention OCTEON TX2 in the -rc1 announce.

Two pieces of advice to avoid such a miss:
- update release notes with code patches
- announce features in a roadmap email





Re: [dpdk-dev] [EXT] Re: [PATCH] bus/pci: fix IOVA as VA mode selection

2019-07-09 Thread Burakov, Anatoly

On 09-Jul-19 3:00 PM, Jerin Jacob Kollanukkaran wrote:

-Original Message-
From: Burakov, Anatoly 
Sent: Tuesday, July 9, 2019 7:00 PM
To: Jerin Jacob Kollanukkaran ; David Marchand

Cc: dev ; Thomas Monjalon ; Ben
Walker 
Subject: Re: [EXT] Re: [dpdk-dev] [PATCH] bus/pci: fix IOVA as VA mode
selection








# With top of tree, Currently it never runs in IOVA as VA mode.
That’s a separate problem to fix. Which effect all the devices
Currently supporting RTE_PCI_DRV_IOVA_AS_VA. Ie even though Device
support RTE_PCI_DRV_IOVA_AS_VA, it is not running With IOMMU
protection and/or root privilege is required to run DPDK.




By the way, there seems to be some confusion here. IOVA as PA mode does 
*not* imply running without IOMMU protection. If IOVA as PA mode is 
used, it would require root privileges (to get physical addresses), but 
the IOMMU protection is still enabled. IOMMU doesn't care what you set 
up your addresses as, and the fact that they're 1:1 PA addresses doesn't 
mean IOMMU is not engaged.


--
Thanks,
Anatoly


Re: [dpdk-dev] [EXT] Re: [PATCH] bus/pci: fix IOVA as VA mode selection

2019-07-09 Thread Jerin Jacob Kollanukkaran
> -Original Message-
> From: Burakov, Anatoly 
> Sent: Tuesday, July 9, 2019 8:24 PM
> To: Jerin Jacob Kollanukkaran ; David Marchand
> 
> Cc: dev ; Thomas Monjalon ; Ben
> Walker 
> Subject: Re: [EXT] Re: [dpdk-dev] [PATCH] bus/pci: fix IOVA as VA mode
> selection
> 
> On 09-Jul-19 3:00 PM, Jerin Jacob Kollanukkaran wrote:
> >> -Original Message-
> >> From: Burakov, Anatoly 
> >> Sent: Tuesday, July 9, 2019 7:00 PM
> >> To: Jerin Jacob Kollanukkaran ; David Marchand
> >> 
> >> Cc: dev ; Thomas Monjalon ;
> Ben
> >> Walker 
> >> Subject: Re: [EXT] Re: [dpdk-dev] [PATCH] bus/pci: fix IOVA as VA
> >> mode selection
> >>
> 
> 
> 
> >>
> >>> # With top of tree, Currently it never runs in IOVA as VA mode.
> >>> That’s a separate problem to fix. Which effect all the devices
> >>> Currently supporting RTE_PCI_DRV_IOVA_AS_VA. Ie even though
> Device
> >>> support RTE_PCI_DRV_IOVA_AS_VA, it is not running With IOMMU
> >>> protection and/or root privilege is required to run DPDK.
> >
> 
> By the way, there seems to be some confusion here. IOVA as PA mode does
> *not* imply running without IOMMU protection. If IOVA as PA mode is used,
> it would require root privileges (to get physical addresses), but the IOMMU
> protection is still enabled. IOMMU doesn't care what you set up your

Yes. It was thinking more  of VFIO perspective. Not igb_uio.


> addresses as, and the fact that they're 1:1 PA addresses doesn't mean
> IOMMU is not engaged.



> 
> --
> Thanks,
> Anatoly


[dpdk-dev] [PATCH] doc: deprecation notice for change of ether struct alignment

2019-07-09 Thread Bruce Richardson
The ethernet address structure alignment will be changed to 2B alignment in
19.11. Flag this to users.

Impact is expected to be minimal for this change since ethernet addresses
are generally 2B aligned anyway.

Signed-off-by: Bruce Richardson 
---
 doc/guides/rel_notes/deprecation.rst | 5 +
 1 file changed, 5 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst 
b/doc/guides/rel_notes/deprecation.rst
index a7796f49b..7a04b9b09 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -11,6 +11,11 @@ API and ABI deprecation notices are to be posted here.
 Deprecation Notices
 ---
 
+* net: The alignment of the ethernet address structure in DPDK,
+  ``rte_ether_addr`` will be increased to two, i.e. 16-bit aligned, in the
+  19.11 release. This will only affect any data structures where an ethernet
+  address is currently aligned on a single byte boundary.
+
 * meson: The minimum supported version of meson for configuring and building
   DPDK will be increased to v0.47.1 (from 0.41) from DPDK 19.05 onwards. For
   those users with a version earlier than 0.47.1, an updated copy of meson
-- 
2.21.0



Re: [dpdk-dev] [PATCH] doc: deprecation notice for change of ether struct alignment

2019-07-09 Thread Stephen Hemminger
On Tue,  9 Jul 2019 15:58:36 +0100
Bruce Richardson  wrote:

> The ethernet address structure alignment will be changed to 2B alignment in
> 19.11. Flag this to users.
> 
> Impact is expected to be minimal for this change since ethernet addresses
> are generally 2B aligned anyway.
> 
> Signed-off-by: Bruce Richardson 
> ---
>  doc/guides/rel_notes/deprecation.rst | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst 
> b/doc/guides/rel_notes/deprecation.rst
> index a7796f49b..7a04b9b09 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -11,6 +11,11 @@ API and ABI deprecation notices are to be posted here.
>  Deprecation Notices
>  ---
>  
> +* net: The alignment of the ethernet address structure in DPDK,
> +  ``rte_ether_addr`` will be increased to two, i.e. 16-bit aligned, in the
> +  19.11 release. This will only affect any data structures where an ethernet
> +  address is currently aligned on a single byte boundary.
> +
>  * meson: The minimum supported version of meson for configuring and building
>DPDK will be increased to v0.47.1 (from 0.41) from DPDK 19.05 onwards. For
>those users with a version earlier than 0.47.1, an updated copy of meson

We should also take the packed off of ether_header and ether_addr.
Packed is meaningless on ether_addr anyway.


Re: [dpdk-dev] [PATCH v9 00/11] ether: improvements and optimizations

2019-07-09 Thread Bruce Richardson
On Mon, Jul 08, 2019 at 08:13:06PM +0100, Ferruh Yigit wrote:
> On 7/8/2019 7:26 PM, Stephen Hemminger wrote:
> > This is a collection of patches around the ethernet address
> > manipulation routines in librte_net/rte_ether.
> > 
> > v9
> >add missing librte_net for new octoeonx2
> > v8
> >set rte_errno in rte_eth_unformat_addr
> >drop ether_address alignment patch. Bruce can handle deprecation
> >and sending the patches later
> > v7
> >use rte_ether_unformat_addr in more drivers
> > v6
> >add librte_net to axgbe and memif Makefile
> > v5
> >reword commit messages to workaround check-log complaints
> > v4
> >fix meson build
> >reword commit messages
> >add bonding and tespmd patches
> > v3 
> >rebase to use rte_ether prefix
> >drop aligning ethernet headers for now.
> > 
> > Stephen Hemminger (11):
> >   net/ether: deinline non-critical functions
> >   net/ether: add function to convert string to ethernet address
> >   ethdev: use new ethernet parsing function
> >   net/ether: use bitops to speedup comparison
> >   cmdline: use new ethernet address parser
> >   net/bonding: use new ethernet address parser
> >   app/testpmd: use new ethernet address parser
> >   net/virtio: use new ether addr parsing
> >   net/failsafe: use common ether address parsing routine
> >   net/vdev_netvsc: use common ethernet address parsing
> >   net/memif: use common ethernet address parsing routine
> 
> For series,
> Reviewed-by: Ferruh Yigit 
> 
> Series applied to dpdk-next-net/master, thanks.

Deprecation notice for dropped patch:

http://patches.dpdk.org/patch/56271/


Re: [dpdk-dev] [EXT] Re: [PATCH] bus/pci: fix IOVA as VA mode selection

2019-07-09 Thread Burakov, Anatoly

On 09-Jul-19 3:58 PM, Jerin Jacob Kollanukkaran wrote:

-Original Message-
From: Burakov, Anatoly 
Sent: Tuesday, July 9, 2019 8:24 PM
To: Jerin Jacob Kollanukkaran ; David Marchand

Cc: dev ; Thomas Monjalon ; Ben
Walker 
Subject: Re: [EXT] Re: [dpdk-dev] [PATCH] bus/pci: fix IOVA as VA mode
selection

On 09-Jul-19 3:00 PM, Jerin Jacob Kollanukkaran wrote:

-Original Message-
From: Burakov, Anatoly 
Sent: Tuesday, July 9, 2019 7:00 PM
To: Jerin Jacob Kollanukkaran ; David Marchand

Cc: dev ; Thomas Monjalon ;

Ben

Walker 
Subject: Re: [EXT] Re: [dpdk-dev] [PATCH] bus/pci: fix IOVA as VA
mode selection








# With top of tree, Currently it never runs in IOVA as VA mode.
That’s a separate problem to fix. Which effect all the devices
Currently supporting RTE_PCI_DRV_IOVA_AS_VA. Ie even though

Device

support RTE_PCI_DRV_IOVA_AS_VA, it is not running With IOMMU
protection and/or root privilege is required to run DPDK.




By the way, there seems to be some confusion here. IOVA as PA mode does
*not* imply running without IOMMU protection. If IOVA as PA mode is used,
it would require root privileges (to get physical addresses), but the IOMMU
protection is still enabled. IOMMU doesn't care what you set up your


Yes. It was thinking more  of VFIO perspective. Not igb_uio.



It is the same for both.

When IOMMU is fully enabled (iommu=on at boot time), igb_uio will simply 
not work. VFIO will work, whichever address mode you use.


When IOMMU is in pass-through mode (iommu=pt at boot time), both igb_uio 
and VFIO will work, although igb_uio will only support IOVA as PA mode. 
Both modes will enable IOMMU, and both can run in IOVA as PA mode 
without losing that protection.


It's only when IOMMU is off, igb_uio will not engage IOMMU, and VFIO 
will only work in no-IOMMU mode (thus not engaging IOMMU either), and 
only then you lack the IOMMU protection.


--
Thanks,
Anatoly


Re: [dpdk-dev] [PATCH] doc: deprecation notice for change of ether struct alignment

2019-07-09 Thread Bruce Richardson
On Tue, Jul 09, 2019 at 07:59:53AM -0700, Stephen Hemminger wrote:
> On Tue,  9 Jul 2019 15:58:36 +0100
> Bruce Richardson  wrote:
> 
> > The ethernet address structure alignment will be changed to 2B alignment in
> > 19.11. Flag this to users.
> > 
> > Impact is expected to be minimal for this change since ethernet addresses
> > are generally 2B aligned anyway.
> > 
> > Signed-off-by: Bruce Richardson 
> > ---
> >  doc/guides/rel_notes/deprecation.rst | 5 +
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/doc/guides/rel_notes/deprecation.rst 
> > b/doc/guides/rel_notes/deprecation.rst
> > index a7796f49b..7a04b9b09 100644
> > --- a/doc/guides/rel_notes/deprecation.rst
> > +++ b/doc/guides/rel_notes/deprecation.rst
> > @@ -11,6 +11,11 @@ API and ABI deprecation notices are to be posted here.
> >  Deprecation Notices
> >  ---
> >  
> > +* net: The alignment of the ethernet address structure in DPDK,
> > +  ``rte_ether_addr`` will be increased to two, i.e. 16-bit aligned, in the
> > +  19.11 release. This will only affect any data structures where an 
> > ethernet
> > +  address is currently aligned on a single byte boundary.
> > +
> >  * meson: The minimum supported version of meson for configuring and 
> > building
> >DPDK will be increased to v0.47.1 (from 0.41) from DPDK 19.05 onwards. 
> > For
> >those users with a version earlier than 0.47.1, an updated copy of meson
> 
> We should also take the packed off of ether_header and ether_addr.
> Packed is meaningless on ether_addr anyway.

Agreed. However, I don't think that needs to be called out here as it's a
code change with zero impact. The user-impacting change is the alignment
one.

/Bruce
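
As an aside for readers, a small illustration of what the 2B alignment change means
in practice; the structures below are simplified stand-ins, not the real
rte_ether_addr definition:

/* Simplified stand-ins only -- not the real rte_ether_addr. Only structs
 * that currently place an address at an odd offset change layout. */
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct addr_align1 { uint8_t addr_bytes[6]; };                               /* today */
struct addr_align2 { uint8_t addr_bytes[6]; } __attribute__((aligned(2)));   /* 19.11 */

struct odd1 { uint8_t flag; struct addr_align1 mac; };   /* mac at offset 1 */
struct odd2 { uint8_t flag; struct addr_align2 mac; };   /* mac padded to offset 2 */

int main(void)
{
	printf("alignof: %zu -> %zu\n",
	       alignof(struct addr_align1), alignof(struct addr_align2));
	printf("offsetof(mac): %zu -> %zu, sizeof: %zu -> %zu\n",
	       offsetof(struct odd1, mac), offsetof(struct odd2, mac),
	       sizeof(struct odd1), sizeof(struct odd2));
	return 0;
}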


Re: [dpdk-dev] [EXT] Re: [PATCH] bus/pci: fix IOVA as VA mode selection

2019-07-09 Thread Thomas Monjalon
09/07/2019 16:37, Burakov, Anatoly:
> My view would be to always run in IOVA as VA by default and only falling 
> back to IOVA as PA if there is a need to do that. Yet, it seems that 
> whenever i try to bring this up, the response (not necessarily from you, 
> so this is not directed at you specifically) seems to be that because of 
> hotplug, we have to start in the "safest" (from device support point of 
> view) mode - that is, in IOVA as PA. Seeing how, as you claim, some 
> devices require IOVA as VA, then IOVA as PA is no longer the "safe" 
> default that all devices will support. Perhaps we can use this 
> opportunity to finally make IOVA as VA the default :)

That's a good point Anatoly. We need to decide what is the safest default.

About the capabilities flags, please let's agree that we want
to express 3 cases, so we need 2 flags.
About the preference of a mode for a device, if a mode is really
bad for a device, I suggest to not advertise it in capabilities.

In order to take a better decision, we need a summary of the
decision algorithm per layer, involving kernel driver capabilities
and memory capabilities.
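
For illustration, a sketch of how two capability flags could encode the three cases;
the PA flag does not exist today and is purely hypothetical here:

#include <stdbool.h>

/* Hypothetical flag values for illustration only. */
#define SKETCH_DRV_IOVA_AS_VA 0x1   /* driver works with IOVA as VA */
#define SKETCH_DRV_IOVA_AS_PA 0x2   /* hypothetical: driver works with IOVA as PA */

enum sketch_mode { SKETCH_PA, SKETCH_VA, SKETCH_UNSUPPORTED };

/* Three cases from two flags: PA-only, VA-only, or both (prefer VA when possible). */
static enum sketch_mode
sketch_pick_mode(unsigned int drv_flags, bool system_supports_va)
{
	if ((drv_flags & SKETCH_DRV_IOVA_AS_VA) && system_supports_va)
		return SKETCH_VA;
	if (drv_flags & SKETCH_DRV_IOVA_AS_PA)
		return SKETCH_PA;
	return SKETCH_UNSUPPORTED;   /* e.g. VA-only device on a system without IOMMU */
}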




Re: [dpdk-dev] [EXT] Re: [PATCH] bus/pci: fix IOVA as VA mode selection

2019-07-09 Thread Burakov, Anatoly

On 09-Jul-19 4:04 PM, Thomas Monjalon wrote:

09/07/2019 16:37, Burakov, Anatoly:

My view would be to always run in IOVA as VA by default and only falling
back to IOVA as PA if there is a need to do that. Yet, it seems that
whenever i try to bring this up, the response (not necessarily from you,
so this is not directed at you specifically) seems to be that because of
hotplug, we have to start in the "safest" (from device support point of
view) mode - that is, in IOVA as PA. Seeing how, as you claim, some
devices require IOVA as VA, then IOVA as PA is no longer the "safe"
default that all devices will support. Perhaps we can use this
opportunity to finally make IOVA as VA the default :)


That's a good point Anatoly. We need to decide what is the safest default.

About the capabilities flags, please let's agree that we want
to express 3 cases, so we need 2 flags.


We do agree on this.


About the preference of a mode for a device, if a mode is really
bad for a device, I suggest to not advertise it in capabilities.


Yes, agree with that as well.



In order to take a better decision, we need a summary of the
decision algorithm per layer, involving kernel driver capabilities
and memory capabilities.



This needs to be documented very well, as this seems to be a source of 
great confusion for us all. If all was spelled out in the docs, we 
wouldn't need this long discussion to figure out that we actually agree :D


--
Thanks,
Anatoly


[dpdk-dev] [PATCH v5] power: add fifo per core for JSON interface

2019-07-09 Thread David Hunt
From: Marcin Hajkowski 

This patch implements a separate FIFO for each cpu core to improve the
previous functionality where anyone with access to the FIFO could affect
any core on the system. By using appropriate permissions, fifo interfaces
can be configured to only affect the particular cores.

Because each FIFO is per core, the following fields have been removed
from the command JSON format: core_list, resource_id, name.

Signed-off-by: Lukasz Krakowiak 
Signed-off-by: Lukasz Gosiewski 
Signed-off-by: Marcin Hajkowski 
Tested-by: David Hunt 

---
v2:
* updated handling vm_name (use proper buff size)
* rebase to master changes

v3:
* improvement to coding style

v4:
* rebase to tip of master

v5:
* merged docs into same patch as the code, as per mailing list policy
* made changes out of review by Anatoly.
---
 .../sample_app_ug/vm_power_management.rst | 61 +++-
 examples/vm_power_manager/channel_manager.c   | 90 +-
 examples/vm_power_manager/channel_manager.h   |  7 +-
 examples/vm_power_manager/channel_monitor.c   | 92 +--
 examples/vm_power_manager/main.c  |  2 +-
 5 files changed, 150 insertions(+), 102 deletions(-)

diff --git a/doc/guides/sample_app_ug/vm_power_management.rst 
b/doc/guides/sample_app_ug/vm_power_management.rst
index 5d9a26172..0835e 100644
--- a/doc/guides/sample_app_ug/vm_power_management.rst
+++ b/doc/guides/sample_app_ug/vm_power_management.rst
@@ -380,9 +380,16 @@ parsing functionality will not be present in the app.
 
 Sending a command or policy to the power manager application is achieved by
 simply opening a fifo file, writing a JSON string to that fifo, and closing
-the file.
+the file. In actual implementation every core has own dedicated fifo[0..n],
+where n is number of the last available core.
+Having a dedicated fifo file per core allows using standard filesystem 
permissions
+to ensure a given container can only write JSON commands into fifos it is 
allowed
+to use.
 
-The fifo is at /tmp/powermonitor/fifo
+The fifo is at /tmp/powermonitor/fifo[0..n]
+
+For example all cmds put to the /tmp/powermonitor/fifo7, will have
+effect only on CPU[7].
 
 The JSON string can be a policy or instruction, and takes the following
 format:
@@ -405,19 +412,6 @@ arrays, etc. Examples of policies follow later in this 
document. The allowed
 names and value types are as follows:
 
 
-:Pair Name: "name"
-:Description: Name of the VM or Host. Allows the parser to associate the
-  policy with the relevant VM or Host OS.
-:Type: string
-:Values: any valid string
-:Required: yes
-:Example:
-
-.. code-block:: javascript
-
-  "name", "ubuntu2"
-
-
 :Pair Name: "command"
 :Description: The type of packet we're sending to the power manager. We can be
   creating or destroying a policy, or sending a direct command to adjust
@@ -509,17 +503,6 @@ names and value types are as follows:
 
 "max_packet_thresh": 50
 
-:Pair Name: "core_list"
-:Description: The cores to which to apply the policy.
-:Type: array of integers
-:Values: array with list of virtual CPUs.
-:Required: only policy CREATE/DESTROY
-:Example:
-
-  .. code-block:: javascript
-
-"core_list":[ 10, 11 ]
-
 :Pair Name: "workload"
 :Description: When our policy is of type WORKLOAD, we need to specify how
   heavy our workload is.
@@ -566,17 +549,6 @@ names and value types are as follows:
 
 "unit", "SCALE_MAX"
 
-:Pair Name: "resource_id"
-:Description: The core to which to apply the power command.
-:Type: integer
-:Values: valid core id for VM or host OS.
-:Required: only POWER instruction
-:Example:
-
-  .. code-block:: javascript
-
-"resource_id": 10
-
 JSON API Examples
 ~
 
@@ -585,12 +557,10 @@ Profile create example:
   .. code-block:: javascript
 
 {"policy": {
-  "name": "ubuntu",
   "command": "create",
   "policy_type": "TIME",
   "busy_hours":[ 17, 18, 19, 20, 21, 22, 23 ],
-  "quiet_hours":[ 2, 3, 4, 5, 6 ],
-  "core_list":[ 11 ]
+  "quiet_hours":[ 2, 3, 4, 5, 6 ]
 }}
 
 Profile destroy example:
@@ -598,8 +568,7 @@ Profile destroy example:
   .. code-block:: javascript
 
 {"policy": {
-  "name": "ubuntu",
-  "command": "destroy",
+  "command": "destroy"
 }}
 
 Power command example:
@@ -607,18 +576,16 @@ Power command example:
   .. code-block:: javascript
 
 {"instruction": {
-  "name": "ubuntu",
   "command": "power",
-  "unit": "SCALE_MAX",
-  "resource_id": 10
+  "unit": "SCALE_MAX"
 }}
 
 To send a JSON string to the Power Manager application, simply paste the
-example JSON string into a text file and cat it into the fifo:
+example JSON string into a text file and cat it into the proper fifo:
 
   .. code-block:: console
 
-cat file.json >/tmp/powermonitor/fifo
+cat file.json >/tmp/powermonitor/fifo[0..n]
 
 The console of the Power Manager application should indicate the command that
 was just received via the fifo.
diff --git 

[dpdk-dev] [PATCH v3 0/2] examples/client_server_mp: fix port issues

2019-07-09 Thread Stephen Hemminger
The client_server_mp application does not work correctly
Azure/Hyper-V because it does not handle the concept of some
ports being hidden and unavailable.

Stephen Hemminger (2):
  examples/multi_process/client_server_mp: check port validity
  examples/multi_process - fix crash in mp_client with sparse ports

v3 - merge both patches in one series
 use alternative algorithm to check port ownership (N^2)
 because reviewer didn't like direct check.

 .../client_server_mp/mp_client/client.c   | 18 
 .../client_server_mp/mp_server/args.c | 46 +--
 .../client_server_mp/mp_server/args.h |  2 +-
 .../client_server_mp/mp_server/init.c |  7 +--
 4 files changed, 44 insertions(+), 29 deletions(-)

-- 
2.20.1



[dpdk-dev] [PATCH v3 1/2] examples/multi_process/client_server_mp: check port validity

2019-07-09 Thread Stephen Hemminger
From: Stephen Hemminger 

The mp_server incorrectly allows a port mask that includes hidden
ports, which later causes either lost packets or failed initialization.

This fix explicitly checks that each bit in the portmask is a
valid port before using it.

The max_ports parameter is no longer necessary, so remove it
from the caller.

Fixes: 5b7ba31148a8 ("ethdev: add port ownership")
Signed-off-by: Stephen Hemminger 
---
 .../client_server_mp/mp_server/args.c | 46 +--
 .../client_server_mp/mp_server/args.h |  2 +-
 .../client_server_mp/mp_server/init.c |  7 +--
 3 files changed, 35 insertions(+), 20 deletions(-)

diff --git a/examples/multi_process/client_server_mp/mp_server/args.c 
b/examples/multi_process/client_server_mp/mp_server/args.c
index b0d8d7665c85..1d52489347df 100644
--- a/examples/multi_process/client_server_mp/mp_server/args.c
+++ b/examples/multi_process/client_server_mp/mp_server/args.c
@@ -10,6 +10,7 @@
 #include 
 
 #include 
+#include 
 #include 
 
 #include "common.h"
@@ -34,6 +35,22 @@ usage(void)
, progname);
 }
 
+/**
+ * Check if port is present in the system
+ * It maybe owned by a device and should not be used.
+ */
+static int
+port_is_present(uint16_t portid)
+{
+   uint16_t id;
+
+   RTE_ETH_FOREACH_DEV(id) {
+   if (id == portid)
+   return 1;
+   }
+   return 0;
+}
+
 /**
  * The ports to be used by the application are passed in
  * the form of a bitmask. This function parses the bitmask
@@ -41,31 +58,32 @@ usage(void)
  * array variable
  */
 static int
-parse_portmask(uint8_t max_ports, const char *portmask)
+parse_portmask(const char *portmask)
 {
char *end = NULL;
unsigned long pm;
-   uint16_t count = 0;
+   uint16_t count;
 
if (portmask == NULL || *portmask == '\0')
return -1;
 
/* convert parameter to a number and verify */
pm = strtoul(portmask, &end, 16);
-   if (end == NULL || *end != '\0' || pm == 0)
+   if (end == NULL || *end != '\0' || pm > UINT16_MAX || pm == 0)
return -1;
 
/* loop through bits of the mask and mark ports */
-   while (pm != 0){
-   if (pm & 0x01){ /* bit is set in mask, use port */
-   if (count >= max_ports)
-   printf("WARNING: requested port %u not present"
-   " - ignoring\n", (unsigned)count);
-   else
-   ports->id[ports->num_ports++] = count;
+   for (count = 0; pm != 0; pm >>= 1, ++count) {
+   if ((pm & 0x1) == 0)
+   continue;
+
+   if (!port_is_present(count)) {
+   printf("WARNING: requested port %u not present - 
ignoring\n",
+   count);
+   continue;
}
-   pm = (pm >> 1);
-   count++;
+
+   ports->id[ports->num_ports++] = count;
}
 
return 0;
@@ -99,7 +117,7 @@ parse_num_clients(const char *clients)
  * on error.
  */
 int
-parse_app_args(uint16_t max_ports, int argc, char *argv[])
+parse_app_args(int argc, char *argv[])
 {
int option_index, opt;
char **argvopt = argv;
@@ -112,7 +130,7 @@ parse_app_args(uint16_t max_ports, int argc, char *argv[])
&option_index)) != EOF){
switch (opt){
case 'p':
-   if (parse_portmask(max_ports, optarg) != 0){
+   if (parse_portmask(optarg) != 0){
usage();
return -1;
}
diff --git a/examples/multi_process/client_server_mp/mp_server/args.h 
b/examples/multi_process/client_server_mp/mp_server/args.h
index 79c190a33a37..52c8cc86e6f0 100644
--- a/examples/multi_process/client_server_mp/mp_server/args.h
+++ b/examples/multi_process/client_server_mp/mp_server/args.h
@@ -5,6 +5,6 @@
 #ifndef _ARGS_H_
 #define _ARGS_H_
 
-int parse_app_args(uint16_t max_ports, int argc, char *argv[]);
+int parse_app_args(int argc, char *argv[]);
 
 #endif /* ifndef _ARGS_H_ */
diff --git a/examples/multi_process/client_server_mp/mp_server/init.c 
b/examples/multi_process/client_server_mp/mp_server/init.c
index 3af5dc6994bf..1b0569937b51 100644
--- a/examples/multi_process/client_server_mp/mp_server/init.c
+++ b/examples/multi_process/client_server_mp/mp_server/init.c
@@ -238,7 +238,7 @@ init(int argc, char *argv[])
 {
int retval;
const struct rte_memzone *mz;
-   uint16_t i, total_ports;
+   uint16_t i;
 
/* init EAL, parsing EAL args */
retval = rte_eal_init(argc, argv);
@@ -247,9 +247,6 @@ init(int argc, char *argv[])
argc -= retval;
argv += retval;
 
-   /* get total number of ports */
-   total_ports = rte_eth_

[dpdk-dev] [PATCH v3 2/2] examples/multi_process - fix crash in mp_client with sparse ports

2019-07-09 Thread Stephen Hemminger
From: Stephen Hemminger 

The mp_client crashes if run on Azure or any system where ethdev
ports are owned. In that case, the tx_buffer and tx_stats for the
real port were initialized correctly, but the wrong port was used.

For example if the server has Ports 3 and 5. Then calling
rte_eth_tx_buffer_flush on any other buffer will dereference null
because the tx buffer for that port was not allocated.

Also:
   - the flush code is common enough that it should not be marked
 unlikely
   - combine conditions to reduce indentation
   - avoid unnecessary if() if sent is zero.

Fixes: e2366e74e029 ("examples: use buffered Tx")
Signed-off-by: Stephen Hemminger 
---
 .../client_server_mp/mp_client/client.c| 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/examples/multi_process/client_server_mp/mp_client/client.c 
b/examples/multi_process/client_server_mp/mp_client/client.c
index c23dd3f378f7..361d90b54b2d 100644
--- a/examples/multi_process/client_server_mp/mp_client/client.c
+++ b/examples/multi_process/client_server_mp/mp_client/client.c
@@ -246,19 +246,19 @@ main(int argc, char *argv[])
 
for (;;) {
uint16_t i, rx_pkts;
-   uint16_t port;
 
rx_pkts = rte_ring_dequeue_burst(rx_ring, pkts,
PKT_READ_SIZE, NULL);
 
-   if (unlikely(rx_pkts == 0)){
-   if (need_flush)
-   for (port = 0; port < ports->num_ports; port++) 
{
-   sent = 
rte_eth_tx_buffer_flush(ports->id[port], client_id,
-   tx_buffer[port]);
-   if (unlikely(sent))
-   tx_stats->tx[port] += sent;
-   }
+   if (rx_pkts == 0 && need_flush) {
+   for (i = 0; i < ports->num_ports; i++) {
+   uint16_t port = ports->id[i];
+
+   sent = rte_eth_tx_buffer_flush(port,
+  client_id,
+  tx_buffer[port]);
+   tx_stats->tx[port] += sent;
+   }
need_flush = 0;
continue;
}
-- 
2.20.1



Re: [dpdk-dev] [EXT] Re: [PATCH] bus/pci: fix IOVA as VA mode selection

2019-07-09 Thread Thomas Monjalon
09/07/2019 17:02, Burakov, Anatoly:
> When IOMMU is fully enabled (iommu=on at boot time), igb_uio will simply 
> not work. VFIO will work, whichever address mode you use.
> 
> When IOMMU is in pass-through mode (iommu=pt at boot time), both igb_uio 
> and VFIO will work, although igb_uio will only support IOVA as PA mode. 
> Both modes will enable IOMMU, and both can run in IOVA as PA mode 
> without losing that protection.
> 
> It's only when IOMMU is off, igb_uio will not engage IOMMU, and VFIO 
> will only work in no-IOMMU mode (thus not engaging IOMMU either), and 
> only then you lack the IOMMU protection.

Could we try to make IOMMU status clear in DPDK logs?
Then we could check the kernel drivers loaded and give
a compatibility status for each of them as debug logs.




Re: [dpdk-dev] [PATCH v5] power: add fifo per core for JSON interface

2019-07-09 Thread Burakov, Anatoly

On 09-Jul-19 4:07 PM, David Hunt wrote:

From: Marcin Hajkowski 

This patch implements a separate FIFO for each cpu core to improve the
previous functionality where anyone with access to the FIFO could affect
any core on the system. By using appropriate permissions, fifo interfaces
can be configured to only affect the particular cores.

Because each FIFO is per core, the following fields have been removed
from the command JSON format: core_list, resource_id, name.

Signed-off-by: Lukasz Krakowiak 
Signed-off-by: Lukasz Gosiewski 
Signed-off-by: Marcin Hajkowski 
Tested-by: David Hunt 

---
v2:
* updated handling vm_name (use proper buff size)
* rebase to master changes

v3:
* improvement to coding style

v4:
* rebase to tip of master

v5:
* merged docs into same patch as the code, as per mailing list policy
* made changes out of review by Anatoly.
---





-   if (setup_host_channel_info(&chan_info, 0) < 0) {
-   rte_free(chan_info);
-   return 0;
+   ret = fifo_path(socket_path, sizeof(socket_path), i);
+   if (ret < 0)
+   goto error;
+
+   ret = mkfifo(socket_path, 0660);
+   RTE_LOG(DEBUG, CHANNEL_MANAGER, "TRY CREATE fifo '%s'\n",
+   socket_path);
+   if ((errno != EEXIST) && (ret < 0)) {
+   RTE_LOG(ERR, CHANNEL_MANAGER, "Cannot create fifo '%s' 
error: "
+   "%s\n", socket_path, strerror(errno));
+   goto error;
+   }
+   chan_info = rte_malloc(NULL, sizeof(*chan_info), 0);
+   if (chan_info == NULL) {
+   RTE_LOG(ERR, CHANNEL_MANAGER, "Error allocating memory for 
"
+   "channel '%s'\n", socket_path);
+   goto error;
+   }
+   chan_infos[i] = chan_info;
+   rte_strlcpy(chan_info->channel_path, socket_path,
+   sizeof(chan_info->channel_path));
+
+   if (setup_host_channel_info(&chan_info, i) < 0) {
+   rte_free(chan_info);
+   chan_infos[i] = NULL;
+   goto error;
+   }
+   num_channels_enabled++;
}
-   num_channels_enabled++;
  
  	return num_channels_enabled;

+error:
+   /* Clean up the channels opened before we hit an error. */
+   for (i = 0; i < RTE_MAX_LCORE; i++) {


You're going up to RTE_MAX_LCORE here, but up to ci->core_count in the 
original loop. Is that intentional?


Other than that,

Acked-by: Anatoly Burakov 

--
Thanks,
Anatoly


Re: [dpdk-dev] [EXT] Re: [PATCH] bus/pci: fix IOVA as VA mode selection

2019-07-09 Thread Burakov, Anatoly

On 09-Jul-19 4:12 PM, Thomas Monjalon wrote:

09/07/2019 17:02, Burakov, Anatoly:

When IOMMU is fully enabled (iommu=on at boot time), igb_uio will simply
not work. VFIO will work, whichever address mode you use.

When IOMMU is in pass-through mode (iommu=pt at boot time), both igb_uio
and VFIO will work, although igb_uio will only support IOVA as PA mode.
Both modes will enable IOMMU, and both can run in IOVA as PA mode
without losing that protection.

It's only when IOMMU is off, igb_uio will not engage IOMMU, and VFIO
will only work in no-IOMMU mode (thus not engaging IOMMU either), and
only then you lack the IOMMU protection.


Could we try to make IOMMU status clear in DPDK logs?
Then we could check the kernel drivers loaded and give
a compatibility status for each of them as debug logs.



I don't think there is a way to know IOMMU status from DPDK. It's a 
property of the system. We can kinda-sorta check IOMMU status if we have 
VFIO driver (there's an API to check for vfio_noiommu, I think), and we do 
log it into debug output, but there is no such facility for igb_uio - we 
cannot know if it does or does not engage the IOMMU. (not unless we grep 
dmesg or something)
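
As a rough sketch of what an application (or EAL) could log today -- assuming the
rte_vfio_noiommu_is_enabled() and rte_eal_iova_mode() APIs, and with no equivalent
check available for igb_uio:

#include <rte_eal.h>
#include <rte_log.h>
#include <rte_vfio.h>

/* Sketch only: report the little IOMMU-related state that is queryable. */
static void
log_iommu_hints(void)
{
	int noiommu = rte_vfio_noiommu_is_enabled();

	if (noiommu == 1)
		RTE_LOG(WARNING, EAL, "VFIO no-IOMMU mode enabled: no IOMMU protection\n");
	else if (noiommu == 0)
		RTE_LOG(INFO, EAL, "VFIO no-IOMMU mode disabled\n");
	else
		RTE_LOG(INFO, EAL, "VFIO state unknown (module not loaded?)\n");

	RTE_LOG(INFO, EAL, "EAL IOVA mode: %s\n",
		rte_eal_iova_mode() == RTE_IOVA_VA ? "VA" : "PA");
}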


--
Thanks,
Anatoly


[dpdk-dev] [PATCH v6] power: add fifo per core for JSON interface

2019-07-09 Thread David Hunt
From: Marcin Hajkowski 

This patch implements a separate FIFO for each cpu core to improve the
previous functionality where anyone with access to the FIFO could affect
any core on the system. By using appropriate permissions, fifo interfaces
can be configured to only affect the particular cores.

Because each FIFO is per core, the following fields have been removed
from the command JSON format: core_list, resource_id, name.

Signed-off-by: Lukasz Krakowiak 
Signed-off-by: Lukasz Gosiewski 
Signed-off-by: Marcin Hajkowski 
Tested-by: David Hunt 
Acked-by: Anatoly Burakov 

---
v2:
* updated handling vm_name (use proper buff size)
* rebase to master changes

v3:
* improvement to coding style

v4:
* rebase to tip of master

v5:
* merged docs into same patch as the code, as per mailing list policy
* made changes out of review by Anatoly.

v6:
* Slight optimisation in the range of a for loop.
---
 .../sample_app_ug/vm_power_management.rst | 61 +++-
 examples/vm_power_manager/channel_manager.c   | 90 +-
 examples/vm_power_manager/channel_manager.h   |  7 +-
 examples/vm_power_manager/channel_monitor.c   | 92 +--
 examples/vm_power_manager/main.c  |  2 +-
 5 files changed, 150 insertions(+), 102 deletions(-)

diff --git a/doc/guides/sample_app_ug/vm_power_management.rst 
b/doc/guides/sample_app_ug/vm_power_management.rst
index 5d9a26172..0835e 100644
--- a/doc/guides/sample_app_ug/vm_power_management.rst
+++ b/doc/guides/sample_app_ug/vm_power_management.rst
@@ -380,9 +380,16 @@ parsing functionality will not be present in the app.
 
 Sending a command or policy to the power manager application is achieved by
 simply opening a fifo file, writing a JSON string to that fifo, and closing
-the file.
+the file. In actual implementation every core has own dedicated fifo[0..n],
+where n is number of the last available core.
+Having a dedicated fifo file per core allows using standard filesystem 
permissions
+to ensure a given container can only write JSON commands into fifos it is 
allowed
+to use.
 
-The fifo is at /tmp/powermonitor/fifo
+The fifo is at /tmp/powermonitor/fifo[0..n]
+
+For example all cmds put to the /tmp/powermonitor/fifo7, will have
+effect only on CPU[7].
 
 The JSON string can be a policy or instruction, and takes the following
 format:
@@ -405,19 +412,6 @@ arrays, etc. Examples of policies follow later in this 
document. The allowed
 names and value types are as follows:
 
 
-:Pair Name: "name"
-:Description: Name of the VM or Host. Allows the parser to associate the
-  policy with the relevant VM or Host OS.
-:Type: string
-:Values: any valid string
-:Required: yes
-:Example:
-
-.. code-block:: javascript
-
-  "name", "ubuntu2"
-
-
 :Pair Name: "command"
 :Description: The type of packet we're sending to the power manager. We can be
   creating or destroying a policy, or sending a direct command to adjust
@@ -509,17 +503,6 @@ names and value types are as follows:
 
 "max_packet_thresh": 50
 
-:Pair Name: "core_list"
-:Description: The cores to which to apply the policy.
-:Type: array of integers
-:Values: array with list of virtual CPUs.
-:Required: only policy CREATE/DESTROY
-:Example:
-
-  .. code-block:: javascript
-
-"core_list":[ 10, 11 ]
-
 :Pair Name: "workload"
 :Description: When our policy is of type WORKLOAD, we need to specify how
   heavy our workload is.
@@ -566,17 +549,6 @@ names and value types are as follows:
 
 "unit", "SCALE_MAX"
 
-:Pair Name: "resource_id"
-:Description: The core to which to apply the power command.
-:Type: integer
-:Values: valid core id for VM or host OS.
-:Required: only POWER instruction
-:Example:
-
-  .. code-block:: javascript
-
-"resource_id": 10
-
 JSON API Examples
 ~
 
@@ -585,12 +557,10 @@ Profile create example:
   .. code-block:: javascript
 
 {"policy": {
-  "name": "ubuntu",
   "command": "create",
   "policy_type": "TIME",
   "busy_hours":[ 17, 18, 19, 20, 21, 22, 23 ],
-  "quiet_hours":[ 2, 3, 4, 5, 6 ],
-  "core_list":[ 11 ]
+  "quiet_hours":[ 2, 3, 4, 5, 6 ]
 }}
 
 Profile destroy example:
@@ -598,8 +568,7 @@ Profile destroy example:
   .. code-block:: javascript
 
 {"policy": {
-  "name": "ubuntu",
-  "command": "destroy",
+  "command": "destroy"
 }}
 
 Power command example:
@@ -607,18 +576,16 @@ Power command example:
   .. code-block:: javascript
 
 {"instruction": {
-  "name": "ubuntu",
   "command": "power",
-  "unit": "SCALE_MAX",
-  "resource_id": 10
+  "unit": "SCALE_MAX"
 }}
 
 To send a JSON string to the Power Manager application, simply paste the
-example JSON string into a text file and cat it into the fifo:
+example JSON string into a text file and cat it into the proper fifo:
 
   .. code-block:: console
 
-cat file.json >/tmp/powermonitor/fifo
+cat file.json >/tmp/powermonitor/fifo[0..n]
 
 The console of the Power Manager applica

Re: [dpdk-dev] [PATCH v5] power: add fifo per core for JSON interface

2019-07-09 Thread Hunt, David

Hi Anatoly,

On 09/07/2019 16:12, Burakov, Anatoly wrote:

On 09-Jul-19 4:07 PM, David Hunt wrote:

From: Marcin Hajkowski 

This patch implements a separate FIFO for each cpu core to improve the
previous functionality where anyone with access to the FIFO could affect
any core on the system. By using appropriate permissions, fifo 
interfaces

can be configured to only affect the particular cores.

Because each FIFO is per core, the following fields have been removed
from the command JSON format: core_list, resource_id, name.

Signed-off-by: Lukasz Krakowiak 
Signed-off-by: Lukasz Gosiewski 
Signed-off-by: Marcin Hajkowski 
Tested-by: David Hunt 

---
v2:
* updated handling vm_name (use proper buff size)
* rebase to master changes

v3:
* improvement to coding style

v4:
* rebase to tip of master

v5:
* merged docs into same patch as the code, as per mailing list policy
* made changes out of review by Anatoly.
---





-    if (setup_host_channel_info(&chan_info, 0) < 0) {
-    rte_free(chan_info);
-    return 0;
+    ret = fifo_path(socket_path, sizeof(socket_path), i);
+    if (ret < 0)
+    goto error;
+
+    ret = mkfifo(socket_path, 0660);
+    RTE_LOG(DEBUG, CHANNEL_MANAGER, "TRY CREATE fifo '%s'\n",
+    socket_path);
+    if ((errno != EEXIST) && (ret < 0)) {
+    RTE_LOG(ERR, CHANNEL_MANAGER, "Cannot create fifo '%s' 
error: "

+    "%s\n", socket_path, strerror(errno));
+    goto error;
+    }
+    chan_info = rte_malloc(NULL, sizeof(*chan_info), 0);
+    if (chan_info == NULL) {
+    RTE_LOG(ERR, CHANNEL_MANAGER, "Error allocating memory 
for "

+    "channel '%s'\n", socket_path);
+    goto error;
+    }
+    chan_infos[i] = chan_info;
+    rte_strlcpy(chan_info->channel_path, socket_path,
+    sizeof(chan_info->channel_path));
+
+    if (setup_host_channel_info(&chan_info, i) < 0) {
+    rte_free(chan_info);
+    chan_infos[i] = NULL;
+    goto error;
+    }
+    num_channels_enabled++;
  }
-    num_channels_enabled++;
    return num_channels_enabled;
+error:
+    /* Clean up the channels opened before we hit an error. */
+    for (i = 0; i < RTE_MAX_LCORE; i++) {


You're going up to RTE_MAX_LCORE here, but up to ci->core_count in the 
original loop. Is that intentional?


Other than that,

Acked-by: Anatoly Burakov 



Just to be totally clean, I've fixed the loop limit, and respun as v6. :)

Thanks for the review.

Rgds,

Dave.





Re: [dpdk-dev] [PATCH v6] power: add fifo per core for JSON interface

2019-07-09 Thread Hunt, David

Hi Thomas,

   Fyi, I am unable to mark v4 as superseded in patchwork
(http://patches.dpdk.org/project/dpdk/list/?series=4997), although I've
done that with v5; v6 is the latest version.


Rgds,
Dave.


On 09/07/2019 16:21, David Hunt wrote:

From: Marcin Hajkowski 

This patch implements a separate FIFO for each cpu core to improve the
previous functionality where anyone with access to the FIFO could affect
any core on the system. By using appropriate permissions, fifo interfaces
can be configured to only affect the particular cores.

Because each FIFO is per core, the following fields have been removed
from the command JSON format: core_list, resource_id, name.

Signed-off-by: Lukasz Krakowiak 
Signed-off-by: Lukasz Gosiewski 
Signed-off-by: Marcin Hajkowski 
Tested-by: David Hunt 
Acked-by: Anatoly Burakov 

---
v2:
* updated handling vm_name (use proper buff size)
* rebase to master changes

v3:
* improvement to coding style

v4:
* rebase to tip of master

v5:
* merged docs into same patch as the code, as per mailing list policy
* made changes out of review by Anatoly.

v6:
* Slight optimisation in the range of a for loop.
---
  .../sample_app_ug/vm_power_management.rst | 61 +++-
  examples/vm_power_manager/channel_manager.c   | 90 +-
  examples/vm_power_manager/channel_manager.h   |  7 +-
  examples/vm_power_manager/channel_monitor.c   | 92 +--
  examples/vm_power_manager/main.c  |  2 +-
  5 files changed, 150 insertions(+), 102 deletions(-)

diff --git a/doc/guides/sample_app_ug/vm_power_management.rst 
b/doc/guides/sample_app_ug/vm_power_management.rst
index 5d9a26172..0835e 100644
--- a/doc/guides/sample_app_ug/vm_power_management.rst
+++ b/doc/guides/sample_app_ug/vm_power_management.rst
@@ -380,9 +380,16 @@ parsing functionality will not be present in the app.
  
  Sending a command or policy to the power manager application is achieved by

  simply opening a fifo file, writing a JSON string to that fifo, and closing
-the file.
+the file. In actual implementation every core has own dedicated fifo[0..n],
+where n is number of the last available core.
+Having a dedicated fifo file per core allows using standard filesystem 
permissions
+to ensure a given container can only write JSON commands into fifos it is 
allowed
+to use.
  
-The fifo is at /tmp/powermonitor/fifo

+The fifo is at /tmp/powermonitor/fifo[0..n]
+
+For example all cmds put to the /tmp/powermonitor/fifo7, will have
+effect only on CPU[7].
  
  The JSON string can be a policy or instruction, and takes the following

  format:
@@ -405,19 +412,6 @@ arrays, etc. Examples of policies follow later in this 
document. The allowed
  names and value types are as follows:
  
  
-:Pair Name: "name"

-:Description: Name of the VM or Host. Allows the parser to associate the
-  policy with the relevant VM or Host OS.
-:Type: string
-:Values: any valid string
-:Required: yes
-:Example:
-
-.. code-block:: javascript
-
-  "name", "ubuntu2"
-
-
  :Pair Name: "command"
  :Description: The type of packet we're sending to the power manager. We can be
creating or destroying a policy, or sending a direct command to adjust
@@ -509,17 +503,6 @@ names and value types are as follows:
  
  "max_packet_thresh": 50
  
-:Pair Name: "core_list"

-:Description: The cores to which to apply the policy.
-:Type: array of integers
-:Values: array with list of virtual CPUs.
-:Required: only policy CREATE/DESTROY
-:Example:
-
-  .. code-block:: javascript
-
-"core_list":[ 10, 11 ]
-
  :Pair Name: "workload"
  :Description: When our policy is of type WORKLOAD, we need to specify how
heavy our workload is.
@@ -566,17 +549,6 @@ names and value types are as follows:
  
  "unit", "SCALE_MAX"
  
-:Pair Name: "resource_id"

-:Description: The core to which to apply the power command.
-:Type: integer
-:Values: valid core id for VM or host OS.
-:Required: only POWER instruction
-:Example:
-
-  .. code-block:: javascript
-
-"resource_id": 10
-
  JSON API Examples
  ~
  
@@ -585,12 +557,10 @@ Profile create example:

.. code-block:: javascript
  
  {"policy": {

-  "name": "ubuntu",
"command": "create",
"policy_type": "TIME",
"busy_hours":[ 17, 18, 19, 20, 21, 22, 23 ],
-  "quiet_hours":[ 2, 3, 4, 5, 6 ],
-  "core_list":[ 11 ]
+  "quiet_hours":[ 2, 3, 4, 5, 6 ]
  }}
  
  Profile destroy example:

@@ -598,8 +568,7 @@ Profile destroy example:
.. code-block:: javascript
  
  {"policy": {

-  "name": "ubuntu",
-  "command": "destroy",
+  "command": "destroy"
  }}
  
  Power command example:

@@ -607,18 +576,16 @@ Power command example:
.. code-block:: javascript
  
  {"instruction": {

-  "name": "ubuntu",
"command": "power",
-  "unit": "SCALE_MAX",
-  "resource_id": 10
+  "unit": "SCALE_MAX"
  }}
  
  To send a JSON string to the Power Manager application, simply pas

Re: [dpdk-dev] [PATCH v6] power: add fifo per core for JSON interface

2019-07-09 Thread Thomas Monjalon
09/07/2019 17:27, Hunt, David:
> Hi Thomas,
> 
> Fyi, I am unable mark v4 as superseded in patchwork 
> (http://patches.dpdk.org/project/dpdk/list/?series=4997), although I've 
> done that with v5, v6 is the latest version.

OK, done, thanks





[dpdk-dev] [v4] net/e1000: i219 unit hang issue fix on reset/close

2019-07-09 Thread Xiao Zhang
Unit hang may occur if multiple descriptors are available in the rings
during reset or close. This state can be detected by checking bit 8 of
the descriptor ring status register in PCI config space. If the bit is
set and there are pending descriptors in one of the rings, we must flush
them before reset or close.

Signed-off-by: Xiao Zhang 
---
 drivers/net/e1000/e1000_ethdev.h |   4 ++
 drivers/net/e1000/igb_ethdev.c   |   4 ++
 drivers/net/e1000/igb_rxtx.c | 105 +++
 3 files changed, 113 insertions(+)
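
For orientation, the new igb_flush_desc_rings() entry point boils down to
reading the descriptor ring status word from PCI config space and, when bit 8
is set, flushing the TX ring and then the RX ring. A hedged sketch using the
defines added in this patch (not the literal patch code):

#include <rte_bus_pci.h>
#include <rte_ethdev_pci.h>

void
igb_flush_desc_rings(struct rte_eth_dev *dev)
{
	uint16_t pci_cfg_status = 0;
	struct rte_pci_device *pci_dev = RTE_ETH_DEV_TO_PCI(dev);

	/* bit 8 (FLUSH_DESC_REQUIRED) of the ring status word at offset 0xE4 */
	if (rte_pci_read_config(pci_dev, &pci_cfg_status,
				sizeof(pci_cfg_status),
				PCICFG_DESC_RING_STATUS) < 0)
		return;

	if (pci_cfg_status & FLUSH_DESC_REQUIRED) {
		e1000_flush_tx_ring(dev);
		e1000_flush_rx_ring(dev);
	}
}

The per-ring helpers it calls, e1000_flush_tx_ring() and e1000_flush_rx_ring(),
make up the bulk of the diff below.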

diff --git a/drivers/net/e1000/e1000_ethdev.h b/drivers/net/e1000/e1000_ethdev.h
index 67acb73..349144a 100644
--- a/drivers/net/e1000/e1000_ethdev.h
+++ b/drivers/net/e1000/e1000_ethdev.h
@@ -35,6 +35,9 @@
 #define IGB_MAX_RX_QUEUE_NUM   8
 #define IGB_MAX_RX_QUEUE_NUM_82576 16
 
+#define E1000_I219_MAX_RX_QUEUE_NUM2
+#define E1000_I219_MAX_TX_QUEUE_NUM2
+
 #define E1000_SYN_FILTER_ENABLE0x0001 /* syn filter enable field */
 #define E1000_SYN_FILTER_QUEUE 0x000E /* syn filter queue field */
 #define E1000_SYN_FILTER_QUEUE_SHIFT   1  /* syn filter queue field */
@@ -522,5 +525,6 @@ int igb_action_rss_same(const struct rte_flow_action_rss 
*comp,
 int igb_config_rss_filter(struct rte_eth_dev *dev,
struct igb_rte_flow_rss_conf *conf,
bool add);
+void igb_flush_desc_rings(struct rte_eth_dev *dev);
 
 #endif /* _E1000_ETHDEV_H_ */
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index 3ee28cf..845101b 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -1589,6 +1589,10 @@ eth_igb_close(struct rte_eth_dev *dev)
eth_igb_stop(dev);
adapter->stopped = 1;
 
+   /* Flush desc rings for i219 */
+   if (hw->mac.type >= e1000_pch_spt)
+   igb_flush_desc_rings(dev);
+
e1000_phy_hw_reset(hw);
igb_release_manageability(hw);
igb_hw_control_release(hw);
diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c
index c5606de..cad28f3 100644
--- a/drivers/net/e1000/igb_rxtx.c
+++ b/drivers/net/e1000/igb_rxtx.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -63,6 +64,10 @@
 #define IGB_TX_OFFLOAD_NOTSUP_MASK \
(PKT_TX_OFFLOAD_MASK ^ IGB_TX_OFFLOAD_MASK)
 
+/* PCI offset for querying descriptor ring status*/
+#define PCICFG_DESC_RING_STATUS   0xE4
+#define FLUSH_DESC_REQUIRED   0x100
+
 /**
  * Structure associated with each descriptor of the RX ring of a RX queue.
  */
@@ -2962,3 +2967,103 @@ igb_config_rss_filter(struct rte_eth_dev *dev,
 
return 0;
 }
+
+static void e1000_flush_tx_ring(struct rte_eth_dev *dev)
+{
+   struct e1000_hw *hw = E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   volatile union e1000_adv_tx_desc *tx_desc;
+   uint32_t tdt, tctl, txd_lower = E1000_TXD_CMD_IFCS;
+   uint16_t size = 512;
+   struct igb_tx_queue *txq;
+   int i;
+
+   if (dev->data->tx_queues == NULL)
+   return;
+   tctl = E1000_READ_REG(hw, E1000_TCTL);
+   E1000_WRITE_REG(hw, E1000_TCTL, tctl | E1000_TCTL_EN);
+   for (i = 0; i < dev->data->nb_tx_queues &&
+   i < E1000_I219_MAX_TX_QUEUE_NUM; i++) {
+   txq = dev->data->tx_queues[i];
+   tdt = E1000_READ_REG(hw, E1000_TDT(i));
+   if (tdt != txq->tx_tail)
+   return;
+   tx_desc = &txq->tx_ring[txq->tx_tail];
+   tx_desc->read.buffer_addr = txq->tx_ring_phys_addr;
+   tx_desc->read.cmd_type_len = rte_cpu_to_le_32(txd_lower | size);
+   tx_desc->read.olinfo_status = 0;
+
+   rte_wmb();
+   txq->tx_tail++;
+   if (txq->tx_tail == txq->nb_tx_desc)
+   txq->tx_tail = 0;
+   rte_io_wmb();
+   E1000_WRITE_REG(hw, E1000_TDT(i), txq->tx_tail);
+   usec_delay(250);
+   }
+}
+
+static void e1000_flush_rx_ring(struct rte_eth_dev *dev)
+{
+   uint32_t rctl, rxdctl;
+   struct e1000_hw *hw = E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   int i;
+
+   rctl = E1000_READ_REG(hw, E1000_RCTL);
+   E1000_WRITE_REG(hw, E1000_RCTL, rctl & ~E1000_RCTL_EN);
+   E1000_WRITE_FLUSH(hw);
+   usec_delay(150);
+
+   for (i = 0; i < dev->data->nb_rx_queues &&
+   i < E1000_I219_MAX_RX_QUEUE_NUM; i++) {
+   rxdctl = E1000_READ_REG(hw, E1000_RXDCTL(i));
+   /* zero the lower 14 bits (prefetch and host thresholds) */
+   rxdctl &= 0xc000;
+
+   /* update thresholds: prefetch threshold to 31,
+* host threshold to 1 and make sure the granularity
+* is "descriptors" and not "cache lines"
+*/
+   rxdctl |= (0x1F | (1UL << 8) | E1000_RXDCTL_THRESH_UNIT_DESC);

Re: [dpdk-dev] [v3] net/e1000: i219 unit hang issue fix on reset/close

2019-07-09 Thread Zhang, Xiao

The tail index was missing in this patch, and a new patch is sent out.

-Original Message-
From: Anand H. Krishnan [mailto:anandhkrish...@gmail.com] 
Sent: Tuesday, July 9, 2019 2:37 PM
To: Zhang, Xiao 
Cc: dev@dpdk.org; Zhao1, Wei 
Subject: Re: [dpdk-dev] [v3] net/e1000: i219 unit hang issue fix on reset/close

Comments inline.

On Tue, Jul 9, 2019 at 8:58 AM Xiao Zhang  wrote:
>
> Unit hang may occur if multiple descriptors are available in the rings 
> during reset or close. This state can be detected by configure status 
> by bit 8 in register. If the bit is set and there are pending 
> descriptors in one of the rings, we must flush them before reset or close.
>
> Signed-off-by: Xiao Zhang 
> ---
>  drivers/net/e1000/e1000_ethdev.h |   4 ++
>  drivers/net/e1000/igb_ethdev.c   |   4 ++
>  drivers/net/e1000/igb_rxtx.c | 105 
> +++
>  3 files changed, 113 insertions(+)
>
> diff --git a/drivers/net/e1000/e1000_ethdev.h 
> b/drivers/net/e1000/e1000_ethdev.h
> index 67acb73..349144a 100644
> --- a/drivers/net/e1000/e1000_ethdev.h
> +++ b/drivers/net/e1000/e1000_ethdev.h
> @@ -35,6 +35,9 @@
>  #define IGB_MAX_RX_QUEUE_NUM   8
>  #define IGB_MAX_RX_QUEUE_NUM_82576 16
>
> +#define E1000_I219_MAX_RX_QUEUE_NUM2
> +#define E1000_I219_MAX_TX_QUEUE_NUM2
> +
>  #define E1000_SYN_FILTER_ENABLE0x0001 /* syn filter enable field 
> */
>  #define E1000_SYN_FILTER_QUEUE 0x000E /* syn filter queue field 
> */
>  #define E1000_SYN_FILTER_QUEUE_SHIFT   1  /* syn filter queue field 
> */
> @@ -522,5 +525,6 @@ int igb_action_rss_same(const struct 
> rte_flow_action_rss *comp,  int igb_config_rss_filter(struct rte_eth_dev *dev,
> struct igb_rte_flow_rss_conf *conf,
> bool add);
> +void igb_flush_desc_rings(struct rte_eth_dev *dev);
>
>  #endif /* _E1000_ETHDEV_H_ */
> diff --git a/drivers/net/e1000/igb_ethdev.c 
> b/drivers/net/e1000/igb_ethdev.c index 3ee28cf..845101b 100644
> --- a/drivers/net/e1000/igb_ethdev.c
> +++ b/drivers/net/e1000/igb_ethdev.c
> @@ -1589,6 +1589,10 @@ eth_igb_close(struct rte_eth_dev *dev)
> eth_igb_stop(dev);
> adapter->stopped = 1;
>
> +   /* Flush desc rings for i219 */
> +   if (hw->mac.type >= e1000_pch_spt)
> +   igb_flush_desc_rings(dev);
> +
> e1000_phy_hw_reset(hw);
> igb_release_manageability(hw);
> igb_hw_control_release(hw);
> diff --git a/drivers/net/e1000/igb_rxtx.c 
> b/drivers/net/e1000/igb_rxtx.c index c5606de..48e1c1e 100644
> --- a/drivers/net/e1000/igb_rxtx.c
> +++ b/drivers/net/e1000/igb_rxtx.c
> @@ -18,6 +18,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -63,6 +64,10 @@
>  #define IGB_TX_OFFLOAD_NOTSUP_MASK \
> (PKT_TX_OFFLOAD_MASK ^ IGB_TX_OFFLOAD_MASK)
>
> +/* PCI offset for querying descriptor ring status*/
> +#define PCICFG_DESC_RING_STATUS   0xE4
> +#define FLUSH_DESC_REQUIRED   0x100
> +
>  /**
>   * Structure associated with each descriptor of the RX ring of a RX queue.
>   */
> @@ -2962,3 +2967,103 @@ igb_config_rss_filter(struct rte_eth_dev *dev,
>
> return 0;
>  }
> +
> +static void e1000_flush_tx_ring(struct rte_eth_dev *dev) {
> +   struct e1000_hw *hw = E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> +   volatile union e1000_adv_tx_desc *tx_desc;
> +   uint32_t tdt, tctl, txd_lower = E1000_TXD_CMD_IFCS;
> +   uint16_t size = 512;
> +   struct igb_tx_queue *txq;
> +   int i;
> +
> +   if (dev->data->tx_queues == NULL)
> +   return;
> +   tctl = E1000_READ_REG(hw, E1000_TCTL);
> +   E1000_WRITE_REG(hw, E1000_TCTL, tctl | E1000_TCTL_EN);
> +   for (i = 0; i < dev->data->nb_tx_queues &&
> +   i < E1000_I219_MAX_TX_QUEUE_NUM; i++) {
> +   txq = dev->data->tx_queues[i];
> +   tdt = E1000_READ_REG(hw, E1000_TDT(i));
> +   if (tdt != txq->tx_tail)
> +   return;
> +   tx_desc = txq->tx_ring;

This doesn't seem to be the descriptor in tail. Are you sure this is what the 
original patch does?


Thanks,
Anand



> +   tx_desc->read.buffer_addr = txq->tx_ring_phys_addr;
> +   tx_desc->read.cmd_type_len = rte_cpu_to_le_32(txd_lower | 
> size);
> +   tx_desc->read.olinfo_status = 0;
> +
> +   rte_wmb();
> +   txq->tx_tail++;
> +   if (txq->tx_tail == txq->nb_tx_desc)
> +   txq->tx_tail = 0;
> +   rte_io_wmb();
> +   E1000_WRITE_REG(hw, E1000_TDT(i), txq->tx_tail);
> +   usec_delay(250);
> +   }
> +}
> +
> +static void e1000_flush_rx_ring(struct rte_eth_dev *dev) {
> +   uint32_t rctl, rxdctl;
> +   struct e1000_hw *hw = E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> +   int i;
> +
> + 

Re: [dpdk-dev] [PATCH v2 00/15] Unit tests fixes for CI

2019-07-09 Thread Michael Santana Francisco

On 7/1/19 2:07 PM, Michael Santana Francisco wrote:



On Mon, Jul 1, 2019 at 6:04 PM Aaron Conole  wrote:

- rwlock_autotest and hash_readwrite_lf_autotest are taking a little more
   than 10s,

Occasionally the distributor test times out as well.  I've moved them as
part of a separate patch, that I'll post along with a bigger series to
enable the unit tests under travis.  Michael and I are leaning toward
introducing a new variable called RUN_TESTS which will do the docs and
unit testing since those combined would add quite a bit to the execution
time of each job (and feel free to bike shed the name, since the patches
aren't final).


Seeing how the distributor autotest usually takes less than a second to 
complete, this sounds like a bug.
I don't think I caught this so far.

So I actually ran into the distributor test timing out. I agree with
David in that it is a bug with the test. Looking at the logs, that test
normally finishes in less than half a second, so running to 10 seconds
and timing out is a big jump in run time. I ran into the issue where
it timed out, so I restarted the job and it finished no problem.
The test fails every so often for no good reason and the logs[1] don't
really say much. I speculate that it is waiting for a resource to
become available or, in the worst case, hitting a deadlock. Seeing that it only
fails every so often and it passes when restarted I don't think it's a
big deal; nevertheless it's worth investing time figuring out what's
wrong.

[1] https://api.travis-ci.com/v3/job/212335916/log.txt


I investigated a little bit on this this test. CC'd David Hunt,

I was able to reproduce the problem on v19.08-rc1 with:

`while sudo sh -c "echo 'distributor_autotest' | 
./build/app/test/dpdk-test"; do :; done`


It runs a couple of times fine, showing output and progress, but
then at some point after a couple of seconds it just stops - no longer
getting any output. It just sits there with no further output. I let it
sit there for a whole minute and nothing happens. So I attached gdb to try
to figure out what is happening. One thread seems to be stuck in a while
loop, see lib/librte_distributor/rte_distributor.c:310.


I looked at the assembly code (layout asm, ni) and I saw these four 
lines below (which correspond to the while loop) being executed 
repeatedly and indefinitely. It looks like this thread is waiting for 
the variable bufptr64[0] to change state.


0xa064d0    pause
0xa064d2    mov    0x3840(%rdx),%rax
0xa064d9    test   $0x1,%al
0xa064db    je 0xa064d0 


While the first thread is waiting on bufptr64[0] to change state, there
is another thread that is also stuck in another while loop, at
lib/librte_distributor/rte_distributor.c:53. It seems that this thread
is stuck waiting for retptr64 to change state. Corresponding assembly
being executed indefinitely:


0xa06de0  mov    0x38c0(%r8),%rax
0xa06de7  test   $0x1,%al
0xa06de9  je 0xa06bbd 


0xa06def     nop
0xa06df0  pause
0xa06df2  rdtsc
0xa06df4  mov    %rdx,%r10
0xa06df7  shl    $0x20,%r10
0xa06dfb  mov    %eax,%eax
0xa06dfd  or %r10,%rax
0xa06e00  lea    0x64(%rax),%r10
0xa06e04  jmp    0xa06e12 


0xa06e06  nopw   %cs:0x0(%rax,%rax,1)
0xa06e10  pause
0xa06e12  rdtsc
0xa06e14  shl    $0x20,%rdx
0xa06e18  mov    %eax,%eax
0xa06e1a  or %rdx,%rax
0xa06e1d  cmp    %rax,%r10
0xa06e20  ja 0xa06e10 

0xa06e22  jmp    0xa06de0 




My guess is that these threads are interdependent, so one thread is
waiting for the other thread to change the state of the control
variable. I can't say for sure if this is what is happening or why
these variables don't change state, so I would like to ask someone who is
more familiar with this particular code to take a look.
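
For reference, both assembly snippets are flag-bit spin waits; a minimal
illustration of the pattern (not the actual rte_distributor code, and the
names here are made up):

#include <emmintrin.h>	/* _mm_pause() */
#include <stdint.h>

/* hypothetical flag word one side polls; bit 0 means "data ready" */
static volatile uint64_t flag_word;

static void
wait_for_flag(void)
{
	/* matches the pause / mov / test $0x1 / je loop seen in gdb */
	while ((flag_word & 1) == 0)
		_mm_pause();
}

int main(void)
{
	flag_word = 1;	/* normally set by the peer thread */
	wait_for_flag();
	return 0;
}

If each side reaches its wait before the peer has set the other's bit,
neither loop ever exits, which would match the hang described above.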




Yes, we need a variable to control this and select the targets that will do the 
tests and/or build the doc.
About the name, RUN_TESTS is ok for me.

What do you want to make of this variable?
Have it as a simple boolean that enables everything? Or a selector with strings 
like unit-tests+doc+perf-tests?



- librte_table unit test crashes on ipv6 [2],

I guess we're waiting on a patch from Jananee (CC'd)?


Yep.


--
David Marchand





Re: [dpdk-dev] [PATCH] doc: fix PDF build

2019-07-09 Thread Ferruh Yigit
On 7/9/2019 2:11 PM, Thomas Monjalon wrote:
> The command "make doc-guides-pdf" is failing because
> there are more than 1500 lines in the file MAINTAINERS
> which is included in the contributing guide.
> 
> We are facing the issue mentioned in this comment:
> https://github.com/sphinx-doc/sphinx/issues/3099#issuecomment-256440704
> 
> Anyway the file MAINTAINERS is mentioned several times in the guide.
> So the "literalinclude" is removed from the guide to fix the build
> of the PDF.
> 
> Signed-off-by: Thomas Monjalon 

Tested-by: Ferruh Yigit 


Re: [dpdk-dev] [EXT] Re: [PATCH] bus/pci: fix IOVA as VA mode selection

2019-07-09 Thread Jerin Jacob Kollanukkaran
> -Original Message-
> From: Burakov, Anatoly 
> Sent: Tuesday, July 9, 2019 8:07 PM
> To: Jerin Jacob Kollanukkaran ; David Marchand
> 
> Cc: dev ; Thomas Monjalon ; Ben
> Walker 
> Subject: Re: [EXT] Re: [dpdk-dev] [PATCH] bus/pci: fix IOVA as VA mode
> selection
issue.
> >>
> >> I wouldn't classify this as "needing" IOVA. "Need" implies it cannot
> >> work without it, whereas in this case it's more of a "highly
> >> recommended" rather than "need".
> >
> > It is "need" as performance is horrible without it as is per packet SW
> translation.
> > A "need" for DPDK performance perspective.
> 
> Would the driver fail to initialize if it detects running as IOVA as PA?

Yes.
https://git.dpdk.org/dpdk/tree/drivers/net/octeontx2/otx2_ethdev.c#n1191

> Also, some other use cases will also require IOVA as PA while having full
> IOMMU support. An example of this would be systems with limited IOMMU
> width (such as VM's) - even though the IOMMU is technically supported, we
> may not have the necessary address width to run all devices in IOVA as VA
> mode, and would need to fall back to IOVA as PA.
> Since we cannot *require* IOVA as VA in current codebase, any driver that
> expects IOVA as VA to always be enabled will presumably not work.
> 
> >
> > Again, it is not device attribute, it is system attribute.
> 
> If it's a system attribute, why is it a device driver flag then? The system 
> may
> or may not support IOMMU, the device itself probably doesn't care since bus
> address looks the same in both cases, *but the driver
> might* (such as would be in your case - requiring IOVA as VA and disallowing
> IOVA as PA for performance reasons).

Agree.

> 
> Currently (again, disregarding your interpretation of how IOVA as VA works
> and looking at the actual commit history), we always seem to imply that IOVA
> as PA works for all devices, and we use IOVA_AS_VA flag to indicate that the
> device *also* supports IOVA as VA mode.
> 
> But we don't have any way to express a *requirement* for IOVA as VA mode
> - only for IOVA as PA mode. That is the purpose of the new flag. You are
> stating that the IOVA_AS_VA drv flag is an expression of that requirement,
> but that is not reflected in the codebase - our commit history indicates that
> we don't treat IOVA as VA as hard requirement whenever this flag is
> specified (and i would argue that we shouldn't).

No objection to further classify it.

How about the following

1) Rename RTE_PCI_DRV_IOVA_AS_VA to RTE_PCI_DRV_IOVA_AS_DC.
It is the same as the existing RTE_PCI_DRV_IOVA_AS_VA, meaning the driver
does not care whether IOVA is PA or VA.
2) Introduce RTE_PCI_DRV_NEED_IOVA_AS_VA (the driver needs IOVA as VA).
This would be selected for the octeontx device drivers.
3) Change the existing drivers' "drv_flags" to RTE_PCI_DRV_IOVA_AS_DC if they
can work with both PA and VA (literally all existing drivers which currently
have RTE_PCI_DRV_IOVA_AS_VA, excluding the octeontx drivers).

In pci_device_iova_mode()

if (drv->flags & RTE_PCI_DRV_IOVA_AS_DC)
iova_mode = RTE_IOVA_DC;
else if (drv->flags & RTE_PCI_DRV_NEED_IOVA_AS_VA)
iova_mode = RTE_IOVA_VA;
else
iova_mode = RTE_IOVA_PA;

I can submit the patch if above is OK.
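
A compile-level sketch of what that could look like (flag values are
placeholders and the real enum lives in the EAL headers; this is only to make
the three points concrete):

#include <stdint.h>

#define RTE_PCI_DRV_IOVA_AS_DC      0x0040 /* driver works with PA or VA */
#define RTE_PCI_DRV_NEED_IOVA_AS_VA 0x0080 /* driver requires IOVA as VA */

enum rte_iova_mode { RTE_IOVA_DC = 0, RTE_IOVA_PA = 1, RTE_IOVA_VA = 2 };

static enum rte_iova_mode
pci_device_iova_mode(uint32_t drv_flags)
{
	if (drv_flags & RTE_PCI_DRV_IOVA_AS_DC)
		return RTE_IOVA_DC;
	else if (drv_flags & RTE_PCI_DRV_NEED_IOVA_AS_VA)
		return RTE_IOVA_VA;
	return RTE_IOVA_PA;
}

The octeontx drivers would then set RTE_PCI_DRV_NEED_IOVA_AS_VA, and all other
current RTE_PCI_DRV_IOVA_AS_VA users would move to RTE_PCI_DRV_IOVA_AS_DC,
matching points 2) and 3) above.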

> >>
> >>> # With top of tree, Currently it never runs in IOVA as VA mode.
> >>> That’s a separate problem to fix. Which effect all the devices
> >>> Currently supporting RTE_PCI_DRV_IOVA_AS_VA. Ie even though
> Device
> >>> support RTE_PCI_DRV_IOVA_AS_VA, it is not running With IOMMU
> >>> protection and/or root privilege is required to run DPDK.
> >
> > What's your view on this existing problem?
> 
> My view would be to always run in IOVA as VA by default and only falling
> back to IOVA as PA if there is a need to do that. Yet, it seems that whenever 
> i
> try to bring this up, the response (not necessarily from you, so this is not
> directed at you specifically) seems to be that because of hotplug, we have to
> start in the "safest" (from device support point of
> view) mode - that is, in IOVA as PA. Seeing how, as you claim, some devices
> require IOVA as VA, then IOVA as PA is no longer the "safe"
> default that all devices will support. Perhaps we can use this opportunity to
> finally make IOVA as VA the default :)

I was thinking to use VA as the default if the system/device/driver supports
it. That's the reason for the original patch to have the VA selection in the
pci code itself. But it makes sense to move that up.

Not related to this patch: why does hotplug prefer IOVA as PA? Now, if I
understand it correctly, to accommodate hotplug for SPDK, pci_device_iova_mode()
made it DC so that the common code can pick PA.

I have no strong opinion on this; if it helps SPDK then there is no issue in
keeping the default as PA in the DC case.



Re: [dpdk-dev] [PATCH v6] doc: add meson ut info in prog guide

2019-07-09 Thread Michael Santana Francisco
On Mon, Jul 8, 2019 at 4:18 PM Aaron Conole  wrote:
>
> Thomas Monjalon  writes:
>
> > Hi please find some comments below:
> >
> > 06/06/2019 13:59, Hari Kumar Vemula:
> >> +++ b/doc/guides/prog_guide/meson_ut.rst
> >> @@ -0,0 +1,151 @@
> >> +..  SPDX-License-Identifier: BSD-3-Clause
> >> +
> >
> > Useless blank line.
> >
> >> +Copyright(c) 2018-2019 Intel Corporation.
> >> +
> >> +.. _meson_unit_tests:
> >
> > Useless anchor. The doc can be referenced with :doc: links.
> >
> >> +
> >> +Running DPDK Unit Tests with Meson
> >> +==
> >> +
> >> +This section describes how to run testcases with the DPDK meson build 
> >> system.
> >
> > Here and below, "testcases" should be split in two words.
> >
> >> +
> >> +
> >> +Building and running the unit tests
> >> +---
> >> +
> >> +* Create the meson build output folder using the following command::
> >> +
> >> +  $ meson 
> >> +
> >> +* Enter into build output folder, which was created by above command::
> >> +
> >> +  $ cd build
> >
> > Should be the same as above: 
> >
> >> +
> >> +* Compile DPDK using command::
> >> +
> >> +  $ ninja
> >
> > Do we really need to repeat above basic steps?
> > Would be easier to just reference another guide about meson.
> > I think doc/build-sdk-meson.txt should be moved to .rst.
>
> +1
>
> >> +
> >> +The output file of the build will be available in meson build folder. 
> >> After
> >> +a successful ninja command, the binary ``dpdk-test`` is created in
> >> +``build/test/test/``.
> >
> > Again, "build" is an example directory.
> >
> >> +
> >> +* Run the unit testcases::
> >> +
> >> +  $ ninja test
> >> +  # or
> >> +  $ meson test
> >> +
> >> +* To run specific test case via meson::
> >> +
> >> +  $ meson test 
> >> +  # or
> >> +  $ ninja test 
> >
> > Would be worth to mention why meson or ninja can be used.
> >
> >> +
> >> +
> >> +Grouping of testcases
> >> +-
> >> +
> >> +Testcases have been grouped into four different groups based on conditions
> >> +of time duration and performance of the individual testcase.
> >
> > Grouping has changed recently.
> > This part should be updated please.
> >
> >> +
> >> +* Fast tests which can be run in parallel.
> >> +* Fast tests which must run serially.
> >> +* Performance tests.
> >> +* Driver tests.
> >> +* Tests which produce lists of objects as output, and therefore that need
> >> +  manual checking.
> >> +
> >> +Testcases can be run in parallel or non-parallel mode using the 
> >> ``is_parallel`` argument
> >> +of ``test()`` in meson.build
> >> +
> >> +These tests can be run using the argument to ``meson test`` as
> >> +``--suite project_name:label``.
> >> +
> >> +For example::
> >> +
> >> +$ meson test --suite DPDK:fast-tests
> >> +
> >> +
> >> +The project name is optional so the following is equivalent to the 
> >> previous
> >> +command::
> >> +
> >> +
> >> +$ meson test --suite fast-tests
> >> +
> >> +
> >> +Running different test suites
> >> +~
> >> +
> >> +The following commands are some examples of how to run testcases using 
> >> option
> >> +``--suite``:
>
> The following section is a bit misleading.  The limitation on run time
> is per-test.  So 600 seconds in perf-tests is 600 seconds PER TEST.  IE:
> if there are 10 tests, you'll be waiting up to 100 minutes.
>
> >> +* Fast Tests should take less than 10 seconds. The meson command to run 
> >> them
> >> +  is::
> >> +
> >> +  $ meson test --suite DPDK:fast-tests
> >> +
> >> +* Performance Tests should take less than 600 seconds. The meson command 
> >> to
> >> +  run them is::
> >> +
> >> +  $ meson test --suite DPDK:perf-tests
> >> +
> >> +* Driver Tests should take less than 600 seconds. The meson command to run
> >> +  them is::
> >> +
> >> +  $ meson test --suite DPDK:driver-tests
> >> +
> >> +* The meson command to run Dump Tests is::
> >> +
> >> +  $ meson test --suite DPDK:dump-tests
> >
> > Would be simpler to just list the suites.
>
> Even better would be to provide a 1-liner that would dump the suites so
> that new test suites wouldn't need to update the documentation.
Worth mentioning that you can run `meson test --list` to see a list of
all available tests
>
> >> +
> >> +
> >> +Dealing with skipped testcases
> >> +--
> >> +
> >> +Some unit test cases have a dependency on external libraries, driver 
> >> modules
> >> +or config flags, without which the test cases cannot be run. Such test 
> >> cases
> >> +will be reported as skipped if they cannot run. To enable those test 
> >> cases,
> >> +the user should ensure the required dependencies are met.  Below are a few
> >> +possible causes why tests may be skipped and how they may be resolved:
> >> +
> >> +#. Optional external libraries are not found.
> >> +#. Config flags for the dependent library are not enabled.
> >> +#. Dependent driver modules are not installe

[dpdk-dev] [PATCH v2 1/2] vhost: support inflight share memory protocol feature

2019-07-09 Thread JinYu
This patch introduces two new messages, VHOST_USER_GET_INFLIGHT_FD
and VHOST_USER_SET_INFLIGHT_FD, to support transferring a shared
buffer between qemu and the backend.

Firstly, qemu uses VHOST_USER_GET_INFLIGHT_FD to get the
shared buffer from the backend. Then qemu should send it back
through VHOST_USER_SET_INFLIGHT_FD each time vhost-user is started.

This shared buffer is used to process inflight I/O when the backend
reconnects.

Signed-off-by: LinLi 
Signed-off-by: XunNi 
Signed-off-by: YuZhang 
Signed-off-by: JinYu 
---
v1 - specify the APIs are split-ring only
v2 - fix APIs and judge split or packed
---
 lib/librte_vhost/rte_vhost.h   | 105 +
 lib/librte_vhost/rte_vhost_version.map |   4 +
 lib/librte_vhost/vhost.c   | 163 -
 lib/librte_vhost/vhost.h   |  16 ++
 lib/librte_vhost/vhost_user.c  | 313 +
 lib/librte_vhost/vhost_user.h  |  13 +-
 6 files changed, 612 insertions(+), 2 deletions(-)
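
A minimal sketch of how a backend might drive the new split-ring helpers
around a single request (illustrative only; the exact ordering relative to
the used-ring update is an assumption, and error handling is omitted):

#include <stdint.h>
#include <rte_vhost.h>

/* illustrative request handler; vid/vring_idx/desc_idx come from the caller */
static void
handle_request(int vid, uint16_t vring_idx, uint16_t desc_idx,
	       uint16_t last_used_idx)
{
	/* mark the descriptor in flight before touching the payload */
	rte_vhost_set_inflight_desc_split(vid, vring_idx, desc_idx);

	/* ... process the I/O request here ... */

	/* record the last in-flight entry, then clear it once completed */
	rte_vhost_set_last_inflight_io_split(vid, vring_idx, desc_idx);
	rte_vhost_clr_inflight_desc_split(vid, vring_idx, last_used_idx,
					  desc_idx);
}

After a reconnect, the resubmit list returned by
rte_vhost_get_vhost_ring_inflight_split() tells the backend which entries
were still in flight and need to be replayed.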

diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
index 0226b3eff..a17aff876 100644
--- a/lib/librte_vhost/rte_vhost.h
+++ b/lib/librte_vhost/rte_vhost.h
@@ -71,6 +71,10 @@ extern "C" {
 #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER 11
 #endif
 
+#ifndef VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD
+#define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD 12
+#endif
+
 /** Indicate whether protocol features negotiation is supported. */
 #ifndef VHOST_USER_F_PROTOCOL_FEATURES
 #define VHOST_USER_F_PROTOCOL_FEATURES 30
@@ -98,6 +102,41 @@ struct rte_vhost_memory {
struct rte_vhost_mem_region regions[];
 };
 
+struct inflight_desc_split {
+   uint8_t inflight;
+   uint8_t padding[5];
+   uint16_tnext;
+   uint64_tcounter;
+};
+
+struct inflight_info_split {
+   uint64_tfeatures;
+   uint16_tversion;
+   uint16_tdesc_num;
+   uint16_tlast_inflight_io;
+   uint16_tused_idx;
+   struct inflight_desc_split desc[0];
+};
+
+struct resubmit_desc {
+   uint16_t index;
+   uint64_t counter;
+};
+
+struct resubmit_info {
+   struct resubmit_desc*resubmit_list;
+   uint16_tresubmit_num;
+};
+
+struct rte_vhost_ring_inflight_split {
+   union {
+   struct inflight_info_split *inflight_split;
+   /* TODO */
+   };
+
+   struct resubmit_info *resubmit_inflight_split;
+};
+
 struct rte_vhost_vring {
struct vring_desc   *desc;
struct vring_avail  *avail;
@@ -603,6 +642,22 @@ uint16_t rte_vhost_dequeue_burst(int vid, uint16_t 
queue_id,
  */
 int rte_vhost_get_mem_table(int vid, struct rte_vhost_memory **mem);
 
+/**
+ * Get guest inflight vring info, including inflight ring and resubmit list.
+ *
+ * @param vid
+ *  vhost device ID
+ * @param vring_idx
+ *  vring index
+ * @param vring
+ *  the structure to hold the requested inflight vring info
+ * @return
+ *  0 on success, -1 on failure
+ */
+int __rte_experimental
+rte_vhost_get_vhost_ring_inflight_split(int vid, uint16_t vring_idx,
+ struct rte_vhost_ring_inflight_split *vring);
+
 /**
  * Get guest vring info, including the vring address, vring size, etc.
  *
@@ -631,6 +686,56 @@ int rte_vhost_get_vhost_vring(int vid, uint16_t vring_idx,
  */
 int rte_vhost_vring_call(int vid, uint16_t vring_idx);
 
+/**
+ * set inflight flag for a desc.
+ *
+ * @param vid
+ *  vhost device ID
+ * @param vring_idx
+ *  vring index
+ * @param idx
+ *  inflight entry index
+ * @return
+ *  0 on success, -1 on failure
+ */
+int __rte_experimental
+rte_vhost_set_inflight_desc_split(int vid, uint16_t vring_idx,
+   uint16_t idx);
+
+/**
+ * clear inflight flag for a desc.
+ *
+ * @param vid
+ *  vhost device ID
+ * @param vring_idx
+ *  vring index
+ * @param last_used_idx
+ *  next free used_idx
+ * @param idx
+ *  inflight entry index
+ * @return
+ *  0 on success, -1 on failure
+ */
+int __rte_experimental
+rte_vhost_clr_inflight_desc_split(int vid, uint16_t vring_idx,
+   uint16_t last_used_idx, uint16_t idx);
+
+/**
+ * set last inflight io index.
+ *
+ * @param vid
+ *  vhost device ID
+ * @param vring_idx
+ *  vring index
+ * @param idx
+ *  inflight entry index
+ * @return
+ *  0 on success, -1 on failure
+ */
+int __rte_experimental
+rte_vhost_set_last_inflight_io_split(int vid, uint16_t vring_idx,
+   uint16_t idx);
+
 /**
  * Get vhost RX queue avail count.
  *
diff --git a/lib/librte_vhost/rte_vhost_version.map 
b/lib/librte_vhost/rte_vhost_version.map
index 5f1d4a75c..57a166f4f 100644
--- a/lib/librte_vhost/rte_vhost_version.map
+++ b/lib/librte_vhost/rte_vhost_version.map
@@ -87,4 +87,8 @@ EXPERIMENTAL {
rte_vdpa_relay_vring_used;
rte_vhost_extern_callback_register;
rte_vhost_driver_set_protocol_features;
+   rte_vhost_set_inflight_desc_split;
+   rte_vhost_clr

[dpdk-dev] [PATCH v2 2/2] vhost: Add vhost-user-blk example which support inflight

2019-07-09 Thread JinYu
A vhost-user-blk example that supports the inflight feature. It uses the
new APIs introduced in the first patch, so it can show how these
APIs work to support the inflight feature.

Signed-off-by: JinYu 
---
V1 - add the case.
---
 examples/vhost_blk/Makefile   |  67 +++
 examples/vhost_blk/blk.c  | 118 ++
 examples/vhost_blk/blk_spec.h |  95 +
 examples/vhost_blk/meson.build|  20 +
 examples/vhost_blk/vhost_blk.c| 589 ++
 examples/vhost_blk/vhost_blk.h|  96 +
 examples/vhost_blk/vhost_blk_compat.c | 193 +
 7 files changed, 1178 insertions(+)
 create mode 100644 examples/vhost_blk/Makefile
 create mode 100644 examples/vhost_blk/blk.c
 create mode 100644 examples/vhost_blk/blk_spec.h
 create mode 100644 examples/vhost_blk/meson.build
 create mode 100644 examples/vhost_blk/vhost_blk.c
 create mode 100644 examples/vhost_blk/vhost_blk.h
 create mode 100644 examples/vhost_blk/vhost_blk_compat.c

diff --git a/examples/vhost_blk/Makefile b/examples/vhost_blk/Makefile
new file mode 100644
index 0..52e9befd8
--- /dev/null
+++ b/examples/vhost_blk/Makefile
@@ -0,0 +1,67 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2010-2017 Intel Corporation
+
+# binary name
+APP = vhost-blk
+
+# all source are stored in SRCS-y
+SRCS-y := blk.c vhost_blk.c vhost_blk_compat.c
+
+# Build using pkg-config variables if possible
+$(shell pkg-config --exists libdpdk)
+ifeq ($(.SHELLSTATUS),0)
+
+all: shared
+.PHONY: shared static
+shared: build/$(APP)-shared
+   ln -sf $(APP)-shared build/$(APP)
+static: build/$(APP)-static
+   ln -sf $(APP)-static build/$(APP)
+
+CFLAGS += -D_FILE_OFFSET_BITS=64
+LDFLAGS += -pthread
+
+PC_FILE := $(shell pkg-config --path libdpdk)
+CFLAGS += -O3 $(shell pkg-config --cflags libdpdk)
+LDFLAGS_SHARED = $(shell pkg-config --libs libdpdk)
+LDFLAGS_STATIC = -Wl,-Bstatic $(shell pkg-config --static --libs libdpdk)
+
+build/$(APP)-shared: $(SRCS-y) Makefile $(PC_FILE) | build
+   $(CC) $(CFLAGS) $(SRCS-y) -o $@ $(LDFLAGS) $(LDFLAGS_SHARED)
+
+build/$(APP)-static: $(SRCS-y) Makefile $(PC_FILE) | build
+   $(CC) $(CFLAGS) $(SRCS-y) -o $@ $(LDFLAGS) $(LDFLAGS_STATIC)
+
+build:
+   @mkdir -p $@
+
+.PHONY: clean
+clean:
+   rm -f build/$(APP) build/$(APP)-static build/$(APP)-shared
+   test -d build && rmdir -p build || true
+
+else # Build using legacy build system
+
+ifeq ($(RTE_SDK),)
+$(error "Please define RTE_SDK environment variable")
+endif
+
+# Default target, detect a build directory, by looking for a path with a 
.config
+RTE_TARGET ?= $(notdir $(abspath $(dir $(firstword $(wildcard 
$(RTE_SDK)/*/.config)
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+ifneq ($(CONFIG_RTE_EXEC_ENV_LINUX),y)
+$(info This application can only operate in a linux environment, \
+please change the definition of the RTE_TARGET environment variable)
+all:
+else
+
+CFLAGS += -D_FILE_OFFSET_BITS=64
+CFLAGS += -O2
+#CFLAGS += $(WERROR_FLAGS)
+
+include $(RTE_SDK)/mk/rte.extapp.mk
+
+endif
+endif
diff --git a/examples/vhost_blk/blk.c b/examples/vhost_blk/blk.c
new file mode 100644
index 0..768792bf9
--- /dev/null
+++ b/examples/vhost_blk/blk.c
@@ -0,0 +1,118 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2019 Intel Corporation
+ */
+
+/**
+ * This work is largely based on the "vhost-user-blk" implementation by
+ * SPDK(https://github.com/spdk/spdk).
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "vhost_blk.h"
+#include "blk_spec.h"
+
+static void
+vhost_strcpy_pad(void *dst, const char *src, size_t size, int pad)
+{
+   size_t len;
+
+   len = strlen(src);
+   if (len < size) {
+   memcpy(dst, src, len);
+   memset((char *)dst + len, pad, size - len);
+   } else {
+   memcpy(dst, src, size);
+   }
+}
+
+static int
+vhost_bdev_blk_readwrite(struct vhost_block_dev *bdev,
+ struct vhost_blk_task *task,
+ uint64_t lba_512, __rte_unused uint32_t xfer_len)
+{
+   uint32_t i;
+   uint64_t offset;
+   uint32_t nbytes = 0;
+
+   offset = lba_512 * 512;
+
+   for (i = 0; i < task->iovs_cnt; i++) {
+   if (task->dxfer_dir == BLK_DIR_TO_DEV)
+   memcpy(bdev->data + offset, task->iovs[i].iov_base,
+  task->iovs[i].iov_len);
+   else
+   memcpy(task->iovs[i].iov_base, bdev->data + offset,
+  task->iovs[i].iov_len);
+   offset += task->iovs[i].iov_len;
+   nbytes += task->iovs[i].iov_len;
+   }
+
+   return nbytes;
+}
+
+int
+vhost_bdev_process_blk_commands(struct vhost_block_dev *bdev,
+struct vhost_blk_task *task)
+{
+   int used_len;
+
+   if (unlikely(task

[dpdk-dev] [PATCH 1/2] net/ice: remove unused devargs

2019-07-09 Thread Qi Zhang
Remove the devarg "max_queue_pair_num" related code since
it is not completely implemented.

Fixes: f9cf4f864150 ("net/ice: support device initialization")
Cc: sta...@dpdk.org

Signed-off-by: Qi Zhang 
---
 doc/guides/nics/ice.rst  |  8 --
 drivers/net/ice/ice_ethdev.c | 66 ++--
 2 files changed, 3 insertions(+), 71 deletions(-)

diff --git a/doc/guides/nics/ice.rst b/doc/guides/nics/ice.rst
index 666b1b272..e9b3a48bc 100644
--- a/doc/guides/nics/ice.rst
+++ b/doc/guides/nics/ice.rst
@@ -49,14 +49,6 @@ Please note that enabling debugging options may affect 
system performance.
 Runtime Config Options
 ~~
 
-- ``Maximum Number of Queue Pairs``
-
-  The maximum number of queue pairs is decided by HW. If not configured, APP
-  uses the number from HW. Users can check the number by calling the API
-  ``rte_eth_dev_info_get``.
-  If users want to limit the number of queues, they can set a smaller number
-  using EAL parameter like ``max_queue_pair_num=n``.
-
 
 Driver compilation and testing
 --
diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index 9ce730cd4..4b5cd8269 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -17,7 +17,6 @@
 #include "ice_rxtx.h"
 #include "ice_switch_filter.h"
 
-#define ICE_MAX_QP_NUM "max_queue_pair_num"
 #define ICE_DFLT_OUTER_TAG_TYPE ICE_AQ_VSI_OUTER_TAG_VLAN_9100
 #define ICE_DFLT_PKG_FILE "/lib/firmware/intel/ice/ddp/ice.pkg"
 
@@ -251,59 +250,6 @@ ice_init_controlq_parameter(struct ice_hw *hw)
 }
 
 static int
-ice_check_qp_num(const char *key, const char *qp_value,
-__rte_unused void *opaque)
-{
-   char *end = NULL;
-   int num = 0;
-
-   while (isblank(*qp_value))
-   qp_value++;
-
-   num = strtoul(qp_value, &end, 10);
-
-   if (!num || (*end == '-') || errno) {
-   PMD_DRV_LOG(WARNING, "invalid value:\"%s\" for key:\"%s\", "
-   "value must be > 0",
-   qp_value, key);
-   return -1;
-   }
-
-   return num;
-}
-
-static int
-ice_config_max_queue_pair_num(struct rte_devargs *devargs)
-{
-   struct rte_kvargs *kvlist;
-   const char *queue_num_key = ICE_MAX_QP_NUM;
-   int ret;
-
-   if (!devargs)
-   return 0;
-
-   kvlist = rte_kvargs_parse(devargs->args, NULL);
-   if (!kvlist)
-   return 0;
-
-   if (!rte_kvargs_count(kvlist, queue_num_key)) {
-   rte_kvargs_free(kvlist);
-   return 0;
-   }
-
-   if (rte_kvargs_process(kvlist, queue_num_key,
-  ice_check_qp_num, NULL) < 0) {
-   rte_kvargs_free(kvlist);
-   return 0;
-   }
-   ret = rte_kvargs_process(kvlist, queue_num_key,
-ice_check_qp_num, NULL);
-   rte_kvargs_free(kvlist);
-
-   return ret;
-}
-
-static int
 ice_res_pool_init(struct ice_res_pool_info *pool, uint32_t base,
  uint32_t num)
 {
@@ -1128,13 +1074,9 @@ ice_pf_sw_init(struct rte_eth_dev *dev)
struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
struct ice_hw *hw = ICE_PF_TO_HW(pf);
 
-   if (ice_config_max_queue_pair_num(dev->device->devargs) > 0)
-   pf->lan_nb_qp_max =
-   ice_config_max_queue_pair_num(dev->device->devargs);
-   else
-   pf->lan_nb_qp_max =
-   (uint16_t)RTE_MIN(hw->func_caps.common_cap.num_txq,
- hw->func_caps.common_cap.num_rxq);
+   pf->lan_nb_qp_max =
+   (uint16_t)RTE_MIN(hw->func_caps.common_cap.num_txq,
+ hw->func_caps.common_cap.num_rxq);
 
pf->lan_nb_qps = pf->lan_nb_qp_max;
 
@@ -3751,8 +3693,6 @@ static struct rte_pci_driver rte_ice_pmd = {
 RTE_PMD_REGISTER_PCI(net_ice, rte_ice_pmd);
 RTE_PMD_REGISTER_PCI_TABLE(net_ice, pci_id_ice_map);
 RTE_PMD_REGISTER_KMOD_DEP(net_ice, "* igb_uio | uio_pci_generic | vfio-pci");
-RTE_PMD_REGISTER_PARAM_STRING(net_ice,
- ICE_MAX_QP_NUM "=");
 
 RTE_INIT(ice_init_log)
 {
-- 
2.13.6



[dpdk-dev] [PATCH 2/2] net/ice: add safe mode support devarg

2019-07-09 Thread Qi Zhang
Safe mode support is not necessary by default.
Initializing the driver silently without the OS package may confuse users,
since most advanced features are disabled.
Add a devarg to enable safe mode only when the user intends to do this.

Signed-off-by: Qi Zhang 
---
 doc/guides/nics/ice.rst  | 10 +++
 drivers/net/ice/ice_ethdev.c | 63 +++-
 drivers/net/ice/ice_ethdev.h |  8 ++
 3 files changed, 80 insertions(+), 1 deletion(-)

diff --git a/doc/guides/nics/ice.rst b/doc/guides/nics/ice.rst
index e9b3a48bc..a29d17c83 100644
--- a/doc/guides/nics/ice.rst
+++ b/doc/guides/nics/ice.rst
@@ -49,6 +49,16 @@ Please note that enabling debugging options may affect 
system performance.
 Runtime Config Options
 ~~
 
+- ``Safe Mode Support`` (default ``0``)
+
+  If the driver fails to load the OS package, by default the driver's
+  initialization fails. But if the user intends to use the device in safe
+  mode, the ``devargs`` parameter ``safe-mode-support`` can be used. For
+  example::
+
+-w 80:00.0,safe-mode-support=1
+
+  NOTE: In Safe Mode, only very limited features are available; features like
+  RSS, checksum, fdir, tunneling ... are all disabled.
 
 Driver compilation and testing
 --
diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index 4b5cd8269..c348b2956 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -17,6 +17,14 @@
 #include "ice_rxtx.h"
 #include "ice_switch_filter.h"
 
+/* devargs */
+#define ICE_SAFE_MODE_SUPPORT_ARG "safe-mode-support"
+
+static const char * const ice_valid_args[] = {
+   ICE_SAFE_MODE_SUPPORT_ARG,
+   NULL
+};
+
 #define ICE_DFLT_OUTER_TAG_TYPE ICE_AQ_VSI_OUTER_TAG_VLAN_9100
 #define ICE_DFLT_PKG_FILE "/lib/firmware/intel/ice/ddp/ice.pkg"
 
@@ -1334,6 +1342,50 @@ ice_base_queue_get(struct ice_pf *pf)
 }
 
 static int
+parse_bool(const char *key, const char *value, void *args)
+{
+   int *i = (int *)args;
+   char *end;
+   int num;
+
+   num = strtoul(value, &end, 10);
+
+   if (num != 0 || num != 1) {
+   PMD_DRV_LOG(WARNING, "invalid value:\"%s\" for key:\"%s\", "
+   "value must be 0 or 1",
+   value, key);
+   return -1;
+   }
+
+   *i = num;
+   return 0;
+}
+
+static int ice_parse_devargs(struct rte_eth_dev *dev)
+{
+   struct ice_adapter *ad =
+   ICE_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
+   struct rte_devargs *devargs = dev->device->devargs;
+   struct rte_kvargs *kvlist;
+   int ret;
+
+   if (devargs == NULL)
+   return 0;
+
+   kvlist = rte_kvargs_parse(devargs->args, ice_valid_args);
+   if (kvlist == NULL) {
+   PMD_INIT_LOG(ERR, "Invalid kvargs key\n");
+   return -EINVAL;
+   }
+
+   ret = rte_kvargs_process(kvlist, ICE_SAFE_MODE_SUPPORT_ARG,
+&parse_bool, &ad->devargs.safe_mode_support);
+
+   rte_kvargs_free(kvlist);
+   return ret;
+}
+
+static int
 ice_dev_init(struct rte_eth_dev *dev)
 {
struct rte_pci_device *pci_dev;
@@ -1366,6 +1418,7 @@ ice_dev_init(struct rte_eth_dev *dev)
hw->bus.device = pci_dev->addr.devid;
hw->bus.func = pci_dev->addr.function;
 
+   ice_parse_devargs(dev);
ice_init_controlq_parameter(hw);
 
ret = ice_init_hw(hw);
@@ -1376,8 +1429,14 @@ ice_dev_init(struct rte_eth_dev *dev)
 
ret = ice_load_pkg(dev);
if (ret) {
+   if (ad->devargs.safe_mode_support == 0) {
+   PMD_INIT_LOG(ERR, "Failed to load the DDP package,"
+   "Use --safe-mode-support=1 to enter 
Safe Mode");
+   return ret;
+   }
+
PMD_INIT_LOG(WARNING, "Failed to load the DDP package,"
-   "Entering Safe Mode");
+   "Entering Safe Mode");
ad->is_safe_mode = 1;
}
 
@@ -3693,6 +3752,8 @@ static struct rte_pci_driver rte_ice_pmd = {
 RTE_PMD_REGISTER_PCI(net_ice, rte_ice_pmd);
 RTE_PMD_REGISTER_PCI_TABLE(net_ice, pci_id_ice_map);
 RTE_PMD_REGISTER_KMOD_DEP(net_ice, "* igb_uio | uio_pci_generic | vfio-pci");
+RTE_PMD_REGISTER_PARAM_STRING(net_ice,
+ ICE_SAFE_MODE_SUPPORT_ARG "=<0|1>");
 
 RTE_INIT(ice_init_log)
 {
diff --git a/drivers/net/ice/ice_ethdev.h b/drivers/net/ice/ice_ethdev.h
index 8a52239f5..f569da833 100644
--- a/drivers/net/ice/ice_ethdev.h
+++ b/drivers/net/ice/ice_ethdev.h
@@ -274,6 +274,13 @@ struct ice_pf {
 };
 
 /**
+ * Cache devargs parse result.
+ */
+struct ice_devargs {
+   int safe_mode_support;
+};
+
+/**
  * Structure to store private data for each PF/VF instance.
  */
 struct ice_adapter {
@@ -286,6 +293,7 @@ struct ice_adapter {
/* ptype mapping table */
uint32_t ptype_tbl[ICE_MAX_PKT_TY

[dpdk-dev] [PATCH v2 2/2] net/ice: add safe mode support devarg

2019-07-09 Thread Qi Zhang
Safe mode support is not necessary by default.
Initializing the driver silently without the OS package may confuse users,
since most advanced features are disabled.
Add a devarg to enable safe mode only when the user intends to do this.

Signed-off-by: Qi Zhang 
---

-v2:
  fix missing return value check from ice_parse_devargs.
  minor document update.


 doc/guides/nics/ice.rst  | 11 +++
 drivers/net/ice/ice_ethdev.c | 68 +++-
 drivers/net/ice/ice_ethdev.h |  8 ++
 3 files changed, 86 insertions(+), 1 deletion(-)

diff --git a/doc/guides/nics/ice.rst b/doc/guides/nics/ice.rst
index e9b3a48bc..03819d29f 100644
--- a/doc/guides/nics/ice.rst
+++ b/doc/guides/nics/ice.rst
@@ -49,6 +49,17 @@ Please note that enabling debugging options may affect 
system performance.
 Runtime Config Options
 ~~
 
+- ``Safe Mode Support`` (default ``0``)
+
+  If the driver fails to load the OS package, by default the driver's
+  initialization fails. But if the user intends to use the device without the
+  OS package, the ``devargs`` parameter ``safe-mode-support`` can be used,
+  for example::
+
+-w 80:00.0,safe-mode-support=1
+
+  Then the driver will be initialized successfully and the device will enter
+  Safe Mode.
+  NOTE: In Safe Mode, only very limited features are available; features like
+  RSS, checksum, fdir, tunneling ... are all disabled.
 
 Driver compilation and testing
 --
diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index 4b5cd8269..d3e093886 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -17,6 +17,14 @@
 #include "ice_rxtx.h"
 #include "ice_switch_filter.h"
 
+/* devargs */
+#define ICE_SAFE_MODE_SUPPORT_ARG "safe-mode-support"
+
+static const char * const ice_valid_args[] = {
+   ICE_SAFE_MODE_SUPPORT_ARG,
+   NULL
+};
+
 #define ICE_DFLT_OUTER_TAG_TYPE ICE_AQ_VSI_OUTER_TAG_VLAN_9100
 #define ICE_DFLT_PKG_FILE "/lib/firmware/intel/ice/ddp/ice.pkg"
 
@@ -1334,6 +1342,50 @@ ice_base_queue_get(struct ice_pf *pf)
 }
 
 static int
+parse_bool(const char *key, const char *value, void *args)
+{
+   int *i = (int *)args;
+   char *end;
+   int num;
+
+   num = strtoul(value, &end, 10);
+
+   if (num != 0 || num != 1) {
+   PMD_DRV_LOG(WARNING, "invalid value:\"%s\" for key:\"%s\", "
+   "value must be 0 or 1",
+   value, key);
+   return -1;
+   }
+
+   *i = num;
+   return 0;
+}
+
+static int ice_parse_devargs(struct rte_eth_dev *dev)
+{
+   struct ice_adapter *ad =
+   ICE_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
+   struct rte_devargs *devargs = dev->device->devargs;
+   struct rte_kvargs *kvlist;
+   int ret;
+
+   if (devargs == NULL)
+   return 0;
+
+   kvlist = rte_kvargs_parse(devargs->args, ice_valid_args);
+   if (kvlist == NULL) {
+   PMD_INIT_LOG(ERR, "Invalid kvargs key\n");
+   return -EINVAL;
+   }
+
+   ret = rte_kvargs_process(kvlist, ICE_SAFE_MODE_SUPPORT_ARG,
+&parse_bool, &ad->devargs.safe_mode_support);
+
+   rte_kvargs_free(kvlist);
+   return ret;
+}
+
+static int
 ice_dev_init(struct rte_eth_dev *dev)
 {
struct rte_pci_device *pci_dev;
@@ -1366,6 +1418,12 @@ ice_dev_init(struct rte_eth_dev *dev)
hw->bus.device = pci_dev->addr.devid;
hw->bus.func = pci_dev->addr.function;
 
+   ret = ice_parse_devargs(dev);
+   if (ret) {
+   PMD_INIT_LOG(ERR, "Failed to parse devargs");
+   return -EINVAL;
+   }
+
ice_init_controlq_parameter(hw);
 
ret = ice_init_hw(hw);
@@ -1376,8 +1434,14 @@ ice_dev_init(struct rte_eth_dev *dev)
 
ret = ice_load_pkg(dev);
if (ret) {
+   if (ad->devargs.safe_mode_support == 0) {
+   PMD_INIT_LOG(ERR, "Failed to load the DDP package,"
+   "Use --safe-mode-support=1 to enter 
Safe Mode");
+   return ret;
+   }
+
PMD_INIT_LOG(WARNING, "Failed to load the DDP package,"
-   "Entering Safe Mode");
+   "Entering Safe Mode");
ad->is_safe_mode = 1;
}
 
@@ -3693,6 +3757,8 @@ static struct rte_pci_driver rte_ice_pmd = {
 RTE_PMD_REGISTER_PCI(net_ice, rte_ice_pmd);
 RTE_PMD_REGISTER_PCI_TABLE(net_ice, pci_id_ice_map);
 RTE_PMD_REGISTER_KMOD_DEP(net_ice, "* igb_uio | uio_pci_generic | vfio-pci");
+RTE_PMD_REGISTER_PARAM_STRING(net_ice,
+ ICE_SAFE_MODE_SUPPORT_ARG "=<0|1>");
 
 RTE_INIT(ice_init_log)
 {
diff --git a/drivers/net/ice/ice_ethdev.h b/drivers/net/ice/ice_ethdev.h
index 8a52239f5..f569da833 100644
--- a/drivers/net/ice/ice_ethdev.h
+++ b/drivers/net/ice/ice_ethdev.h
@@ -274,6 +274,13 @@ struct ice_p

[dpdk-dev] [PATCH v2 1/2] net/ice: remove unused devargs

2019-07-09 Thread Qi Zhang
Remove the devarg "max_queue_pair_num" related code since
it is not completely implemented.

Fixes: f9cf4f864150 ("net/ice: support device initialization")
Cc: sta...@dpdk.org

Signed-off-by: Qi Zhang 
---
 doc/guides/nics/ice.rst  |  8 --
 drivers/net/ice/ice_ethdev.c | 66 ++--
 2 files changed, 3 insertions(+), 71 deletions(-)

diff --git a/doc/guides/nics/ice.rst b/doc/guides/nics/ice.rst
index 666b1b272..e9b3a48bc 100644
--- a/doc/guides/nics/ice.rst
+++ b/doc/guides/nics/ice.rst
@@ -49,14 +49,6 @@ Please note that enabling debugging options may affect 
system performance.
 Runtime Config Options
 ~~
 
-- ``Maximum Number of Queue Pairs``
-
-  The maximum number of queue pairs is decided by HW. If not configured, APP
-  uses the number from HW. Users can check the number by calling the API
-  ``rte_eth_dev_info_get``.
-  If users want to limit the number of queues, they can set a smaller number
-  using EAL parameter like ``max_queue_pair_num=n``.
-
 
 Driver compilation and testing
 --
diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index 9ce730cd4..4b5cd8269 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -17,7 +17,6 @@
 #include "ice_rxtx.h"
 #include "ice_switch_filter.h"
 
-#define ICE_MAX_QP_NUM "max_queue_pair_num"
 #define ICE_DFLT_OUTER_TAG_TYPE ICE_AQ_VSI_OUTER_TAG_VLAN_9100
 #define ICE_DFLT_PKG_FILE "/lib/firmware/intel/ice/ddp/ice.pkg"
 
@@ -251,59 +250,6 @@ ice_init_controlq_parameter(struct ice_hw *hw)
 }
 
 static int
-ice_check_qp_num(const char *key, const char *qp_value,
-__rte_unused void *opaque)
-{
-   char *end = NULL;
-   int num = 0;
-
-   while (isblank(*qp_value))
-   qp_value++;
-
-   num = strtoul(qp_value, &end, 10);
-
-   if (!num || (*end == '-') || errno) {
-   PMD_DRV_LOG(WARNING, "invalid value:\"%s\" for key:\"%s\", "
-   "value must be > 0",
-   qp_value, key);
-   return -1;
-   }
-
-   return num;
-}
-
-static int
-ice_config_max_queue_pair_num(struct rte_devargs *devargs)
-{
-   struct rte_kvargs *kvlist;
-   const char *queue_num_key = ICE_MAX_QP_NUM;
-   int ret;
-
-   if (!devargs)
-   return 0;
-
-   kvlist = rte_kvargs_parse(devargs->args, NULL);
-   if (!kvlist)
-   return 0;
-
-   if (!rte_kvargs_count(kvlist, queue_num_key)) {
-   rte_kvargs_free(kvlist);
-   return 0;
-   }
-
-   if (rte_kvargs_process(kvlist, queue_num_key,
-  ice_check_qp_num, NULL) < 0) {
-   rte_kvargs_free(kvlist);
-   return 0;
-   }
-   ret = rte_kvargs_process(kvlist, queue_num_key,
-ice_check_qp_num, NULL);
-   rte_kvargs_free(kvlist);
-
-   return ret;
-}
-
-static int
 ice_res_pool_init(struct ice_res_pool_info *pool, uint32_t base,
  uint32_t num)
 {
@@ -1128,13 +1074,9 @@ ice_pf_sw_init(struct rte_eth_dev *dev)
struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
struct ice_hw *hw = ICE_PF_TO_HW(pf);
 
-   if (ice_config_max_queue_pair_num(dev->device->devargs) > 0)
-   pf->lan_nb_qp_max =
-   ice_config_max_queue_pair_num(dev->device->devargs);
-   else
-   pf->lan_nb_qp_max =
-   (uint16_t)RTE_MIN(hw->func_caps.common_cap.num_txq,
- hw->func_caps.common_cap.num_rxq);
+   pf->lan_nb_qp_max =
+   (uint16_t)RTE_MIN(hw->func_caps.common_cap.num_txq,
+ hw->func_caps.common_cap.num_rxq);
 
pf->lan_nb_qps = pf->lan_nb_qp_max;
 
@@ -3751,8 +3693,6 @@ static struct rte_pci_driver rte_ice_pmd = {
 RTE_PMD_REGISTER_PCI(net_ice, rte_ice_pmd);
 RTE_PMD_REGISTER_PCI_TABLE(net_ice, pci_id_ice_map);
 RTE_PMD_REGISTER_KMOD_DEP(net_ice, "* igb_uio | uio_pci_generic | vfio-pci");
-RTE_PMD_REGISTER_PARAM_STRING(net_ice,
- ICE_MAX_QP_NUM "=");
 
 RTE_INIT(ice_init_log)
 {
-- 
2.13.6



[dpdk-dev] [PATCH v3 1/2] net/ice: remove unused devargs

2019-07-09 Thread Qi Zhang
Remove the devarg "max_queue_pair_num" related code since
it is not completely implemented.

Fixes: f9cf4f864150 ("net/ice: support device initialization")
Cc: sta...@dpdk.org

Signed-off-by: Qi Zhang 
---
 doc/guides/nics/ice.rst  |  8 --
 drivers/net/ice/ice_ethdev.c | 66 ++--
 2 files changed, 3 insertions(+), 71 deletions(-)

diff --git a/doc/guides/nics/ice.rst b/doc/guides/nics/ice.rst
index 666b1b272..e9b3a48bc 100644
--- a/doc/guides/nics/ice.rst
+++ b/doc/guides/nics/ice.rst
@@ -49,14 +49,6 @@ Please note that enabling debugging options may affect 
system performance.
 Runtime Config Options
 ~~
 
-- ``Maximum Number of Queue Pairs``
-
-  The maximum number of queue pairs is decided by HW. If not configured, APP
-  uses the number from HW. Users can check the number by calling the API
-  ``rte_eth_dev_info_get``.
-  If users want to limit the number of queues, they can set a smaller number
-  using EAL parameter like ``max_queue_pair_num=n``.
-
 
 Driver compilation and testing
 --
diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index 9ce730cd4..4b5cd8269 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -17,7 +17,6 @@
 #include "ice_rxtx.h"
 #include "ice_switch_filter.h"
 
-#define ICE_MAX_QP_NUM "max_queue_pair_num"
 #define ICE_DFLT_OUTER_TAG_TYPE ICE_AQ_VSI_OUTER_TAG_VLAN_9100
 #define ICE_DFLT_PKG_FILE "/lib/firmware/intel/ice/ddp/ice.pkg"
 
@@ -251,59 +250,6 @@ ice_init_controlq_parameter(struct ice_hw *hw)
 }
 
 static int
-ice_check_qp_num(const char *key, const char *qp_value,
-__rte_unused void *opaque)
-{
-   char *end = NULL;
-   int num = 0;
-
-   while (isblank(*qp_value))
-   qp_value++;
-
-   num = strtoul(qp_value, &end, 10);
-
-   if (!num || (*end == '-') || errno) {
-   PMD_DRV_LOG(WARNING, "invalid value:\"%s\" for key:\"%s\", "
-   "value must be > 0",
-   qp_value, key);
-   return -1;
-   }
-
-   return num;
-}
-
-static int
-ice_config_max_queue_pair_num(struct rte_devargs *devargs)
-{
-   struct rte_kvargs *kvlist;
-   const char *queue_num_key = ICE_MAX_QP_NUM;
-   int ret;
-
-   if (!devargs)
-   return 0;
-
-   kvlist = rte_kvargs_parse(devargs->args, NULL);
-   if (!kvlist)
-   return 0;
-
-   if (!rte_kvargs_count(kvlist, queue_num_key)) {
-   rte_kvargs_free(kvlist);
-   return 0;
-   }
-
-   if (rte_kvargs_process(kvlist, queue_num_key,
-  ice_check_qp_num, NULL) < 0) {
-   rte_kvargs_free(kvlist);
-   return 0;
-   }
-   ret = rte_kvargs_process(kvlist, queue_num_key,
-ice_check_qp_num, NULL);
-   rte_kvargs_free(kvlist);
-
-   return ret;
-}
-
-static int
 ice_res_pool_init(struct ice_res_pool_info *pool, uint32_t base,
  uint32_t num)
 {
@@ -1128,13 +1074,9 @@ ice_pf_sw_init(struct rte_eth_dev *dev)
struct ice_pf *pf = ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private);
struct ice_hw *hw = ICE_PF_TO_HW(pf);
 
-   if (ice_config_max_queue_pair_num(dev->device->devargs) > 0)
-   pf->lan_nb_qp_max =
-   ice_config_max_queue_pair_num(dev->device->devargs);
-   else
-   pf->lan_nb_qp_max =
-   (uint16_t)RTE_MIN(hw->func_caps.common_cap.num_txq,
- hw->func_caps.common_cap.num_rxq);
+   pf->lan_nb_qp_max =
+   (uint16_t)RTE_MIN(hw->func_caps.common_cap.num_txq,
+ hw->func_caps.common_cap.num_rxq);
 
pf->lan_nb_qps = pf->lan_nb_qp_max;
 
@@ -3751,8 +3693,6 @@ static struct rte_pci_driver rte_ice_pmd = {
 RTE_PMD_REGISTER_PCI(net_ice, rte_ice_pmd);
 RTE_PMD_REGISTER_PCI_TABLE(net_ice, pci_id_ice_map);
 RTE_PMD_REGISTER_KMOD_DEP(net_ice, "* igb_uio | uio_pci_generic | vfio-pci");
-RTE_PMD_REGISTER_PARAM_STRING(net_ice,
- ICE_MAX_QP_NUM "=");
 
 RTE_INIT(ice_init_log)
 {
-- 
2.13.6



[dpdk-dev] [PATCH v3 2/2] net/ice: add safe mode support devarg

2019-07-09 Thread Qi Zhang
Safe mode support is not necessary by default.
Initializing the driver silently without the OS package may confuse users,
since most advanced features are disabled.
Add a devarg to enable safe mode only when the user intends to do this.

Signed-off-by: Qi Zhang 
---
-v3:
  fix wrong logic in parse_bool.

-v2:
  fix missing return value check from ice_parse_devargs.
  minor document update.

 doc/guides/nics/ice.rst  | 11 +++
 drivers/net/ice/ice_ethdev.c | 68 +++-
 drivers/net/ice/ice_ethdev.h |  8 ++
 3 files changed, 86 insertions(+), 1 deletion(-)

diff --git a/doc/guides/nics/ice.rst b/doc/guides/nics/ice.rst
index e9b3a48bc..03819d29f 100644
--- a/doc/guides/nics/ice.rst
+++ b/doc/guides/nics/ice.rst
@@ -49,6 +49,17 @@ Please note that enabling debugging options may affect 
system performance.
 Runtime Config Options
 ~~
 
+- ``Safe Mode Support`` (default ``0``)
+
+  If the driver fails to load the OS package, by default the driver's
+  initialization fails. But if the user intends to use the device without the
+  OS package, the ``devargs`` parameter ``safe-mode-support`` can be used,
+  for example::
+
+-w 80:00.0,safe-mode-support=1
+
+  Then the driver will be initialized successfully and the device will enter
+  Safe Mode.
+  NOTE: In Safe Mode, only very limited features are available; features like
+  RSS, checksum, fdir, tunneling ... are all disabled.
 
 Driver compilation and testing
 --
diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c
index 4b5cd8269..a7cbe0848 100644
--- a/drivers/net/ice/ice_ethdev.c
+++ b/drivers/net/ice/ice_ethdev.c
@@ -17,6 +17,14 @@
 #include "ice_rxtx.h"
 #include "ice_switch_filter.h"
 
+/* devargs */
+#define ICE_SAFE_MODE_SUPPORT_ARG "safe-mode-support"
+
+static const char * const ice_valid_args[] = {
+   ICE_SAFE_MODE_SUPPORT_ARG,
+   NULL
+};
+
 #define ICE_DFLT_OUTER_TAG_TYPE ICE_AQ_VSI_OUTER_TAG_VLAN_9100
 #define ICE_DFLT_PKG_FILE "/lib/firmware/intel/ice/ddp/ice.pkg"
 
@@ -1334,6 +1342,50 @@ ice_base_queue_get(struct ice_pf *pf)
 }
 
 static int
+parse_bool(const char *key, const char *value, void *args)
+{
+   int *i = (int *)args;
+   char *end;
+   int num;
+
+   num = strtoul(value, &end, 10);
+
+   if (num != 0 && num != 1) {
+   PMD_DRV_LOG(WARNING, "invalid value:\"%s\" for key:\"%s\", "
+   "value must be 0 or 1",
+   value, key);
+   return -1;
+   }
+
+   *i = num;
+   return 0;
+}
+
+static int ice_parse_devargs(struct rte_eth_dev *dev)
+{
+   struct ice_adapter *ad =
+   ICE_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
+   struct rte_devargs *devargs = dev->device->devargs;
+   struct rte_kvargs *kvlist;
+   int ret;
+
+   if (devargs == NULL)
+   return 0;
+
+   kvlist = rte_kvargs_parse(devargs->args, ice_valid_args);
+   if (kvlist == NULL) {
+   PMD_INIT_LOG(ERR, "Invalid kvargs key\n");
+   return -EINVAL;
+   }
+
+   ret = rte_kvargs_process(kvlist, ICE_SAFE_MODE_SUPPORT_ARG,
+&parse_bool, &ad->devargs.safe_mode_support);
+
+   rte_kvargs_free(kvlist);
+   return ret;
+}
+
+static int
 ice_dev_init(struct rte_eth_dev *dev)
 {
struct rte_pci_device *pci_dev;
@@ -1366,6 +1418,12 @@ ice_dev_init(struct rte_eth_dev *dev)
hw->bus.device = pci_dev->addr.devid;
hw->bus.func = pci_dev->addr.function;
 
+   ret = ice_parse_devargs(dev);
+   if (ret) {
+   PMD_INIT_LOG(ERR, "Failed to parse devargs");
+   return -EINVAL;
+   }
+
ice_init_controlq_parameter(hw);
 
ret = ice_init_hw(hw);
@@ -1376,8 +1434,14 @@ ice_dev_init(struct rte_eth_dev *dev)
 
ret = ice_load_pkg(dev);
if (ret) {
+   if (ad->devargs.safe_mode_support == 0) {
+   PMD_INIT_LOG(ERR, "Failed to load the DDP package,"
+   "Use safe-mode-support=1 to enter Safe 
Mode");
+   return ret;
+   }
+
PMD_INIT_LOG(WARNING, "Failed to load the DDP package,"
-   "Entering Safe Mode");
+   "Entering Safe Mode");
ad->is_safe_mode = 1;
}
 
@@ -3693,6 +3757,8 @@ static struct rte_pci_driver rte_ice_pmd = {
 RTE_PMD_REGISTER_PCI(net_ice, rte_ice_pmd);
 RTE_PMD_REGISTER_PCI_TABLE(net_ice, pci_id_ice_map);
 RTE_PMD_REGISTER_KMOD_DEP(net_ice, "* igb_uio | uio_pci_generic | vfio-pci");
+RTE_PMD_REGISTER_PARAM_STRING(net_ice,
+ ICE_SAFE_MODE_SUPPORT_ARG "=<0|1>");
 
 RTE_INIT(ice_init_log)
 {
diff --git a/drivers/net/ice/ice_ethdev.h b/drivers/net/ice/ice_ethdev.h
index 8a52239f5..f569da833 100644
--- a/drivers/net/ice/ice_ethdev.h
+++ b/drivers/net/ice/ice_ethdev.h

[dpdk-dev] [Bug 312] i40evf could not receive multicast packets

2019-07-09 Thread bugzilla
https://bugs.dpdk.org/show_bug.cgi?id=312

Bug ID: 312
   Summary: i40evf could not receive multicast packets
   Product: DPDK
   Version: 19.05
  Hardware: x86
OS: Linux
Status: CONFIRMED
  Severity: major
  Priority: Normal
 Component: ethdev
  Assignee: dev@dpdk.org
  Reporter: tb.bin...@qq.com
  Target Milestone: ---

Hi all, 

I found that with i40e in SR-IOV mode, the VF in DPDK could not receive
multicast packets. Is this a bug?
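(General background, not a diagnosis of this report: a DPDK port only passes
frames matching its programmed MAC filters unless multicast addresses are
added or all-multicast/promiscuous mode is enabled. A minimal sketch using
the public ethdev API, with port_id as a placeholder:

#include <rte_ethdev.h>

/* Illustrative only: accept all multicast frames on the given port. */
static void
enable_all_multicast(uint16_t port_id)
{
	rte_eth_allmulticast_enable(port_id);
}

Whether that is relevant here depends on how the KNI example configures the
VF port.)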


Testing Environment:
* network card : Ethernet controller: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP
* CPU : Intel(R) Xeon(R) Silver 4116 CPU @ 2.10GHz
* OS : Ubuntu 16.04
* DPDK : 19.05

Testing Steps:


# Step 1. Start to send multicast continuously from one PC
$ python multicast.py -I 192.168.99.142 "hello world"
Thu, 04 Jul 2019 11:53:23 DEBUG Joined the multicast network: 224.0.0.5 on 192.168.99.142
Thu, 04 Jul 2019 11:53:23 INFO Sent data: '1 1562212403.14 hello world'
Thu, 04 Jul 2019 11:53:24 INFO Sent data: '2 1562212404.14 hello world'
Thu, 04 Jul 2019 11:53:25 INFO Sent data: '3 1562212405.14 hello world'
Thu, 04 Jul 2019 11:53:26 INFO Sent data: '4 1562212406.14 hello world'


# Step 2. Test that the VF in non-DPDK mode can receive multicast
$ sudo ./dpdk-devbind.py -s

Network devices using DPDK-compatible driver


Network devices using kernel driver
===
0000:19:00.0 'I350 Gigabit Network Connection 1521' if=eth0 drv=igb unused=igb_uio *Active*
0000:3b:00.0 'Ethernet Controller XL710 for 40GbE QSFP+ 1583' if=eth4 drv=i40e unused=igb_uio *Active*
0000:3b:00.1 'Ethernet Controller XL710 for 40GbE QSFP+ 1583' if=eth5 drv=i40e unused=igb_uio
0000:3b:0a.0 'Ethernet Virtual Function 700 Series 154c' if=eth6 drv=i40evf unused=igb_uio    VF

$ sudo ifconfig eth6 192.168.99.141/24   #   <<< configure IP for VF
$ sudo ifconfig eth5 up  #    configure PF interface to UP/Running status


$ sudo tcpdump -i eth6 -n -A -XX host 192.168.99.142 &  #  start to capture on the VF
...   <<   nothing captured
$ sudo ip maddr add 01:00:5e:00:00:05 dev eth6   #  configure maddr for VF

$ sudo tcpdump -i eth6 -n -A -XX host 192.168.99.142
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth6, link-type EN10MB (Ethernet), capture size 262144 bytes
12:00:21.075607 IP 192.168.99.142.56250 > 224.0.0.5.5405: UDP, length 34
0x:  0100 5e00 0005 f8f2 1e35 e651 0800 4500  ..^..5.Q..E.
0x0010:  003e c9c2 4000 0111 abb0 c0a8 638e e000  .>..@...c...
0x0020:  0005 dbba 151d 002a f9de 3030 3030 3030  ...*..00
0x0030:  3431 3120 3135 3632 3231 3238 3231 2e31  411.1562212821.1
0x0040:  2068 656c 6c6f 2077 6f72 6c64.hello.world
12:00:22.077017 IP 192.168.99.142.56250 > 224.0.0.5.5405: UDP, length 35
0x:  0100 5e00 0005 f8f2 1e35 e651 0800 4500  ..^..5.Q..E.
0x0010:  003f caa4 4000 0111 aacd c0a8 638e e000  .?..@...c...
0x0020:  0005 dbba 151d 002b 247f 3030 3030 3030  ...+$.00
0x0030:  3431 3220 3135 3632 3231 3238 3232 2e31  412.1562212822.1
0x0040:  3120 6865 6c6c 6f20 776f 726c 64 1.hello.world
12:00:23.078429 IP 192.168.99.142.56250 > 224.0.0.5.5405: UDP, length 35
  captured the multicast
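
(Side note for readers following along: the maddr added above is the standard
RFC 1112 mapping of the IPv4 group 224.0.0.5 onto 01:00:5e plus the low 23
bits of the address. A minimal illustrative helper, hypothetical and not from
the report:

#include <stdint.h>

/* Map an IPv4 multicast group (host byte order) to its Ethernet MAC:
 * 01:00:5e:00:00:00 | low 23 bits, e.g. 224.0.0.5 -> 01:00:5e:00:00:05. */
static void
ipv4_mcast_to_mac(uint32_t group, uint8_t mac[6])
{
	mac[0] = 0x01;
	mac[1] = 0x00;
	mac[2] = 0x5e;
	mac[3] = (group >> 16) & 0x7f;
	mac[4] = (group >> 8) & 0xff;
	mac[5] = group & 0xff;
}

This only shows why 224.0.0.5 corresponds to that MAC; the capture behaviour
itself is as reported.)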

Step 3: Test that the VF in DPDK mode could NOT receive multicast
$ sudo ifconfig eth6 down
$ sudo ./dpdk-devbind.py -b igb_uio eth6  # < bind vf to igb_uio

$ sudo ./build/kni -l 0-5 -n 4 -- -P -p 0x1 -m --config="(0,1,3)"  #  start KNI program
EAL: Detected 48 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Probing VFIO support...
EAL: PCI device 0000:19:00.0 on NUMA socket 0
EAL:   probe driver: 8086:1521 net_e1000_igb
EAL: PCI device 0000:19:00.1 on NUMA socket 0
EAL:   probe driver: 8086:1521 net_e1000_igb
EAL: PCI device 0000:19:00.2 on NUMA socket 0
EAL:   probe driver: 8086:1521 net_e1000_igb
EAL: PCI device 0000:19:00.3 on NUMA socket 0
EAL:   probe driver: 8086:1521 net_e1000_igb
EAL: PCI device 0000:3b:00.0 on NUMA socket 0
EAL:   probe driver: 8086:1583 net_i40e
EAL: PCI device 0000:3b:00.1 on NUMA socket 0
EAL:   probe driver: 8086:1583 net_i40e
EAL: PCI device 0000:3b:0a.0 on NUMA socket 0
EAL:   probe driver: 8086:154c net_i40e_vf
APP: Initialising port 0 ...

Checking link status
done
Port0 Link Up - speed 40000Mbps - full-duplex
APP: 
APP: KNI Running
APP: kill -SIGUSR1 140041
APP: Show KNI Statistics.
APP: kill -SIGUSR2 140041
APP: Zero KNI Statistics.
APP: 
APP: Lcore 1 is reading from port 0
APP: Lcore 2 has nothing to do
APP: Lcore 3 is writing to port 0
APP: Lcore 4 has nothing to do
APP: Lcore 0 ha

Re: [dpdk-dev] [RFC PATCH 02/13] add vhost packed ring fast enqueue function

2019-07-09 Thread Jason Wang



On 2019/7/9 1:13 AM, Marvin Liu wrote:

The fast enqueue function first checks whether the descriptors are
cache aligned and verifies its prerequisites at the beginning. It does
not support chained mbufs; the normal function handles those.

Signed-off-by: Marvin Liu 

diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 884befa85..f24026acd 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -39,6 +39,8 @@
  
  #define VHOST_LOG_CACHE_NR 32
  
+/* Used in fast packed ring functions */
+#define PACKED_DESC_PER_CACHELINE (RTE_CACHE_LINE_SIZE / sizeof(struct vring_packed_desc))
  /**
   * Structure contains buffer address, length and descriptor index
   * from vring to do scatter RX.
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 003aec1d4..b877510da 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -897,6 +897,115 @@ virtio_dev_rx_split(struct virtio_net *dev, struct 
vhost_virtqueue *vq,
return pkt_idx;
  }
  
+static __rte_always_inline uint16_t
+virtio_dev_rx_fast_packed(struct virtio_net *dev, struct vhost_virtqueue *vq,
+   struct rte_mbuf **pkts)
+{
+   bool wrap_counter = vq->avail_wrap_counter;
+   struct vring_packed_desc *descs = vq->desc_packed;
+   uint16_t avail_idx = vq->last_avail_idx;
+   uint64_t desc_addr, desc_addr1, desc_addr2, desc_addr3, len, len1,
+   len2, len3;
+   struct virtio_net_hdr_mrg_rxbuf *hdr, *hdr1, *hdr2, *hdr3;
+   uint32_t buf_offset = dev->vhost_hlen;
+
+   if (unlikely(avail_idx & 0x3))
+   return -1;
+
+   if (unlikely(avail_idx < (vq->size - PACKED_DESC_PER_CACHELINE)))
+   rte_prefetch0((void *)(uintptr_t)&descs[avail_idx +
+   PACKED_DESC_PER_CACHELINE]);
+   else
+   rte_prefetch0((void *)(uintptr_t)&descs[0]);
+
+   if (unlikely((pkts[0]->next != NULL) |
+   (pkts[1]->next != NULL) |
+   (pkts[2]->next != NULL) |
+   (pkts[3]->next != NULL)))
+   return -1;
+
+   if (unlikely(!desc_is_avail(&descs[avail_idx], wrap_counter)) |
+   unlikely(!desc_is_avail(&descs[avail_idx + 1], wrap_counter)) |
+   unlikely(!desc_is_avail(&descs[avail_idx + 2], wrap_counter)) |
+   unlikely(!desc_is_avail(&descs[avail_idx + 3], wrap_counter)))
+   return 1;



Any reason for not letting compiler to unroll the loops?
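
(For illustration only: a minimal sketch of how the same checks could be
written as a fixed-count loop that the compiler can unroll. It reuses the
patch's names — PACKED_DESC_PER_CACHELINE, desc_is_avail, descs, avail_idx,
wrap_counter, pkts — and is not a tested implementation:

	uint16_t i;

	for (i = 0; i < PACKED_DESC_PER_CACHELINE; i++) {
		/* chained mbufs are handled by the normal path */
		if (pkts[i]->next != NULL)
			return -1;
		/* all descriptors in the cache line must be available */
		if (!desc_is_avail(&descs[avail_idx + i], wrap_counter))
			return 1;
	}

With a compile-time constant trip count, GCC/clang at -O2/-O3 will typically
unroll this fully.)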

Thanks




Re: [dpdk-dev] [v4] net/e1000: i219 unit hang issue fix on reset/close

2019-07-09 Thread Anand H. Krishnan
More comments inline:

On Tue, Jul 9, 2019 at 9:16 PM Xiao Zhang  wrote:
>
> Unit hang may occur if multiple descriptors are available in the rings
> during reset or close. This state can be detected by checking bit 8 of
> the descriptor ring status register in PCI config space. If the bit is
> set and there are pending descriptors in one of the rings, we must flush
> them before reset or close.
>
> Signed-off-by: Xiao Zhang 
> ---
>  drivers/net/e1000/e1000_ethdev.h |   4 ++
>  drivers/net/e1000/igb_ethdev.c   |   4 ++
>  drivers/net/e1000/igb_rxtx.c | 105 
> +++
>  3 files changed, 113 insertions(+)
>
> diff --git a/drivers/net/e1000/e1000_ethdev.h 
> b/drivers/net/e1000/e1000_ethdev.h
> index 67acb73..349144a 100644
> --- a/drivers/net/e1000/e1000_ethdev.h
> +++ b/drivers/net/e1000/e1000_ethdev.h
> @@ -35,6 +35,9 @@
>  #define IGB_MAX_RX_QUEUE_NUM   8
>  #define IGB_MAX_RX_QUEUE_NUM_82576 16
>
> +#define E1000_I219_MAX_RX_QUEUE_NUM2
> +#define E1000_I219_MAX_TX_QUEUE_NUM2
> +
>  #define E1000_SYN_FILTER_ENABLE0x0001 /* syn filter enable field 
> */
>  #define E1000_SYN_FILTER_QUEUE 0x000E /* syn filter queue field 
> */
>  #define E1000_SYN_FILTER_QUEUE_SHIFT   1  /* syn filter queue field 
> */
> @@ -522,5 +525,6 @@ int igb_action_rss_same(const struct rte_flow_action_rss 
> *comp,
>  int igb_config_rss_filter(struct rte_eth_dev *dev,
> struct igb_rte_flow_rss_conf *conf,
> bool add);
> +void igb_flush_desc_rings(struct rte_eth_dev *dev);
>
>  #endif /* _E1000_ETHDEV_H_ */
> diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
> index 3ee28cf..845101b 100644
> --- a/drivers/net/e1000/igb_ethdev.c
> +++ b/drivers/net/e1000/igb_ethdev.c
> @@ -1589,6 +1589,10 @@ eth_igb_close(struct rte_eth_dev *dev)
> eth_igb_stop(dev);
> adapter->stopped = 1;
>
> +   /* Flush desc rings for i219 */
> +   if (hw->mac.type >= e1000_pch_spt)
> +   igb_flush_desc_rings(dev);
> +

I am a bit confused as to which driver handles the i219; I thought it is
em_ethdev.c. Also, I think the more appropriate place for this code is
eth_em_stop.


> e1000_phy_hw_reset(hw);
> igb_release_manageability(hw);
> igb_hw_control_release(hw);
> diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c
> index c5606de..cad28f3 100644
> --- a/drivers/net/e1000/igb_rxtx.c
> +++ b/drivers/net/e1000/igb_rxtx.c
> @@ -18,6 +18,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -63,6 +64,10 @@
>  #define IGB_TX_OFFLOAD_NOTSUP_MASK \
> (PKT_TX_OFFLOAD_MASK ^ IGB_TX_OFFLOAD_MASK)
>
> +/* PCI offset for querying descriptor ring status*/
> +#define PCICFG_DESC_RING_STATUS   0xE4
> +#define FLUSH_DESC_REQUIRED   0x100
> +
>  /**
>   * Structure associated with each descriptor of the RX ring of a RX queue.
>   */
> @@ -2962,3 +2967,103 @@ igb_config_rss_filter(struct rte_eth_dev *dev,
>
> return 0;
>  }
> +
> +static void e1000_flush_tx_ring(struct rte_eth_dev *dev)
> +{
> +   struct e1000_hw *hw = E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> +   volatile union e1000_adv_tx_desc *tx_desc;
> +   uint32_t tdt, tctl, txd_lower = E1000_TXD_CMD_IFCS;
> +   uint16_t size = 512;
> +   struct igb_tx_queue *txq;
> +   int i;
> +
> +   if (dev->data->tx_queues == NULL)
> +   return;
> +   tctl = E1000_READ_REG(hw, E1000_TCTL);
> +   E1000_WRITE_REG(hw, E1000_TCTL, tctl | E1000_TCTL_EN);
> +   for (i = 0; i < dev->data->nb_tx_queues &&
> +   i < E1000_I219_MAX_TX_QUEUE_NUM; i++) {
> +   txq = dev->data->tx_queues[i];
> +   tdt = E1000_READ_REG(hw, E1000_TDT(i));
> +   if (tdt != txq->tx_tail)
> +   return;
> +   tx_desc = &txq->tx_ring[txq->tx_tail];
> +   tx_desc->read.buffer_addr = txq->tx_ring_phys_addr;

Should you be using rte_cpu_to_le_64 here?


> +   tx_desc->read.cmd_type_len = rte_cpu_to_le_32(txd_lower | 
> size);

There are lower.data and upper.data fields that need to be set, not this
field, AFAIR.


> +   tx_desc->read.olinfo_status = 0;
> +
> +   rte_wmb();
> +   txq->tx_tail++;
> +   if (txq->tx_tail == txq->nb_tx_desc)
> +   txq->tx_tail = 0;
> +   rte_io_wmb();
> +   E1000_WRITE_REG(hw, E1000_TDT(i), txq->tx_tail);

Should you be using E1000_PCI_REG_WRITE_RELAXED? Can you just check
other instances of how this register is written?
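
(For reference, a hypothetical sketch of the tail bump written with the
relaxed PCI write macro from e1000_osdep.h, assuming the queue caches its
tail register address in tdt_reg_addr as the regular igb TX path does; this
is a suggestion to compare against, not the patch's code:

/* Hypothetical helper: bump the TX tail with a relaxed MMIO write,
 * assuming txq->tdt_reg_addr caches the TDT register address. */
static inline void
igb_flush_tx_tail_update(struct igb_tx_queue *txq)
{
	/* make the flush descriptor visible before bumping the tail */
	rte_io_wmb();
	E1000_PCI_REG_WRITE_RELAXED(txq->tdt_reg_addr, txq->tx_tail);
}
)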

Thanks,
Anand


> +   usec_delay(250);
> +   }
> +}
> +
> +static void e1000_flush_rx_ring(struct rte_eth_dev *dev)
> +{
> +   uint32_t rctl, rxdctl;
> +   struct e1000_hw *hw = E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> +   int i;