Re: [dpdk-dev] [RFC] testpmd: handle UFO packets

2018-03-01 Thread Jason Wang



On 2018-02-28 22:53, Tan, Jianfeng wrote:

Hi Jason,


On 2/28/2018 10:10 PM, Jason Wang wrote:



On 2018-02-24 17:35, Jianfeng Tan wrote:

Most likely, we will make UFO a kind of GSO engine.

For the short term, we can just call the APIs in librte_ip_frag to fragment.

To test:

1. start testpmd with two vhost ports.
  $ set fwd csum
  $ start

2. start vm0 connected to vhost0;
  $ ifconfig xxx 1.1.1.1/24 up
  $ ethtool -K xxx ufo on

3. start vm1 connected to vhost1;
  $ ifconfig xxx 1.1.1.2/24 up
  $ ethtool -K xxx ufo on
  $ (Fill a large file named 1.txt)
  $ cat 1.txt | socat - udp-sendto:1.1.1.1:5000


Just a reminder, UFO was completely removed upstream.



Thank you for the information.

Saw the deprecation patch at Linux v4.16-rc3; I wonder which "version+" 
counts as the "modern kernels" in "modern kernels will no longer generate 
UFO skbs"?


git describe d9d30adf56777c402c0027c0e6ae21f17cc0a365
v4.12-11055-gd9d30ad

So I think any Linux version beyond 4.12 won't generate any UFO packets.

And this is mostly for stock VMs with old kernels to help the 
migration from kernel vswitch to user space vswitch.




Yes, testpmd may still see UFO packets from old kernels. Just a reminder 
in case you missed it.


(Btw, we plan to support UDP tunnel offload for virtio-net.)

Will other OSes, like FreeBSD or Windows, generate UFO packets? Can anyone 
provide such information?


I don't know about them.

Thanks.



Thanks,
Jianfeng






[dpdk-dev] [PATCH v2 1/5] lib/ethdev: support for inline IPsec events

2018-03-01 Thread Anoob Joseph
Adding support for IPsec events in the rte_eth_event framework. In inline
IPsec offload, the per-packet protocol-defined variables, like ESN,
would be managed by the PMD. In such cases, the PMD would need IPsec events
to notify the application about various conditions, like ESN overflow.

Signed-off-by: Anoob Joseph 
---
v2:
* Added time expiry & byte expiry IPsec events in the enum

 lib/librte_ether/rte_ethdev.h | 28 
 1 file changed, 28 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 0361533..96b2aa0 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -2438,6 +2438,33 @@ int
 rte_eth_tx_done_cleanup(uint16_t port_id, uint16_t queue_id, uint32_t 
free_cnt);
 
 /**
+ * Subtypes for IPsec offload events raised by eth device.
+ */
+enum rte_eth_event_ipsec_subtype {
+   RTE_ETH_EVENT_IPSEC_UNKNOWN = 0,
+   /** Unknown event type */
+   RTE_ETH_EVENT_IPSEC_ESN_OVERFLOW,
+   /** Sequence number overflow in security offload */
+   RTE_ETH_EVENT_IPSEC_SA_TIME_EXPIRY,
+   /** Soft time expiry of SA */
+   RTE_ETH_EVENT_IPSEC_SA_BYTE_EXPIRY,
+   /** Soft byte expiry of SA */
+   RTE_ETH_EVENT_IPSEC_MAX
+   /** Max value of this enum */
+};
+
+/**
+ * Descriptor for IPsec event. Used by eth dev to send extra information of the
+ * event.
+ */
+struct rte_eth_event_ipsec_desc {
+   enum rte_eth_event_ipsec_subtype stype;
+   /** Type of IPsec event */
+   uint64_t md;
+   /** Event specific metadata */
+};
+
+/**
  * The eth device event type for interrupt, and maybe others in the future.
  */
 enum rte_eth_event_type {
@@ -2448,6 +2475,7 @@ enum rte_eth_event_type {
RTE_ETH_EVENT_INTR_RESET,
/**< reset interrupt event, sent to VF on PF reset */
RTE_ETH_EVENT_VF_MBOX,  /**< message from the VF received by PF */
+   RTE_ETH_EVENT_IPSEC,/**< IPsec offload related event */
RTE_ETH_EVENT_MACSEC,   /**< MACsec offload related event */
RTE_ETH_EVENT_INTR_RMV, /**< device removal event */
RTE_ETH_EVENT_NEW,  /**< port is probed */
-- 
2.7.4



[dpdk-dev] [PATCH v2 5/5] app/testpmd: support for IPsec event

2018-03-01 Thread Anoob Joseph
Adding support for IPsec event

Signed-off-by: Anoob Joseph 
---
v2:
* No change

 app/test-pmd/parameters.c | 2 ++
 app/test-pmd/testpmd.c| 2 ++
 2 files changed, 4 insertions(+)

diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 97d22b8..7ea882f 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -512,6 +512,8 @@ parse_event_printing_config(const char *optarg, int enable)
mask = UINT32_C(1) << RTE_ETH_EVENT_INTR_RESET;
else if (!strcmp(optarg, "vf_mbox"))
mask = UINT32_C(1) << RTE_ETH_EVENT_VF_MBOX;
+   else if (!strcmp(optarg, "ipsec"))
+   mask = UINT32_C(1) << RTE_ETH_EVENT_IPSEC;
else if (!strcmp(optarg, "macsec"))
mask = UINT32_C(1) << RTE_ETH_EVENT_MACSEC;
else if (!strcmp(optarg, "intr_rmv"))
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 4c0e258..32fb8b1 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -292,6 +292,7 @@ uint32_t event_print_mask = (UINT32_C(1) << 
RTE_ETH_EVENT_UNKNOWN) |
(UINT32_C(1) << RTE_ETH_EVENT_INTR_LSC) |
(UINT32_C(1) << RTE_ETH_EVENT_QUEUE_STATE) |
(UINT32_C(1) << RTE_ETH_EVENT_INTR_RESET) |
+   (UINT32_C(1) << RTE_ETH_EVENT_IPSEC) |
(UINT32_C(1) << RTE_ETH_EVENT_MACSEC) |
(UINT32_C(1) << RTE_ETH_EVENT_INTR_RMV);
 
@@ -2024,6 +2025,7 @@ eth_event_callback(portid_t port_id, enum 
rte_eth_event_type type, void *param,
[RTE_ETH_EVENT_QUEUE_STATE] = "Queue state",
[RTE_ETH_EVENT_INTR_RESET] = "Interrupt reset",
[RTE_ETH_EVENT_VF_MBOX] = "VF Mbox",
+   [RTE_ETH_EVENT_IPSEC] = "IPsec",
[RTE_ETH_EVENT_MACSEC] = "MACsec",
[RTE_ETH_EVENT_INTR_RMV] = "device removal",
[RTE_ETH_EVENT_NEW] = "device probed",
-- 
2.7.4



[dpdk-dev] [PATCH v2 0/5] handle seq no overflow in IPsec offload

2018-03-01 Thread Anoob Joseph
This series enables the application to set the sequence number soft limit
for IPsec offload. In inline IPsec offload, as the sequence number
(maintained by the PMD/device) reaches the specified soft limit, the PMD
raises an "IPSEC_EVENT". This event carries some metadata, which the
application uses to identify the SA on which the sequence number
overflow is about to happen.

Anoob Joseph (5):
  lib/ethdev: support for inline IPsec events
  lib/security: add ESN soft limit in conf
  lib/security: extend userdata for IPsec events
  examples/ipsec-secgw: handle ESN soft limit event
  app/testpmd: support for IPsec event

 app/test-pmd/parameters.c |  2 ++
 app/test-pmd/testpmd.c|  2 ++
 examples/ipsec-secgw/ipsec-secgw.c| 56 +++
 examples/ipsec-secgw/ipsec.c  | 10 --
 examples/ipsec-secgw/ipsec.h  |  2 ++
 lib/librte_ether/rte_ethdev.h | 28 
 lib/librte_security/rte_security.h| 16 +
 lib/librte_security/rte_security_driver.h |  6 ++--
 8 files changed, 110 insertions(+), 12 deletions(-)

-- 
2.7.4



[dpdk-dev] [PATCH v2 2/5] lib/security: add ESN soft limit in conf

2018-03-01 Thread Anoob Joseph
Adding an ESN soft limit in conf. This will be used in case of protocol
offload. Per SA, the application can specify the ESN at which the security
device needs to notify the application. In case of eth dev (inline protocol),
the rte_eth_event framework would raise an IPsec event.

Signed-off-by: Anoob Joseph 
---
v2:
* No change

 lib/librte_security/rte_security.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/librte_security/rte_security.h 
b/lib/librte_security/rte_security.h
index c75c121..a71ff6f 100644
--- a/lib/librte_security/rte_security.h
+++ b/lib/librte_security/rte_security.h
@@ -222,6 +222,8 @@ struct rte_security_ipsec_xform {
/**< IPsec SA Mode - transport/tunnel */
struct rte_security_ipsec_tunnel_param tunnel;
/**< Tunnel parameters, NULL for transport mode */
+   uint64_t esn_soft_limit;
+   /**< ESN for which the overflow event needs to be raised by eth dev */
 };
 
 /**
-- 
2.7.4



[dpdk-dev] [PATCH v2 4/5] examples/ipsec-secgw: handle ESN soft limit event

2018-03-01 Thread Anoob Joseph
For inline protocol processing, the PMD/device is required to maintain
the ESN. But the application is required to monitor ESN overflow to
initiate SA expiry.

For such cases, the application would set the ESN soft limit. An IPsec event
would be raised by the rte_eth_event framework when the ESN hits the soft
limit set by the application.

Signed-off-by: Anoob Joseph 
---
v2:
* No change

 examples/ipsec-secgw/ipsec-secgw.c | 56 ++
 examples/ipsec-secgw/ipsec.c   | 10 +--
 examples/ipsec-secgw/ipsec.h   |  2 ++
 3 files changed, 65 insertions(+), 3 deletions(-)

diff --git a/examples/ipsec-secgw/ipsec-secgw.c 
b/examples/ipsec-secgw/ipsec-secgw.c
index 3a8562e..5726fd3 100644
--- a/examples/ipsec-secgw/ipsec-secgw.c
+++ b/examples/ipsec-secgw/ipsec-secgw.c
@@ -40,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ipsec.h"
 #include "parser.h"
@@ -1640,6 +1641,58 @@ pool_init(struct socket_ctx *ctx, int32_t socket_id, 
uint32_t nb_mbuf)
printf("Allocated mbuf pool on socket %d\n", socket_id);
 }
 
+static inline int
+inline_ipsec_event_esn_overflow(struct rte_security_ctx *ctx, uint64_t md)
+{
+   struct ipsec_sa *sa;
+
+   /* For inline protocol processing, the metadata in the event will
+* uniquely identify the security session which raised the event.
+* Application would then need the userdata it had registered with the
+* security session to process the event.
+*/
+
+   sa = (struct ipsec_sa *)rte_security_get_userdata(ctx, md);
+
+   if (sa == NULL) {
+   /* userdata could not be retrieved */
+   return -1;
+   }
+
+   /* Sequence number overflow. SA needs to be re-established */
+   RTE_SET_USED(sa);
+   return 0;
+}
+
+static int
+inline_ipsec_event_callback(uint16_t port_id, enum rte_eth_event_type type,
+void *param, void *ret_param)
+{
+   struct rte_eth_event_ipsec_desc *event_desc = NULL;
+   struct rte_security_ctx *ctx = (struct rte_security_ctx *)
+   rte_eth_dev_get_sec_ctx(port_id);
+
+   RTE_SET_USED(param);
+
+   if (type != RTE_ETH_EVENT_IPSEC)
+   return -1;
+
+   event_desc = ret_param;
+   if (event_desc == NULL) {
+   printf("Event descriptor not set\n");
+   return -1;
+   }
+
+   if (event_desc->stype == RTE_ETH_EVENT_IPSEC_ESN_OVERFLOW)
+   return inline_ipsec_event_esn_overflow(ctx, event_desc->md);
+   else if (event_desc->stype >= RTE_ETH_EVENT_IPSEC_MAX) {
+   printf("Invalid IPsec event reported\n");
+   return -1;
+   }
+
+   return -1;
+}
+
 int32_t
 main(int32_t argc, char **argv)
 {
@@ -1727,6 +1780,9 @@ main(int32_t argc, char **argv)
 */
if (promiscuous_on)
rte_eth_promiscuous_enable(portid);
+
+   rte_eth_dev_callback_register(portid,
+   RTE_ETH_EVENT_IPSEC, inline_ipsec_event_callback, NULL);
}
 
check_all_ports_link_status(nb_ports, enabled_port_mask);
diff --git a/examples/ipsec-secgw/ipsec.c b/examples/ipsec-secgw/ipsec.c
index 5fb5bc1..acdd189 100644
--- a/examples/ipsec-secgw/ipsec.c
+++ b/examples/ipsec-secgw/ipsec.c
@@ -36,6 +36,7 @@ set_ipsec_conf(struct ipsec_sa *sa, struct 
rte_security_ipsec_xform *ipsec)
}
/* TODO support for Transport and IPV6 tunnel */
}
+   ipsec->esn_soft_limit = IPSEC_OFFLOAD_ESN_SOFTLIMIT;
 }
 
 static inline int
@@ -270,11 +271,14 @@ create_session(struct ipsec_ctx *ipsec_ctx, struct 
ipsec_sa *sa)
 * the packet is received, this userdata will be
 * retrieved using the metadata from the packet.
 *
-* This is required only for inbound SAs.
+* The PMD is expected to set similar metadata for other
+* operations, like rte_eth_event, which are tied to
+* security session. In such cases, the userdata could
+* be obtained to uniquely identify the security
+* parameters denoted.
 */
 
-   if (sa->direction == RTE_SECURITY_IPSEC_SA_DIR_INGRESS)
-   sess_conf.userdata = (void *) sa;
+   sess_conf.userdata = (void *) sa;
 
sa->sec_session = rte_security_session_create(ctx,
&sess_conf, ipsec_ctx->session_pool);
diff --git a/examples/ipsec-secgw/ipsec.h b/examples/ipsec-secgw/ipsec.h
index 6059f6c..c1450f6 100644
--- a/examples/ipsec-secgw/ipsec.h
+++ b/examples/ipsec-secgw/ipsec.h
@@ -21,6 +21,8 @@
 
 #define MAX_DIGEST_SIZE 32 /* Bytes -- 256 bits */
 
+#define IPSEC_OFFLOAD_ESN_SOFTLIMIT 0xff00
+
 #define IV

[dpdk-dev] [PATCH v2 3/5] lib/security: extend userdata for IPsec events

2018-03-01 Thread Anoob Joseph
Extending 'userdata' to be used for IPsec events too.

IPsec events carry some metadata which uniquely identifies the
security session for which the event is raised. But the application
needs some construct which it can understand. The 'userdata' solves a
similar problem for inline-processed inbound traffic. Updating the
documentation to extend the usage of 'userdata'.

Signed-off-by: Anoob Joseph 
---
v2:
* No change

 lib/librte_security/rte_security.h| 14 --
 lib/librte_security/rte_security_driver.h |  6 +++---
 2 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/lib/librte_security/rte_security.h 
b/lib/librte_security/rte_security.h
index a71ff6f..e8b5888 100644
--- a/lib/librte_security/rte_security.h
+++ b/lib/librte_security/rte_security.h
@@ -364,15 +364,17 @@ rte_security_set_pkt_metadata(struct rte_security_ctx 
*instance,
  struct rte_mbuf *mb, void *params);
 
 /**
- * Get userdata associated with the security session which processed the
- * packet. This userdata would be registered while creating the session, and
- * application can use this to identify the SA etc. Device-specific metadata
- * in the mbuf would be used for this.
+ * Get userdata associated with the security session. Device specific metadata
+ * provided would be used to uniquely identify the security session being
+ * referred to. This userdata would be registered while creating the session,
+ * and application can use this to identify the SA etc.
  *
- * This is valid only for inline processed ingress packets.
+ * Device specific metadata would be set in mbuf for inline processed inbound
+ * packets. In addition, the same metadata would be set for IPsec events
+ * reported by rte_eth_event framework.
  *
  * @param   instance   security instance
- * @param   md device-specific metadata set in mbuf
+ * @param   md device-specific metadata
  *
  * @return
  *  - On success, userdata
diff --git a/lib/librte_security/rte_security_driver.h 
b/lib/librte_security/rte_security_driver.h
index 4623904..0583f88 100644
--- a/lib/librte_security/rte_security_driver.h
+++ b/lib/librte_security/rte_security_driver.h
@@ -134,9 +134,9 @@ typedef int (*security_set_pkt_metadata_t)(void *device,
void *params);
 
 /**
- * Get application specific userdata associated with the security session which
- * processed the packet. This would be retrieved using the metadata obtained
- * from packet.
+ * Get application specific userdata associated with the security session.
+ * Device specific metadata provided would be used to uniquely identify
+ * the security session being referred to.
  *
  * @param  device  Crypto/eth device pointer
  * @param  md  Metadata
-- 
2.7.4



[dpdk-dev] [PATCH 0/2] net/e1000: convert to new Rx/Tx offloads API

2018-03-01 Thread Wei Dai
This patch set converts net/e1000 to the new Rx/Tx offloads API.
All Rx offloads are per-port features.
All Tx offloads of e1000 are per-queue and also per-packet, as they
are enabled in the Tx descriptor.
In the new offload API, per-queue offloads only need to be set in
queue_setup(). So if the maximum number of queues in the Rx or Tx
path is one, all offloads in that path are made per-queue for
convenience.

Wei Dai (2):
  net/e1000: convert to new Rx offloads API
  net/e1000: convert to new Tx offloads API

 drivers/net/e1000/em_ethdev.c  | 33 +++-
 drivers/net/e1000/em_rxtx.c| 30 +++---
 drivers/net/e1000/igb_ethdev.c | 57 +++---
 drivers/net/e1000/igb_rxtx.c   | 69 +-
 4 files changed, 138 insertions(+), 51 deletions(-)

-- 
2.9.4



[dpdk-dev] [PATCH 1/2] net/e1000: convert to new Rx offloads API

2018-03-01 Thread Wei Dai
Ethdev Rx offloads API has changed since:
commit ce17eddefc20 ("ethdev: introduce Rx queue offloads API")
This commit supports the new Rx offloads API.

Signed-off-by: Wei Dai 
---
 drivers/net/e1000/em_ethdev.c  | 32 +++-
 drivers/net/e1000/em_rxtx.c| 27 ++---
 drivers/net/e1000/igb_ethdev.c | 53 +
 drivers/net/e1000/igb_rxtx.c   | 66 +-
 4 files changed, 127 insertions(+), 51 deletions(-)

diff --git a/drivers/net/e1000/em_ethdev.c b/drivers/net/e1000/em_ethdev.c
index 242375f..acd0d22 100644
--- a/drivers/net/e1000/em_ethdev.c
+++ b/drivers/net/e1000/em_ethdev.c
@@ -1105,15 +1105,26 @@ eth_em_infos_get(struct rte_eth_dev *dev, struct 
rte_eth_dev_info *dev_info)
dev_info->max_rx_pktlen = em_get_max_pktlen(hw);
dev_info->max_mac_addrs = hw->mac.rar_entry_count;
dev_info->rx_offload_capa =
-   DEV_RX_OFFLOAD_VLAN_STRIP |
-   DEV_RX_OFFLOAD_IPV4_CKSUM |
-   DEV_RX_OFFLOAD_UDP_CKSUM  |
-   DEV_RX_OFFLOAD_TCP_CKSUM;
+   DEV_RX_OFFLOAD_VLAN_STRIP  |
+   DEV_RX_OFFLOAD_VLAN_FILTER |
+   DEV_RX_OFFLOAD_IPV4_CKSUM  |
+   DEV_RX_OFFLOAD_UDP_CKSUM   |
+   DEV_RX_OFFLOAD_TCP_CKSUM   |
+   DEV_RX_OFFLOAD_CRC_STRIP   |
+   DEV_RX_OFFLOAD_SCATTER;
+   if (dev_info->max_rx_pktlen > ETHER_MAX_LEN)
+   dev_info->rx_offload_capa |= DEV_RX_OFFLOAD_JUMBO_FRAME;
dev_info->tx_offload_capa =
DEV_TX_OFFLOAD_VLAN_INSERT |
DEV_TX_OFFLOAD_IPV4_CKSUM  |
DEV_TX_OFFLOAD_UDP_CKSUM   |
DEV_TX_OFFLOAD_TCP_CKSUM;
+   /*
+* As only one Rx/Tx queue can be used, let the per-queue offload
+* capability be the same as the per-port offload capability
+* for better compatibility.
+*/
+   dev_info->rx_queue_offload_capa = dev_info->rx_offload_capa;
 
/*
 * Starting with 631xESB hw supports 2 TX/RX queues per port.
@@ -1460,15 +1471,18 @@ em_vlan_hw_strip_enable(struct rte_eth_dev *dev)
 static int
 eth_em_vlan_offload_set(struct rte_eth_dev *dev, int mask)
 {
+   struct rte_eth_rxmode *rxmode;
+
+   rxmode = &dev->data->dev_conf.rxmode;
if(mask & ETH_VLAN_STRIP_MASK){
-   if (dev->data->dev_conf.rxmode.hw_vlan_strip)
+   if (rxmode->offloads & DEV_RX_OFFLOAD_VLAN_STRIP)
em_vlan_hw_strip_enable(dev);
else
em_vlan_hw_strip_disable(dev);
}
 
if(mask & ETH_VLAN_FILTER_MASK){
-   if (dev->data->dev_conf.rxmode.hw_vlan_filter)
+   if (rxmode->offloads & DEV_RX_OFFLOAD_VLAN_FILTER)
em_vlan_hw_filter_enable(dev);
else
em_vlan_hw_filter_disable(dev);
@@ -1835,10 +1849,12 @@ eth_em_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
 
/* switch to jumbo mode if needed */
if (frame_size > ETHER_MAX_LEN) {
-   dev->data->dev_conf.rxmode.jumbo_frame = 1;
+   dev->data->dev_conf.rxmode.offloads |=
+   DEV_RX_OFFLOAD_JUMBO_FRAME;
rctl |= E1000_RCTL_LPE;
} else {
-   dev->data->dev_conf.rxmode.jumbo_frame = 0;
+   dev->data->dev_conf.rxmode.offloads &=
+   ~DEV_RX_OFFLOAD_JUMBO_FRAME;
rctl &= ~E1000_RCTL_LPE;
}
E1000_WRITE_REG(hw, E1000_RCTL, rctl);
diff --git a/drivers/net/e1000/em_rxtx.c b/drivers/net/e1000/em_rxtx.c
index 02fae10..9b328b1 100644
--- a/drivers/net/e1000/em_rxtx.c
+++ b/drivers/net/e1000/em_rxtx.c
@@ -85,6 +85,7 @@ struct em_rx_queue {
struct em_rx_entry *sw_ring;   /**< address of RX software ring. */
struct rte_mbuf *pkt_first_seg; /**< First segment of current packet. */
struct rte_mbuf *pkt_last_seg;  /**< Last segment of current packet. */
+   uint64_toffloads;   /**< Offloads of DEV_RX_OFFLOAD_* */
uint16_tnb_rx_desc; /**< number of RX descriptors. */
uint16_trx_tail;/**< current value of RDT register. */
uint16_tnb_rx_hold; /**< number of held free RX desc. */
@@ -1382,8 +1383,8 @@ eth_em_rx_queue_setup(struct rte_eth_dev *dev,
rxq->rx_free_thresh = rx_conf->rx_free_thresh;
rxq->queue_id = queue_idx;
rxq->port_id = dev->data->port_id;
-   rxq->crc_len = (uint8_t) ((dev->data->dev_conf.rxmode.hw_strip_crc) ?
-   0 : ETHER_CRC_LEN);
+   rxq->crc_len = (uint8_t)((dev->data->dev_conf.rxmode.offloads &
+   DEV_RX_OFFLOAD_CRC_STRIP) ? 0 : ETHER_CRC_LEN);
 
rxq->rdt_reg_addr = E1000_PCI_REG_ADDR(hw, E1000_RDT(queue_idx));
rxq->rdh_reg_addr = E1000_PCI_REG_ADDR(hw, E1000_RDH(queue_idx));
@@ -1395,6 +1396,

[dpdk-dev] [PATCH 2/2] net/e1000: convert to new Tx offloads API

2018-03-01 Thread Wei Dai
Ethdev Tx offloads API has changed since:
commit cba7f53b717d ("ethdev: introduce Tx queue offloads API")
This commit supports the new Tx offloads API.

Signed-off-by: Wei Dai 
---
 drivers/net/e1000/em_ethdev.c  | 1 +
 drivers/net/e1000/em_rxtx.c| 3 +++
 drivers/net/e1000/igb_ethdev.c | 4 
 drivers/net/e1000/igb_rxtx.c   | 3 +++
 4 files changed, 11 insertions(+)

diff --git a/drivers/net/e1000/em_ethdev.c b/drivers/net/e1000/em_ethdev.c
index acd0d22..a9439c2 100644
--- a/drivers/net/e1000/em_ethdev.c
+++ b/drivers/net/e1000/em_ethdev.c
@@ -1125,6 +1125,7 @@ eth_em_infos_get(struct rte_eth_dev *dev, struct 
rte_eth_dev_info *dev_info)
 * for better compatibility.
 */
dev_info->rx_queue_offload_capa = dev_info->rx_offload_capa;
+   dev_info->tx_queue_offload_capa = dev_info->tx_offload_capa;
 
/*
 * Starting with 631xESB hw supports 2 TX/RX queues per port.
diff --git a/drivers/net/e1000/em_rxtx.c b/drivers/net/e1000/em_rxtx.c
index 9b328b1..6039c97 100644
--- a/drivers/net/e1000/em_rxtx.c
+++ b/drivers/net/e1000/em_rxtx.c
@@ -164,6 +164,7 @@ struct em_tx_queue {
uint8_twthresh;  /**< Write-back threshold register. */
struct em_ctx_info ctx_cache;
/**< Hardware context history.*/
+   uint64_t   offloads; /**< offloads of DEV_TX_OFFLOAD_* */
 };
 
 #if 1
@@ -1270,6 +1271,7 @@ eth_em_tx_queue_setup(struct rte_eth_dev *dev,
em_reset_tx_queue(txq);
 
dev->data->tx_queues[queue_idx] = txq;
+   txq->offloads = tx_conf->offloads;
return 0;
 }
 
@@ -1916,4 +1918,5 @@ em_txq_info_get(struct rte_eth_dev *dev, uint16_t 
queue_id,
qinfo->conf.tx_thresh.wthresh = txq->wthresh;
qinfo->conf.tx_free_thresh = txq->tx_free_thresh;
qinfo->conf.tx_rs_thresh = txq->tx_rs_thresh;
+   qinfo->conf.offloads = txq->offloads;
 }
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index 7c47171..9396502 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -2217,6 +2217,7 @@ eth_igb_infos_get(struct rte_eth_dev *dev, struct 
rte_eth_dev_info *dev_info)
DEV_TX_OFFLOAD_TCP_CKSUM   |
DEV_TX_OFFLOAD_SCTP_CKSUM  |
DEV_TX_OFFLOAD_TCP_TSO;
+   dev_info->tx_queue_offload_capa = dev_info->tx_offload_capa;
 
switch (hw->mac.type) {
case e1000_82575:
@@ -2289,6 +2290,7 @@ eth_igb_infos_get(struct rte_eth_dev *dev, struct 
rte_eth_dev_info *dev_info)
.wthresh = IGB_DEFAULT_TX_WTHRESH,
},
.txq_flags = 0,
+   .offloads = 0,
};
 
dev_info->rx_desc_lim = rx_desc_lim;
@@ -2348,6 +2350,7 @@ eth_igbvf_infos_get(struct rte_eth_dev *dev, struct 
rte_eth_dev_info *dev_info)
DEV_TX_OFFLOAD_TCP_CKSUM   |
DEV_TX_OFFLOAD_SCTP_CKSUM  |
DEV_TX_OFFLOAD_TCP_TSO;
+   dev_info->tx_queue_offload_capa = dev_info->tx_offload_capa;
switch (hw->mac.type) {
case e1000_vfadapt:
dev_info->max_rx_queues = 2;
@@ -2382,6 +2385,7 @@ eth_igbvf_infos_get(struct rte_eth_dev *dev, struct 
rte_eth_dev_info *dev_info)
.wthresh = IGB_DEFAULT_TX_WTHRESH,
},
.txq_flags = 0,
+   .offloads = 0,
};
 
dev_info->rx_desc_lim = rx_desc_lim;
diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c
index 9c33fda..0fcd9c4 100644
--- a/drivers/net/e1000/igb_rxtx.c
+++ b/drivers/net/e1000/igb_rxtx.c
@@ -181,6 +181,7 @@ struct igb_tx_queue {
/**< Start context position for transmit queue. */
struct igb_advctx_info ctx_cache[IGB_CTX_NUM];
/**< Hardware context history.*/
+   uint64_t   offloads; /**< offloads of DEV_TX_OFFLOAD_* */
 };
 
 #if 1
@@ -1543,6 +1544,7 @@ eth_igb_tx_queue_setup(struct rte_eth_dev *dev,
dev->tx_pkt_burst = eth_igb_xmit_pkts;
dev->tx_pkt_prepare = ð_igb_prep_pkts;
dev->data->tx_queues[queue_idx] = txq;
+   txq->offloads = tx_conf->offloads;
 
return 0;
 }
@@ -2794,6 +2796,7 @@ igb_txq_info_get(struct rte_eth_dev *dev, uint16_t 
queue_id,
qinfo->conf.tx_thresh.pthresh = txq->pthresh;
qinfo->conf.tx_thresh.hthresh = txq->hthresh;
qinfo->conf.tx_thresh.wthresh = txq->wthresh;
+   qinfo->conf.offloads = txq->offloads;
 }
 
 int
-- 
2.9.4



Re: [dpdk-dev] [PATCH v2 02/10] bus/dpaa: fix the BE compilation issue

2018-03-01 Thread Shreyansh Jain
On Thu, Mar 1, 2018 at 1:03 PM, Hemant Agrawal  wrote:
>
> The array pointers were used without index.
>
> Fixes: b9083ea5e084 ("net/dpaa: further push mode optimizations")
> Cc: sta...@dpdk.org
>
> Signed-off-by: Hemant Agrawal 
> ---
>  drivers/bus/dpaa/base/qbman/qman.c| 5 +++--
>  drivers/bus/dpaa/base/qbman/qman_driver.c | 5 +
>  2 files changed, 4 insertions(+), 6 deletions(-)
>

Acked-by: Shreyansh Jain 


[dpdk-dev] [PATCH] net/mlx: fix rdma-core glue path with EAL plugins

2018-03-01 Thread Adrien Mazarguil
Glue object files are looked up in RTE_EAL_PMD_PATH by default when set and
should be installed in this directory.

During startup, EAL attempts to load them automatically like other plug-ins
found in this directory. While normally harmless, dlopen() fails when
rdma-core is not installed; EAL interprets this as a fatal error and
terminates the application.

This patch requests glue objects to be installed into a sub-directory of
RTE_EAL_PMD_PATH to prevent their automatic loading.

Fixes: f6242d0655cd ("net/mlx: make rdma-core glue path configurable")
Cc: sta...@dpdk.org

Signed-off-by: Adrien Mazarguil 
Cc: Timothy Redaelli 
---
 doc/guides/nics/mlx4.rst | 7 ---
 doc/guides/nics/mlx5.rst | 7 ---
 drivers/net/mlx4/mlx4.c  | 3 ++-
 drivers/net/mlx5/mlx5.c  | 3 ++-
 4 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/doc/guides/nics/mlx4.rst b/doc/guides/nics/mlx4.rst
index 98b971667..602a5e257 100644
--- a/doc/guides/nics/mlx4.rst
+++ b/doc/guides/nics/mlx4.rst
@@ -98,9 +98,10 @@ These options can be modified in the ``.config`` file.
   missing with ``ldd(1)``.
 
   It works by moving these dependencies to a purpose-built rdma-core "glue"
-  plug-in, which must either be installed in ``CONFIG_RTE_EAL_PMD_PATH`` if
-  set, or in a standard location for the dynamic linker (e.g. ``/lib``) if
-  left to the default empty string (``""``).
+  plug-in, which must either be installed in the ``glue`` sub-directory of
+  ``CONFIG_RTE_EAL_PMD_PATH`` if set, or in a standard location for the
+  dynamic linker (e.g. ``/lib``) if left to the default empty string
+  (``""``).
 
   This option has no performance impact.
 
diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 0e6e525c9..ad96d66f2 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -171,9 +171,10 @@ These options can be modified in the ``.config`` file.
   missing with ``ldd(1)``.
 
   It works by moving these dependencies to a purpose-built rdma-core "glue"
-  plug-in, which must either be installed in ``CONFIG_RTE_EAL_PMD_PATH`` if
-  set, or in a standard location for the dynamic linker (e.g. ``/lib``) if
-  left to the default empty string (``""``).
+  plug-in, which must either be installed in the ``glue`` sub-directory of
+  ``CONFIG_RTE_EAL_PMD_PATH`` if set, or in a standard location for the
+  dynamic linker (e.g. ``/lib``) if left to the default empty string
+  (``""``).
 
   This option has no performance impact.
 
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index ee93dafe6..cfa2533ed 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -720,7 +720,8 @@ mlx4_glue_init(void)
 */
(geteuid() == getuid() && getegid() == getgid() ?
 getenv("MLX4_GLUE_PATH") : NULL),
-   RTE_EAL_PMD_PATH,
+   /* Use glue sub-directory when RTE_EAL_PMD_PATH is set. */
+   *RTE_EAL_PMD_PATH ? RTE_EAL_PMD_PATH "/glue" : "",
};
unsigned int i = 0;
void *handle = NULL;
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 61cb93101..22275be80 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1055,7 +1055,8 @@ mlx5_glue_init(void)
 */
(geteuid() == getuid() && getegid() == getgid() ?
 getenv("MLX5_GLUE_PATH") : NULL),
-   RTE_EAL_PMD_PATH,
+   /* Use glue sub-directory when RTE_EAL_PMD_PATH is set. */
+   *RTE_EAL_PMD_PATH ? RTE_EAL_PMD_PATH "/glue" : "",
};
unsigned int i = 0;
void *handle = NULL;
-- 
2.11.0


Re: [dpdk-dev] [RFC 0/7] PMD driver for AF_XDP

2018-03-01 Thread Zhang, Qi Z


> -Original Message-
> From: Jason Wang [mailto:jasow...@redhat.com]
> Sent: Thursday, March 1, 2018 3:46 PM
> To: Zhang, Qi Z ; dev@dpdk.org
> Cc: Karlsson, Magnus ; Topel, Bjorn
> 
> Subject: Re: [dpdk-dev] [RFC 0/7] PMD driver for AF_XDP
> 
> 
> 
> > On 2018-03-01 12:20, Zhang, Qi Z wrote:
> > +Magnus, since there was a typo in the email address in my first batch.
> >
> >> -Original Message-
> >> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Zhang, Qi Z
> >> Sent: Thursday, March 1, 2018 12:19 PM
> >> To: Jason Wang;dev@dpdk.org
> >> Cc:magnus.karls...@intei.com; Topel, Bjorn
> >> Subject: Re: [dpdk-dev] [RFC 0/7] PMD driver for AF_XDP
> >>
> >>
> >>
> >>> -Original Message-
> >>> From: Jason Wang [mailto:jasow...@redhat.com]
> >>> Sent: Thursday, March 1, 2018 10:52 AM
> >>> To: Zhang, Qi Z;dev@dpdk.org
> >>> Cc:magnus.karls...@intei.com; Topel, Bjorn
> >>> Subject: Re: [dpdk-dev] [RFC 0/7] PMD driver for AF_XDP
> >>>
> >>>
> >>>
> >>> On 2018-02-27 17:32, Qi Zhang wrote:
>  The RFC patches add a new PMD driver for AF_XDP which is a proposed
>  faster version of AF_PACKET interface in Linux, see below link for
>  detail AF_XDP introduction:
>  https://fosdem.org/2018/schedule/event/af_xdp/
>  https://lwn.net/Articles/745934/
> 
>  This patchset is base on v18.02.
>  It also require a linux kernel that have below AF_XDP RFC patches
>  be applied.
>  https://patchwork.ozlabs.org/patch/867961/
>  https://patchwork.ozlabs.org/patch/867960/
>  https://patchwork.ozlabs.org/patch/867938/
>  https://patchwork.ozlabs.org/patch/867939/
>  https://patchwork.ozlabs.org/patch/867940/
>  https://patchwork.ozlabs.org/patch/867941/
>  https://patchwork.ozlabs.org/patch/867942/
>  https://patchwork.ozlabs.org/patch/867943/
>  https://patchwork.ozlabs.org/patch/867944/
>  https://patchwork.ozlabs.org/patch/867945/
>  https://patchwork.ozlabs.org/patch/867946/
>  https://patchwork.ozlabs.org/patch/867947/
>  https://patchwork.ozlabs.org/patch/867948/
>  https://patchwork.ozlabs.org/patch/867949/
>  https://patchwork.ozlabs.org/patch/867950/
>  https://patchwork.ozlabs.org/patch/867951/
>  https://patchwork.ozlabs.org/patch/867952/
>  https://patchwork.ozlabs.org/patch/867953/
>  https://patchwork.ozlabs.org/patch/867954/
>  https://patchwork.ozlabs.org/patch/867955/
>  https://patchwork.ozlabs.org/patch/867956/
>  https://patchwork.ozlabs.org/patch/867957/
>  https://patchwork.ozlabs.org/patch/867958/
>  https://patchwork.ozlabs.org/patch/867959/
> 
>  There is no clean upstream target yet since the kernel patches are
>  still in RFC stage. The purpose of the patchset is just for anyone
>  who wants to evaluate af_xdp with a DPDK application and give
>  feedback for further improvement.
> 
>  To try with the new PMD
>  1. compile and install the kernel with above patches applied.
>  2. configure $LINUX_HEADER_DIR (dir of "make headers_install")
>   and $TOOLS_DIR (dir at /tools) at
> >>> driver/net/af_xdp/Makefile
>   before compile DPDK.
>  3. make sure libelf and libbpf is installed.
> 
>  BTW, performance tests show our PMD can reach 94%~98% of the
>  original benchmark when shared memory is enabled.
> >>> Hi:
> >>>
> >>> Looks like zero copy is not used in this series. Any plan to support that?
> >> Zero copy is enabled in patch 5, if a mempool passed check_mempool,
> >> it will be registered to af_xdp socket.
> >> so there will be no memcpy between mbuf and af_xdp.
> 
> Aha, I see. So the zerocopy was limited to some specific use case. And if I
> understand it correctly, zc mode could not be used for VM.

I think, except for the mempool layout limitation, zerocopy is transparent 
to the DPDK application; the only difference is performance.
Sorry, I may not get your point; could you explain more about the VM usage?

Regards
Qi
> 
> Thanks
> 
> >>> If not, what's the advantage compared to vhost-net + tap +
> XDP_REDIRECT?
> >>>
> >>> Have you measured l2fwd performance in this case? I believe the
> >>> number you refer here is rxdrop (XDP_DRV) which is 11.6Mpps.
> >> Actually we measure the performance on rxonly / txonly / l2fwd on
> >> i40e with XDP_SKB and XDP_DRV_ZC
> >>
> >> Regards
> >> Qi
> >>
> >>> Thanks



Re: [dpdk-dev] [RFC 0/7] PMD driver for AF_XDP

2018-03-01 Thread Jason Wang



On 2018-03-01 20:56, Zhang, Qi Z wrote:

BTW, performance tests show our PMD can reach 94%~98% of the
original benchmark when shared memory is enabled.

Hi:

Looks like zero copy is not used in this series. Any plan to support that?

Zero copy is enabled in patch 5: if a mempool passes check_mempool,
it will be registered to the af_xdp socket,
so there will be no memcpy between mbuf and af_xdp.

Aha, I see. So the zerocopy was limited to some specific use case. And if I
understand it correctly, zc mode could not be used for VM.

I think, except for the limitation on mempool layout, zero copy is transparent
to the DPDK application; the only difference is performance.
Sorry, I may not get your point; could you explain more about the VM usage?

Regards
Qi


No problem, so the question is:

Can zerocopy be used when using testpmd to forward packets between
vhost-user and AF_XDP socket?


Thanks


[dpdk-dev] [PATCH 2/2] event/sw: code refactor for sw_refill_pp_buf

2018-03-01 Thread Vipin Varghese
This change reworks how the shadow buffer is refilled on each call.
Refilling the shadow buffer helped improve throughput by 0.2 Mpps.

Signed-off-by: Vipin Varghese 
---
 drivers/event/sw/sw_evdev_scheduler.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/event/sw/sw_evdev_scheduler.c 
b/drivers/event/sw/sw_evdev_scheduler.c
index 70d1970..a95a22a 100644
--- a/drivers/event/sw/sw_evdev_scheduler.c
+++ b/drivers/event/sw/sw_evdev_scheduler.c
@@ -451,6 +451,10 @@ __pull_port_lb(struct sw_evdev *sw, uint32_t port_id, int 
allow_reorder)
port->pp_buf_count--;
} /* while (avail_qes) */
 
+   /* replenish buffers before next iteration */
+   if (port->pp_buf_count == 0)
+   sw_refill_pp_buf(sw, port);
+
return pkts_iter;
 }
 
-- 
2.7.4



[dpdk-dev] [PATCH 1/2] event/sw: code refactor to reduce the fetch stall

2018-03-01 Thread Vipin Varghese
Rearranging the code to prefetch the contents before the loop check
increases performance for both single-stage and multi-stage
atomic pipelines.

Signed-off-by: Vipin Varghese 
---
 drivers/event/sw/sw_evdev_scheduler.c | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/event/sw/sw_evdev_scheduler.c 
b/drivers/event/sw/sw_evdev_scheduler.c
index e3a41e0..70d1970 100644
--- a/drivers/event/sw/sw_evdev_scheduler.c
+++ b/drivers/event/sw/sw_evdev_scheduler.c
@@ -44,12 +44,13 @@ sw_schedule_atomic_to_cq(struct sw_evdev *sw, struct sw_qid 
* const qid,
uint32_t qid_id = qid->id;
 
iq_dequeue_burst(sw, &qid->iq[iq_num], qes, count);
-   for (i = 0; i < count; i++) {
-   const struct rte_event *qe = &qes[i];
-   const uint16_t flow_id = SW_HASH_FLOWID(qes[i].flow_id);
-   struct sw_fid_t *fid = &qid->fids[flow_id];
-   int cq = fid->cq;
 
+   const struct rte_event *qe = &qes[0];
+   const uint16_t flow_id = SW_HASH_FLOWID(qes[0].flow_id);
+   struct sw_fid_t *fid = &qid->fids[flow_id];
+   int cq = fid->cq;
+
+   for (i = 0; i < count; i++) {
if (cq < 0) {
uint32_t cq_idx = qid->cq_next_tx++;
if (qid->cq_next_tx == qid->cq_num_mapped_cqs)
@@ -101,6 +102,13 @@ sw_schedule_atomic_to_cq(struct sw_evdev *sw, struct 
sw_qid * const qid,
&sw->cq_ring_space[cq]);
p->cq_buf_count = 0;
}
+
+   if (likely(i+1 < count)) {
+   qe = (qes + i + 1);
+   flow_id = SW_HASH_FLOWID(qes[i + 1].flow_id);
+   fid = &qid->fids[flow_id];
+   cq = fid->cq;
+   }
}
iq_put_back(sw, &qid->iq[iq_num], blocked_qes, nb_blocked);
 
-- 
2.7.4



Re: [dpdk-dev] [PATCH] net/mlx5: use PCI BDF as the port name

2018-03-01 Thread Ferruh Yigit
On 1/22/2018 9:30 AM, Yuanhan Liu wrote:
> It is suggested to use PCI BDF to identify a port for port addition
> in OVS-DPDK, while mlx5 has its own naming style: it names ports by ib
> dev name. This breaks the typical OVS-DPDK use case and brings more
> confusion to the end users.
> 
> To fix it, this patch changes it to use PCI BDF as the name, too.
> Also, a postfix " port %u" is added, just in case there might be more
> than one port associated with a PCI device.
> 
> Signed-off-by: Yuanhan Liu 

<...>

> @@ -633,14 +635,15 @@ mlx5_pci_probe(struct rte_pci_driver *pci_drv, struct 
> rte_pci_device *pci_dev)
>   .inline_max_packet_sz = MLX5_ARG_UNSET,
>   };
>  
> + len = snprintf(name, sizeof(name), PCI_PRI_FMT,
> +  pci_dev->addr.domain, pci_dev->addr.bus,
> +  pci_dev->addr.devid, pci_dev->addr.function);
> + if (device_attr.orig_attr.phys_port_cnt > 1)
> + snprintf(name + len, sizeof(name), " port %u", i);

Getting a build error [1] with icc [2].

Because commit 9a761de8ea14 ("net/mlx5: flow counter support") adds variable
"device_attr" into loop scope, but there is already a variable with same name in
function scope.
In this case the variable in the loop scope should be used, and the compiler
error looks correct; not sure why other compilers do not complain about it.

[1]
.../dpdk/drivers/net/mlx5/mlx5.c(729): error #592: variable "device_attr" is
used before its value is set
if (device_attr.orig_attr.phys_port_cnt > 1)

^

[2]
icc (ICC) 18.0.1 20171018


Re: [dpdk-dev] [PATCH 1/4] vhost: move fdset functions from fd_man.c to fd_man.h

2018-03-01 Thread Thomas Monjalon
01/03/2018 07:02, Tan, Jianfeng:
> From: Maxime Coquelin [mailto:maxime.coque...@redhat.com]
> > On 02/28/2018 02:36 AM, Yang, Zhiyong wrote:
> > > From: Maxime Coquelin [mailto:maxime.coque...@redhat.com]
> > >> On 02/14/2018 03:53 PM, Zhiyong Yang wrote:
> > >>>lib/librte_vhost/Makefile |   3 +-
> > >>>lib/librte_vhost/fd_man.c | 274 
> > >>> --
> > >>>lib/librte_vhost/fd_man.h | 258
> > >> +--
> > >>>3 files changed, 253 insertions(+), 282 deletions(-)
> > >>>delete mode 100644 lib/librte_vhost/fd_man.c
> > >>
> > >> I disagree with the patch.
> > >> It is a good thing to reuse the code, but to do it, you need to extend 
> > >> the
> > >> vhost lib API.
> > >>
> > >> New API need to be prefixed with rte_vhost_, and be declared in
> > >> rte_vhost.h.
> > >>
> > >> And no need to move the functions from the .c to the .h file, as it
> > moreover
> > >> makes you inline them, which is not necessary here.
> > >
> > > Thanks for your reviewing the series firstly, Maxime. :)
> > >
> > > I considered doing it as you said. However, I still preferred this one
> > > in the end.
> > > Here are my reasons.
> > > 1) As far as I know, this set of functions was used privately in
> > > librte_vhost
> > before this feature.
> > > No strong request from the perspective of DPDK applications. If I
> > understand well, it is enough to expose the functions to all PMDs,
> > > and it is better to keep them internal to DPDK.
> > 
> > But what the patch is doing is adding fd_man.h to the API, without doing
> > it properly. fd_man.h will be installed with other header files, and any
> > external application can use it.
> > 
> > >
> > > 2) These functions help to implement vhost-user, but they are not strongly
> > related to the other vhost-user APIs that have already been exposed.
> > > If we want to expose them as APIs at the lib layer, many functions and related
> > data structures have to be exposed in rte_vhost.h. It looks messy.
> > > Your opinion?
> > 
> > Yes, it is not really vhost-related, it could be part of a more generic
> > library. It is maybe better to duplicate these lines, or to move this
> > code in a existing or new library.
> 
> I vote to move it to a generic library, maybe EAL. poll() has better 
> compatibility even though it is not as performant as epoll().
> 
> Thomas, how do you think?

I don't see why it should be exported outside of DPDK, except for PMDs.
I would tend to keep it internal but I understand that it would mean
duplicating some code, which is not ideal.
Please could you show what would be the content of the .h in EAL?





Re: [dpdk-dev] [PATCH 7/7] build: add meson support for dpaaX platforms

2018-03-01 Thread Thomas Monjalon
01/03/2018 07:10, Hemant Agrawal:
> On 2/28/2018 8:14 PM, Bruce Richardson wrote:
> > On Tue, Feb 27, 2018 at 10:55:52PM +0530, Hemant Agrawal wrote:
> >> +includes += include_directories('../../../lib/librte_eal/linuxapp/eal')
> > 
> > Is this not covered by the dependency on eal? Is it accessing things
> > directly in the EAL internals?
> 
> We are accessing eal_vfio.h. so it is needed.

Let's try to fix it.
What is required exactly? Can it be in the exported header?




Re: [dpdk-dev] [PATCH] compressdev: implement API

2018-03-01 Thread Trahe, Fiona
Hi Shally

//snip//
> [Shally] This looks better to me. So it means the app would always call
> xform_init() for stateless and attach an
> updated priv_xform to ops (depending upon whether it's shareable or not). So it
> does not need to have
> a NULL pointer on priv_xform, right?
> 
[Fiona] yes. The PMD must return a valid priv_xform pointer.



[dpdk-dev] virtio with 2MB hugepages - bringing back single file segments

2018-03-01 Thread Stojaczyk, DariuszX
Hi,

I'm trying to make a vhost-user initiator built upon DPDK work with 2MB 
hugepages. In the initiator we have to share all memory with the host process, 
so it
can perform DMA. DPDK currently enforces having one descriptor per hugepage and 
there's an artificial limit of shared descriptors in DPDK vhost-user 
implementation (currently 8). Because of that, all DPDK vhost-user initiators 
are practically limited to 1GB hugepages at the moment. We can always increase 
the artificial descriptor limit, but then we're limited by sendmsg() itself, 
which on Linux accepts no more than 253 descriptors. However, could we increase 
the vhost-user implementation limit to - say - 128, and bring back "single file 
segments" [1]?

Could I send a patch series that does this? The single file segments code would 
go through a cleanup - at least making it available via a runtime option rather 
than #ifdefs.

I know there's an ongoing rework of the memory allocator in DPDK [2] and it 
includes similar single file segments functionality. However, it will
probably take quite some time before it is merged, and even then, the new
functionality would only be available in the *new* allocator. The old one is 
kept unchanged. It could use single file segments as well.

Regards,
D.

[1] http://dpdk.org/dev/patchwork/patch/16042/
[2] http://dpdk.org/ml/archives/dev/2017-December/084302.html 


[dpdk-dev] [PATCH] kni: fix compilation under RHEL 7.5

2018-03-01 Thread Lee Roberts
Fix kni compilation under RHEL 7.5.

Signed-off-by: Lee Roberts 
---
 lib/librte_eal/linuxapp/kni/compat.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/lib/librte_eal/linuxapp/kni/compat.h 
b/lib/librte_eal/linuxapp/kni/compat.h
index 3f8c0bc..6a6968d 100644
--- a/lib/librte_eal/linuxapp/kni/compat.h
+++ b/lib/librte_eal/linuxapp/kni/compat.h
@@ -101,6 +101,11 @@
 #undef NET_NAME_UNKNOWN
 #endif
 
+#if (defined(RHEL_RELEASE_CODE) && \
+   (RHEL_RELEASE_CODE >= RHEL_RELEASE_VERSION(7, 5)))
+#define ndo_change_mtu ndo_change_mtu_rh74
+#endif
+
 #if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 11, 0)
 #define HAVE_SIGNAL_FUNCTIONS_OWN_HEADER
 #endif
-- 
1.9.1



Re: [dpdk-dev] [PATCH] kni: fix compilation under RHEL 7.5

2018-03-01 Thread Stephen Hemminger
On Thu,  1 Mar 2018 16:20:35 -0700
Lee Roberts  wrote:

> Fix kni compilation under RHEL 7.5.
> 
> Signed-off-by: Lee Roberts 
> ---
>  lib/librte_eal/linuxapp/kni/compat.h | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/lib/librte_eal/linuxapp/kni/compat.h 
> b/lib/librte_eal/linuxapp/kni/compat.h
> index 3f8c0bc..6a6968d 100644
> --- a/lib/librte_eal/linuxapp/kni/compat.h
> +++ b/lib/librte_eal/linuxapp/kni/compat.h
> @@ -101,6 +101,11 @@
>  #undef NET_NAME_UNKNOWN
>  #endif
>  
> +#if (defined(RHEL_RELEASE_CODE) && \
> + (RHEL_RELEASE_CODE >= RHEL_RELEASE_VERSION(7, 5)))
> +#define ndo_change_mtu ndo_change_mtu_rh74
> +#endif
> +
>  #if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 11, 0)
>  #define HAVE_SIGNAL_FUNCTIONS_OWN_HEADER
>  #endif

Do we really want upstream DPDK trying to track every vendor kernel 
compatibility wart?
Should Redhat be owning this in their own DPDK package?


Re: [dpdk-dev] [PATCH] compressdev: implement API

2018-03-01 Thread Ahmed Mansour
On 3/1/2018 9:41 AM, Trahe, Fiona wrote:
> Hi Shally
>
> //snip//
>> [Shally] This looks better to me. So it means the app would always call
>> xform_init() for stateless and attach an
>> updated priv_xform to ops (depending upon whether it's shareable or not). So
>> it does not need to have
>> a NULL pointer on priv_xform, right?
>>
> [Fiona] yes. The PMD must return a valid priv_xform pointer.

[Ahmed] What I understood is that the xform_init will be called once
initially. if the @flag returned is NONE_SHAREABLE then the application
must not attach two inflight ops to the same @priv_xform? Otherwise the
application can attach many ops in flight to the @priv_xform?



Re: [dpdk-dev] virtio with 2MB hugepages - bringing back single file segments

2018-03-01 Thread Tan, Jianfeng
Hi Dariusz,

> -Original Message-
> From: Stojaczyk, DariuszX
> Sent: Friday, March 2, 2018 6:41 AM
> To: dev@dpdk.org; Tan, Jianfeng; Maxime Coquelin; Burakov, Anatoly;
> Yuanhan Liu
> Cc: Harris, James R; Thomas Monjalon
> Subject: virtio with 2MB hugepages - bringing back single file segments
> 
> Hi,
> 
> I'm trying to make a vhost-user initiator built upon DPDK work with 2MB
> hugepages. In the initiator we have to share all memory with the host
> process, so it
> can perform DMA. DPDK currently enforces having one descriptor per
> hugepage and there's an artificial limit of shared descriptors in DPDK vhost-
> user implementation (currently 8). Because of that, all DPDK vhost-user
> initiators are practically limited to 1GB hugepages at the moment. We can
> always increase the artificial descriptor limit, but then we're limited by
> sendmsg() itself, which on Linux accepts no more than 253 descriptors.
> However, could we increase the vhost-user implementation limit to - say -
> 128, and bring back "single file segments" [1]?

"Single file segments" [1] can help in the scenario where 2MB hugepages are
not too scattered, i.e., when some pages are physically contiguous.
But it cannot solve the issue completely (imagine the worst case).

Plus, it makes the memory part a little complex.

So we are expecting it (along with some other issues in the memory part) to be
addressed completely by Anatoly's rework of the memory subsystem.

Thanks,
Jianfeng


Re: [dpdk-dev] [PATCH] kni: fix compilation under RHEL 7.5

2018-03-01 Thread Roberts, Lee A.
> -Original Message-
> From: Stephen Hemminger [mailto:step...@networkplumber.org]
> Sent: Thursday, March 01, 2018 5:18 PM
> To: Roberts, Lee A. 
> Cc: ferruh.yi...@intel.com; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] kni: fix compilation under RHEL 7.5
> 
> On Thu,  1 Mar 2018 16:20:35 -0700
> Lee Roberts  wrote:
> 
> > Fix kni compilation under RHEL 7.5.
> >
> > Signed-off-by: Lee Roberts 
> > ---
> >  lib/librte_eal/linuxapp/kni/compat.h | 5 +
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/lib/librte_eal/linuxapp/kni/compat.h 
> > b/lib/librte_eal/linuxapp/kni/compat.h
> > index 3f8c0bc..6a6968d 100644
> > --- a/lib/librte_eal/linuxapp/kni/compat.h
> > +++ b/lib/librte_eal/linuxapp/kni/compat.h
> > @@ -101,6 +101,11 @@
> >  #undef NET_NAME_UNKNOWN
> >  #endif
> >
> > +#if (defined(RHEL_RELEASE_CODE) && \
> > +   (RHEL_RELEASE_CODE >= RHEL_RELEASE_VERSION(7, 5)))
> > +#define ndo_change_mtu ndo_change_mtu_rh74
> > +#endif
> > +
> >  #if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 11, 0)
> >  #define HAVE_SIGNAL_FUNCTIONS_OWN_HEADER
> >  #endif
> 
> Do we really want upstream DPDK trying to track every vendor kernel 
> compatibility wart?
> Should Redhat be owning this in their own DPDK package?

If I look at the first few lines of ../lib/librte_eal/linuxapp/kni/compat.h,
it appears that tracking changes in RHEL and SLES is certainly part of the
purpose of this file:

  1 /*
  2  * Minimal wrappers to allow compiling kni on older kernels.
  3  */
  4 
  5 #include 
  6 
  7 #ifndef RHEL_RELEASE_VERSION
  8 #define RHEL_RELEASE_VERSION(a, b) (((a) << 8) + (b))
  9 #endif
 10 
 11 /* SuSE version macro is the same as Linux kernel version */
 12 #ifndef SLE_VERSION
 13 #define SLE_VERSION(a, b, c) KERNEL_VERSION(a, b, c)
 14 #endif
 15 #ifdef CONFIG_SUSE_KERNEL

If you want to remove the vendor dependency, I'd suggest the following actions:

1) Submit a patch to remove the RHEL- and SLES-specific code.
2) Disable KNI by default in the build system.

  - Lee


Re: [dpdk-dev] [vlan offload] does I350 support vlan offload?

2018-03-01 Thread James Huang
I got the same issue; is there any possible way to make it work?

On Feb 3, 2018 5:26 PM, "真我风采" <1534057...@qq.com> wrote:

> Hi, All:
>  I want to use the I350 vlan offload feature to improve performance,
> but got into trouble.
>  After enabling port_conf.hw_vlan_strip, I can get the vlan id in
> mbuf->vlan_tci, which indicates vlan rx offload is OK.
> But when enabling vlan tx offload with the steps below:
> a. set vlan id in mbuf->vlan_tci
> b. set PKT_TX_VLAN_PKT in mbuf->ol_flags
> then calling rte_eth_tx_burst to transmit, the return value is greater
> than 0 (actually 1), but I cannot dump the packet with the vlan id
> on the opposite server.
>
>
>So does the I350 support vlan tx offload? Or does my usage have an error?
>
>
> Thanks!


[dpdk-dev] Re: [vlan offload] does I350 support vlan offload?

2018-03-01 Thread 真我风采
Hi,
  After adding mbuf->l2_len as below, it works now.
  mbuf->vlan_tci = vlan_id;
  mbuf->ol_flags |= PKT_TX_VLAN_PKT;
  mbuf->l2_len = sizeof(struct ether_hdr);



thanks!


------ Original Message ------
From: "James Huang";
Date: 2018-03-02 (Fri) 11:33
To: "真我风采" <1534057...@qq.com>;
Cc: "dev";
Subject: Re: [dpdk-dev] [vlan offload] does I350 support vlan offload?



I got the same issue; is there any possible way to make it work?

On Feb 3, 2018 5:26 PM, "真我风采" <1534057...@qq.com> wrote:
Hi, All:
  I want to use the I350 vlan offload feature to improve performance, but got
into trouble.
  After enabling port_conf.hw_vlan_strip, I can get the vlan id in 
mbuf->vlan_tci, which indicates vlan rx offload is OK.
 But when enabling vlan tx offload with the steps below:
 a. set vlan id in mbuf->vlan_tci
 b. set PKT_TX_VLAN_PKT in mbuf->ol_flags
 then calling rte_eth_tx_burst to transmit, the return value is greater than 
0 (actually 1), but I cannot dump the packet with the vlan id
on the opposite server.
 
 
So does the I350 support vlan tx offload? Or does my usage have an error?
 
 
 Thanks!

Re: [dpdk-dev] [RFC 0/7] PMD driver for AF_XDP

2018-03-01 Thread Zhang, Qi Z


> -Original Message-
> From: Jason Wang [mailto:jasow...@redhat.com]
> Sent: Thursday, March 1, 2018 9:18 PM
> To: Zhang, Qi Z 
> Cc: Karlsson, Magnus ; Topel, Bjorn
> ; dev@dpdk.org
> Subject: Re: [dpdk-dev] [RFC 0/7] PMD driver for AF_XDP
> 
> 
> 
> On 2018年03月01日 20:56, Zhang, Qi Z wrote:
> >> BTW, the performance test shows our PMD can reach 94%~98% of the
> >> original benchmark when shared memory is enabled.
> > Hi:
> >
> > Looks like zero copy is not used in this series. Any plan to support
> that?
>  Zero copy is enabled in patch 5: if a mempool passes check_mempool,
>  it will be registered to the af_xdp socket,
>  so there will be no memcpy between mbuf and af_xdp.
> >> Aha, I see. So the zerocopy was limited to some specific use case.
> >> And if I understand it correctly, zc mode could not be used for VM.
> > I think, except for the limitation on mempool layout, zero copy is
> transparent to the DPDK application; the only difference is performance.
> > Sorry, I may not get your point; could you explain more about the VM
> usage?
> >
> > Regards
> > Qi
> 
> No problem, so the question is:
> 
> Can zerocopy be used when using testpmd to forward packets between
> vhost-user and AF_XDP socket?

I'm not very familiar with vhost-user, but I guess the answer should be the
same as the case of forwarding packets between vhost-user and i40e
(if vhost-user does not have any special requirement on the mempool that
conflicts with af_xdp ZC's requirement).

Regards
Qi

> 
> Thanks


[dpdk-dev] [PATCH v2 1/4] ether: support deferred queue setup

2018-03-01 Thread Qi Zhang
The patch lets the ethdev driver expose capability flags through
rte_eth_dev_info_get when it supports deferred queue configuration;
based on these flags, rte_eth_[rx|tx]_queue_setup can decide to
continue setting up the queue, or just fail when the device is
already started.

Signed-off-by: Qi Zhang 
---
 doc/guides/nics/features.rst  |  8 
 lib/librte_ether/rte_ethdev.c | 30 ++
 lib/librte_ether/rte_ethdev.h | 11 +++
 3 files changed, 37 insertions(+), 12 deletions(-)

diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
index 1b4fb979f..36ad21a1f 100644
--- a/doc/guides/nics/features.rst
+++ b/doc/guides/nics/features.rst
@@ -892,7 +892,15 @@ Documentation describes performance values.
 
 See ``dpdk.org/doc/perf/*``.
 
+.. _nic_features_queue_deferred_setup_capabilities:
 
+Queue deferred setup capabilities
+-
+
+Supports queue setup / release after device started.
+
+* **[provides] rte_eth_dev_info**: 
``deferred_queue_config_capa:DEV_DEFERRED_RX_QUEUE_SETUP,DEV_DEFERRED_TX_QUEUE_SETUP,DEV_DEFERRED_RX_QUEUE_RELEASE,DEV_DEFERRED_TX_QUEUE_RELEASE``.
+* **[related]  API**: ``rte_eth_dev_info_get()``.
 
 .. _nic_features_other:
 
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index a6ce2a5ba..6c906c4df 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1425,12 +1425,6 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t 
rx_queue_id,
return -EINVAL;
}
 
-   if (dev->data->dev_started) {
-   RTE_PMD_DEBUG_TRACE(
-   "port %d must be stopped to allow configuration\n", 
port_id);
-   return -EBUSY;
-   }
-
RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get, -ENOTSUP);
RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_setup, -ENOTSUP);
 
@@ -1474,10 +1468,19 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t 
rx_queue_id,
return -EINVAL;
}
 
+   if (dev->data->dev_started &&
+   !(dev_info.deferred_queue_config_capa &
+   DEV_DEFERRED_RX_QUEUE_SETUP))
+   return -EINVAL;
+
rxq = dev->data->rx_queues;
if (rxq[rx_queue_id]) {
RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_release,
-ENOTSUP);
+   if (dev->data->dev_started &&
+   !(dev_info.deferred_queue_config_capa &
+   DEV_DEFERRED_RX_QUEUE_RELEASE))
+   return -EINVAL;
(*dev->dev_ops->rx_queue_release)(rxq[rx_queue_id]);
rxq[rx_queue_id] = NULL;
}
@@ -1573,12 +1576,6 @@ rte_eth_tx_queue_setup(uint16_t port_id, uint16_t 
tx_queue_id,
return -EINVAL;
}
 
-   if (dev->data->dev_started) {
-   RTE_PMD_DEBUG_TRACE(
-   "port %d must be stopped to allow configuration\n", 
port_id);
-   return -EBUSY;
-   }
-
RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get, -ENOTSUP);
RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_setup, -ENOTSUP);
 
@@ -1596,10 +1593,19 @@ rte_eth_tx_queue_setup(uint16_t port_id, uint16_t 
tx_queue_id,
return -EINVAL;
}
 
+   if (dev->data->dev_started &&
+   !(dev_info.deferred_queue_config_capa &
+   DEV_DEFERRED_TX_QUEUE_SETUP))
+   return -EINVAL;
+
txq = dev->data->tx_queues;
if (txq[tx_queue_id]) {
RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_release,
-ENOTSUP);
+   if (dev->data->dev_started &&
+   !(dev_info.deferred_queue_config_capa &
+   DEV_DEFERRED_TX_QUEUE_RELEASE))
+   return -EINVAL;
(*dev->dev_ops->tx_queue_release)(txq[tx_queue_id]);
txq[tx_queue_id] = NULL;
}
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 036153306..410e58c50 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -981,6 +981,15 @@ struct rte_eth_conf {
  */
 #define DEV_TX_OFFLOAD_SECURITY 0x0002
 
+#define DEV_DEFERRED_RX_QUEUE_SETUP 0x0001
+/**< Deferred setup rx queue */
+#define DEV_DEFERRED_TX_QUEUE_SETUP 0x0002
+/**< Deferred setup tx queue */
+#define DEV_DEFERRED_RX_QUEUE_RELEASE 0x0004
+/**< Deferred release rx queue */
+#define DEV_DEFERRED_TX_QUEUE_RELEASE 0x0008
+/**< Deferred release tx queue */
+
 /*
  * If new Tx offload capabilities are defined, they also must be
  * mentioned in rte_tx_offload_names in rte_ethdev.c file.
@@ -1029,6 +1038,8 @@ struct rte_eth_dev_info {
/** Configured number of rx/tx queues */
uint16_t nb_rx_queues; /**< Number of RX queues. */
uint16_t nb_tx_queues; /**< Number of TX queues. */

[dpdk-dev] [PATCH v2 0/4] deferred queue setup

2018-03-01 Thread Qi Zhang
In the existing implementation, rte_eth_[rx|tx]_queue_setup will
always fail if the device is already started (rte_eth_dev_start).

This cannot satisfy the use case where an application wants to defer
setup of part of the queues while keeping traffic running on the
queues that are already set up.

example:
rte_eth_dev_config(nb_rxq = 2, nb_txq =2)
rte_eth_rx_queue_setup(idx = 0 ...)
rte_eth_tx_queue_setup(idx = 0 ...)
rte_eth_dev_start(...) /* [rx|tx]_burst is ready to start on queue 0 */
rte_eth_rx_queue_setup(idx=1 ...) /* fail*/

Basically this is not a general hardware limitation: for NICs
like i40e and ixgbe, it is not necessary to stop the whole device before
configuring a fresh queue or reconfiguring an existing queue with no
traffic on it.

The patch lets the ethdev driver expose capability flags through
rte_eth_dev_info_get when it supports deferred queue configuration;
based on these flags, rte_eth_[rx|tx]_queue_setup can decide to
continue setting up the queue, or just fail when the device is
already started.

v2:
- enhance comment in rte_ethdev.h

Qi Zhang (4):
  ether: support deferred queue setup
  app/testpmd: add parameters for deferred queue setup
  app/testpmd: add command for queue setup
  net/i40e: enable deferred queue setup

 app/test-pmd/cmdline.c  | 136 
 app/test-pmd/parameters.c   |  29 ++
 app/test-pmd/testpmd.c  |   8 +-
 app/test-pmd/testpmd.h  |   2 +
 doc/guides/nics/features.rst|   8 ++
 doc/guides/testpmd_app_ug/run_app.rst   |  12 +++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |   7 ++
 drivers/net/i40e/i40e_ethdev.c  |   6 ++
 drivers/net/i40e/i40e_rxtx.c|  62 -
 lib/librte_ether/rte_ethdev.c   |  30 +++---
 lib/librte_ether/rte_ethdev.h   |  11 +++
 11 files changed, 295 insertions(+), 16 deletions(-)

-- 
2.13.6



[dpdk-dev] [PATCH v2 3/4] app/testpmd: add command for queue setup

2018-03-01 Thread Qi Zhang
Add new command to setup queue:
queue setup (rx|tx) (port_id) (queue_idx) (ring_size)

rte_eth_[rx|tx]_queue_setup will be called correspondingly.

Signed-off-by: Qi Zhang 
---
 app/test-pmd/cmdline.c  | 136 
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |   7 ++
 2 files changed, 143 insertions(+)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index b4522f46a..b725f644d 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -774,6 +774,9 @@ static void cmd_help_long_parsed(void *parsed_result,
"port tm hierarchy commit (port_id) (clean_on_fail)\n"
"   Commit tm hierarchy.\n\n"
 
+   "queue setup (rx|tx) (port_id) (queue_idx) 
(ring_size)\n"
+   "   setup a not started queue or re-setup a started 
queue.\n\n"
+
, list_pkt_forwarding_modes()
);
}
@@ -16030,6 +16033,138 @@ cmdline_parse_inst_t cmd_load_from_file = {
},
 };
 
+/* Queue Setup */
+
+/* Common result structure for queue setup */
+struct cmd_queue_setup_result {
+   cmdline_fixed_string_t queue;
+   cmdline_fixed_string_t setup;
+   cmdline_fixed_string_t rxtx;
+   portid_t port_id;
+   uint16_t queue_idx;
+   uint16_t ring_size;
+};
+
+/* Common CLI fields for queue setup */
+cmdline_parse_token_string_t cmd_queue_setup_queue =
+   TOKEN_STRING_INITIALIZER(struct cmd_queue_setup_result, queue, "queue");
+cmdline_parse_token_string_t cmd_queue_setup_setup =
+   TOKEN_STRING_INITIALIZER(struct cmd_queue_setup_result, setup, "setup");
+cmdline_parse_token_string_t cmd_queue_setup_rxtx =
+   TOKEN_STRING_INITIALIZER(struct cmd_queue_setup_result, rxtx, "rx#tx");
+cmdline_parse_token_num_t cmd_queue_setup_port_id =
+   TOKEN_NUM_INITIALIZER(struct cmd_queue_setup_result, port_id, UINT16);
+cmdline_parse_token_num_t cmd_queue_setup_queue_idx =
+   TOKEN_NUM_INITIALIZER(struct cmd_queue_setup_result, queue_idx, UINT16);
+cmdline_parse_token_num_t cmd_queue_setup_ring_size =
+   TOKEN_NUM_INITIALIZER(struct cmd_queue_setup_result, ring_size, UINT16);
+
+static void
+cmd_queue_setup_parsed(
+   void *parsed_result,
+   __attribute__((unused)) struct cmdline *cl,
+   __attribute__((unused)) void *data)
+{
+   struct cmd_queue_setup_result *res = parsed_result;
+   struct rte_port *port;
+   struct rte_mempool *mp;
+   uint8_t rx = 1;
+   int ret;
+
+   if (port_id_is_invalid(res->port_id, ENABLED_WARN))
+   return;
+
+   if (!strcmp(res->rxtx, "tx"))
+   rx = 0;
+
+   if (rx && res->ring_size <= rx_free_thresh) {
+   printf("Invalid ring_size, must >= rx_free_thresh: %d\n",
+   rx_free_thresh);
+   return;
+   }
+
+   if (rx && res->queue_idx >= nb_rxq) {
+   printf("Invalid rx queue index, must < nb_rxq: %d\n",
+   nb_rxq);
+   return;
+   }
+
+   if (!rx && res->queue_idx >= nb_txq) {
+   printf("Invalid tx queue index, must < nb_txq: %d\n",
+   nb_txq);
+   return;
+   }
+
+   port = &ports[res->port_id];
+   if (rx) {
+   if (numa_support &&
+   (rxring_numa[res->port_id] != NUMA_NO_CONFIG)) {
+   mp = mbuf_pool_find(rxring_numa[res->port_id]);
+   if (mp == NULL) {
+   printf("Failed to setup RX queue: "
+   "No mempool allocation"
+   " on the socket %d\n",
+   rxring_numa[res->port_id]);
+   return;
+   }
+   ret = rte_eth_rx_queue_setup(res->port_id,
+res->queue_idx,
+res->ring_size,
+rxring_numa[res->port_id],
+&(port->rx_conf),
+mp);
+   } else {
+   mp = mbuf_pool_find(port->socket_id);
+   if (mp == NULL) {
+   printf("Failed to setup RX queue:"
+   "No mempool allocation"
+   " on the socket %d\n",
+   port->socket_id);
+   return;
+   }
+   ret = rte_eth_rx_queue_setup(res->port_id,
+res->queue_idx,
+res->ring_size,
+  

[dpdk-dev] [PATCH v2 2/4] app/testpmd: add parameters for deferred queue setup

2018-03-01 Thread Qi Zhang
Add two parameters:
rxq-setup: set the number of RX queues to be set up before the device is started.
txq-setup: set the number of TX queues to be set up before the device is started.

Signed-off-by: Qi Zhang 
---
 app/test-pmd/parameters.c | 29 +
 app/test-pmd/testpmd.c|  8 ++--
 app/test-pmd/testpmd.h|  2 ++
 doc/guides/testpmd_app_ug/run_app.rst | 12 
 4 files changed, 49 insertions(+), 2 deletions(-)

diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 97d22b860..497259ee7 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -146,8 +146,12 @@ usage(char* progname)
printf("  --rss-ip: set RSS functions to IPv4/IPv6 only .\n");
printf("  --rss-udp: set RSS functions to IPv4/IPv6 + UDP.\n");
printf("  --rxq=N: set the number of RX queues per port to N.\n");
+   printf("  --rxq-setup=N: set the number of RX queues be setup before"
+  "device start to N.\n");
printf("  --rxd=N: set the number of descriptors in RX rings to N.\n");
printf("  --txq=N: set the number of TX queues per port to N.\n");
+   printf("  --txq-setup=N: set the number of TX queues be setup before"
+  "device start to N.\n");
printf("  --txd=N: set the number of descriptors in TX rings to N.\n");
printf("  --burst=N: set the number of packets per burst to N.\n");
printf("  --mbcache=N: set the cache of mbuf memory pool to N.\n");
@@ -596,7 +600,9 @@ launch_args_parse(int argc, char** argv)
{ "rss-ip", 0, 0, 0 },
{ "rss-udp",0, 0, 0 },
{ "rxq",1, 0, 0 },
+   { "rxq-setup",  1, 0, 0 },
{ "txq",1, 0, 0 },
+   { "txq-setup",  1, 0, 0 },
{ "rxd",1, 0, 0 },
{ "txd",1, 0, 0 },
{ "burst",  1, 0, 0 },
@@ -933,6 +939,15 @@ launch_args_parse(int argc, char** argv)
  " >= 0 && <= %u\n", n,
  get_allowed_max_nb_rxq(&pid));
}
+   if (!strcmp(lgopts[opt_idx].name, "rxq-setup")) {
+   n = atoi(optarg);
+   if (n >= 0 && check_nb_rxq((queueid_t)n) == 0)
+   nb_rxq_setup = (queueid_t) n;
+   else
+   rte_exit(EXIT_FAILURE, "rxq-setup %d 
invalid - must be"
+ " >= 0 && <= %u\n", n,
+ get_allowed_max_nb_rxq(&pid));
+   }
if (!strcmp(lgopts[opt_idx].name, "txq")) {
n = atoi(optarg);
if (n >= 0 && check_nb_txq((queueid_t)n) == 0)
@@ -942,6 +957,15 @@ launch_args_parse(int argc, char** argv)
  " >= 0 && <= %u\n", n,
  get_allowed_max_nb_txq(&pid));
}
+   if (!strcmp(lgopts[opt_idx].name, "txq-setup")) {
+   n = atoi(optarg);
+   if (n >= 0 && check_nb_txq((queueid_t)n) == 0)
+   nb_txq_setup = (queueid_t) n;
+   else
+   rte_exit(EXIT_FAILURE, "txq-setup %d 
invalid - must be"
+ " >= 0 && <= %u\n", n,
+ get_allowed_max_nb_txq(&pid));
+   }
if (!nb_rxq && !nb_txq) {
rte_exit(EXIT_FAILURE, "Either rx or tx queues 
should "
"be non-zero\n");
@@ -1119,4 +1143,9 @@ launch_args_parse(int argc, char** argv)
/* Set offload configuration from command line parameters. */
rx_mode.offloads = rx_offloads;
tx_mode.offloads = tx_offloads;
+
+   if (nb_rxq_setup > nb_rxq)
+   nb_rxq_setup = nb_rxq;
+   if (nb_txq_setup > nb_txq)
+   nb_txq_setup = nb_txq;
 }
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 46dc22c94..790e7359c 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -207,6 +207,10 @@ uint8_t dcb_test = 0;
  */
 queueid_t nb_rxq = 1; /**< Number of RX queues per port. */
 queueid_t nb_txq = 1; /**< Number of TX queues per port. */
+queueid_t nb_rxq_setup = MAX_QUEUE_ID;
+/**< Number of RX queues per port to set up before dev_start. */
+queueid_t nb_txq_setup = MAX_QUEUE_ID;

[dpdk-dev] [PATCH v2 4/4] net/i40e: enable deferred queue setup

2018-03-01 Thread Qi Zhang
Expose the deferred queue configuration capability and enhance
i40e_dev_[rx|tx]_queue_[setup|release] to handle the situation when
the device has already started.

Signed-off-by: Qi Zhang 
---
 drivers/net/i40e/i40e_ethdev.c |  6 
 drivers/net/i40e/i40e_rxtx.c   | 62 --
 2 files changed, 66 insertions(+), 2 deletions(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 06b0f03a1..843a0c42a 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -3195,6 +3195,12 @@ i40e_dev_info_get(struct rte_eth_dev *dev, struct 
rte_eth_dev_info *dev_info)
DEV_TX_OFFLOAD_GRE_TNL_TSO |
DEV_TX_OFFLOAD_IPIP_TNL_TSO |
DEV_TX_OFFLOAD_GENEVE_TNL_TSO;
+   dev_info->deferred_queue_config_capa =
+   DEV_DEFERRED_RX_QUEUE_SETUP |
+   DEV_DEFERRED_TX_QUEUE_SETUP |
+   DEV_DEFERRED_RX_QUEUE_RELEASE |
+   DEV_DEFERRED_TX_QUEUE_RELEASE;
+
dev_info->hash_key_size = (I40E_PFQF_HKEY_MAX_INDEX + 1) *
sizeof(uint32_t);
dev_info->reta_size = pf->hash_lut_size;
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 1217e5a61..e5f532cf7 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -1712,6 +1712,7 @@ i40e_dev_rx_queue_setup(struct rte_eth_dev *dev,
uint16_t len, i;
uint16_t reg_idx, base, bsf, tc_mapping;
int q_offset, use_def_burst_func = 1;
+   int ret = 0;
 
if (hw->mac.type == I40E_MAC_VF || hw->mac.type == I40E_MAC_X722_VF) {
vf = I40EVF_DEV_PRIVATE_TO_VF(dev->data->dev_private);
@@ -1841,6 +1842,25 @@ i40e_dev_rx_queue_setup(struct rte_eth_dev *dev,
rxq->dcb_tc = i;
}
 
+   if (dev->data->dev_started) {
+   ret = i40e_rx_queue_init(rxq);
+   if (ret != I40E_SUCCESS) {
+   PMD_DRV_LOG(ERR,
+   "Failed to do RX queue initialization");
+   return ret;
+   }
+   if (ad->rx_vec_allowed)
+   i40e_rxq_vec_setup(rxq);
+   if (!rxq->rx_deferred_start) {
+   ret = i40e_dev_rx_queue_start(dev, queue_idx);
+   if (ret != I40E_SUCCESS) {
+   PMD_DRV_LOG(ERR,
+   "Failed to start RX queue");
+   return ret;
+   }
+   }
+   }
+
return 0;
 }
 
@@ -1848,13 +1868,21 @@ void
 i40e_dev_rx_queue_release(void *rxq)
 {
struct i40e_rx_queue *q = (struct i40e_rx_queue *)rxq;
+   struct rte_eth_dev *dev = &rte_eth_devices[q->port_id];
 
if (!q) {
PMD_DRV_LOG(DEBUG, "Pointer to rxq is NULL");
return;
}
 
-   i40e_rx_queue_release_mbufs(q);
+   if (dev->data->dev_started) {
+   if (dev->data->rx_queue_state[q->queue_id] ==
+   RTE_ETH_QUEUE_STATE_STARTED)
+   i40e_dev_rx_queue_stop(dev, q->queue_id);
+   } else {
+   i40e_rx_queue_release_mbufs(q);
+   }
+
rte_free(q->sw_ring);
rte_free(q);
 }
@@ -1980,6 +2008,8 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
const struct rte_eth_txconf *tx_conf)
 {
struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct i40e_adapter *ad =
+   I40E_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
struct i40e_vsi *vsi;
struct i40e_pf *pf = NULL;
struct i40e_vf *vf = NULL;
@@ -1989,6 +2019,7 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
uint16_t tx_rs_thresh, tx_free_thresh;
uint16_t reg_idx, i, base, bsf, tc_mapping;
int q_offset;
+   int ret = 0;
 
if (hw->mac.type == I40E_MAC_VF || hw->mac.type == I40E_MAC_X722_VF) {
vf = I40EVF_DEV_PRIVATE_TO_VF(dev->data->dev_private);
@@ -2162,6 +2193,25 @@ i40e_dev_tx_queue_setup(struct rte_eth_dev *dev,
txq->dcb_tc = i;
}
 
+   if (dev->data->dev_started) {
+   ret = i40e_tx_queue_init(txq);
+   if (ret != I40E_SUCCESS) {
+   PMD_DRV_LOG(ERR,
+   "Failed to do TX queue initialization");
+   return ret;
+   }
+   if (ad->tx_vec_allowed)
+   i40e_txq_vec_setup(txq);
+   if (!txq->tx_deferred_start) {
+   ret = i40e_dev_tx_queue_start(dev, queue_idx);
+   if (ret != I40E_SUCCESS) {
+   PMD_DRV_LOG(ERR,
+   "Failed to start TX queue");
+   return ret;

[dpdk-dev] [PATCH] net/fm10k: convert to new Rx/Tx offloads API

2018-03-01 Thread Wei Dai
Ethdev Rx offloads API has changed since:
commit ce17eddefc20 ("ethdev: introduce Rx queue offloads API")
Ethdev Tx offloads API has changed since:
commit cba7f53b717d ("ethdev: introduce Tx queue offloads API")
This commit supports the new Rx and Tx offloads API.

Signed-off-by: Wei Dai 
---
 drivers/net/fm10k/fm10k.h  |  7 +++
 drivers/net/fm10k/fm10k_ethdev.c   | 33 -
 drivers/net/fm10k/fm10k_rxtx_vec.c |  6 +++---
 3 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index 30dad3e..57bd533 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -108,6 +108,11 @@
 
 #define FM10K_SIMPLE_TX_FLAG ((uint32_t)ETH_TXQ_FLAGS_NOMULTSEGS | \
ETH_TXQ_FLAGS_NOOFFLOADS)
+#define FM10K_SIMPLE_TX_OFFLOADS ((uint64_t)(DEV_TX_OFFLOAD_MULTI_SEGS  | \
+DEV_TX_OFFLOAD_VLAN_INSERT | \
+DEV_TX_OFFLOAD_SCTP_CKSUM  | \
+DEV_TX_OFFLOAD_UDP_CKSUM   | \
+DEV_TX_OFFLOAD_TCP_CKSUM))
 
 struct fm10k_macvlan_filter_info {
uint16_t vlan_num;   /* Total VLAN number */
@@ -180,6 +185,7 @@ struct fm10k_rx_queue {
uint8_t drop_en;
uint8_t rx_deferred_start; /* don't start this queue in dev start. */
uint16_t rx_ftag_en; /* indicates FTAG RX supported */
+   uint64_t offloads; /* offloads of DEV_RX_OFFLOAD_* */
 };
 
 /*
@@ -212,6 +218,7 @@ struct fm10k_tx_queue {
uint16_t next_dd; /* Next pos to check DD flag */
volatile uint32_t *tail_ptr;
uint32_t txq_flags; /* Holds flags for this TXq */
+   uint64_t offloads; /* Offloads of DEV_TX_OFFLOAD_* */
uint16_t nb_desc;
uint16_t port_id;
uint8_t tx_deferred_start; /** don't start this queue in dev start. */
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index 9423761..5105874 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -444,7 +444,7 @@ fm10k_dev_configure(struct rte_eth_dev *dev)
 
PMD_INIT_FUNC_TRACE();
 
-   if (dev->data->dev_conf.rxmode.hw_strip_crc == 0)
+   if (!(dev->data->dev_conf.rxmode.offloads & DEV_RX_OFFLOAD_CRC_STRIP))
PMD_INIT_LOG(WARNING, "fm10k always strip CRC");
/* multipe queue mode checking */
ret  = fm10k_check_mq_mode(dev);
@@ -454,6 +454,8 @@ fm10k_dev_configure(struct rte_eth_dev *dev)
return ret;
}
 
+   dev->data->scattered_rx = 0;
+
return 0;
 }
 
@@ -756,7 +758,7 @@ fm10k_dev_rx_init(struct rte_eth_dev *dev)
/* It adds dual VLAN length for supporting dual VLAN */
if ((dev->data->dev_conf.rxmode.max_rx_pkt_len +
2 * FM10K_VLAN_TAG_SIZE) > buf_size ||
-   dev->data->dev_conf.rxmode.enable_scatter) {
+   rxq->offloads & DEV_RX_OFFLOAD_SCATTER) {
uint32_t reg;
dev->data->scattered_rx = 1;
reg = FM10K_READ_REG(hw, FM10K_SRRCTL(i));
@@ -1389,11 +1391,17 @@ fm10k_dev_infos_get(struct rte_eth_dev *dev,
dev_info->vmdq_queue_base= 0;
dev_info->max_vmdq_pools = ETH_32_POOLS;
dev_info->vmdq_queue_num = FM10K_MAX_QUEUES_PF;
+   dev_info->rx_queue_offload_capa = DEV_RX_OFFLOAD_SCATTER;
dev_info->rx_offload_capa =
-   DEV_RX_OFFLOAD_VLAN_STRIP |
-   DEV_RX_OFFLOAD_IPV4_CKSUM |
-   DEV_RX_OFFLOAD_UDP_CKSUM  |
-   DEV_RX_OFFLOAD_TCP_CKSUM;
+   DEV_RX_OFFLOAD_VLAN_STRIP  |
+   DEV_RX_OFFLOAD_VLAN_FILTER |
+   DEV_RX_OFFLOAD_IPV4_CKSUM  |
+   DEV_RX_OFFLOAD_UDP_CKSUM   |
+   DEV_RX_OFFLOAD_TCP_CKSUM   |
+   DEV_RX_OFFLOAD_JUMBO_FRAME |
+   DEV_RX_OFFLOAD_CRC_STRIP   |
+   DEV_RX_OFFLOAD_SCATTER;
+   dev_info->tx_queue_offload_capa = 0;
dev_info->tx_offload_capa =
DEV_TX_OFFLOAD_VLAN_INSERT |
DEV_TX_OFFLOAD_IPV4_CKSUM  |
@@ -1412,6 +1420,7 @@ fm10k_dev_infos_get(struct rte_eth_dev *dev,
},
.rx_free_thresh = FM10K_RX_FREE_THRESH_DEFAULT(0),
.rx_drop_en = 0,
+   .offloads = 0,
};
 
dev_info->default_txconf = (struct rte_eth_txconf) {
@@ -1423,6 +1432,7 @@ fm10k_dev_infos_get(struct rte_eth_dev *dev,
.tx_free_thresh = FM10K_TX_FREE_THRESH_DEFAULT(0),
.tx_rs_thresh = FM10K_TX_RS_THRESH_DEFAULT(0),
.txq_flags = FM10K_SIMPLE_TX_FLAG,
+   .offloads = 0,
};
 
dev_info->rx_desc_lim = (struct rte_eth_desc_lim) {
@@ -1571,19 +1581,22 @@ static int
 fm10k_vlan_offloa

[dpdk-dev] librte_power w/ intel_pstate cpufreq governor

2018-03-01 Thread longtb5
Hi everybody,

I know this thread was from over 2 years ago but I ran into the same problem
with l3fwd-power today.
Any updates on this?

-BL


Re: [dpdk-dev] librte_power w/ intel_pstate cpufreq governor

2018-03-01 Thread longtb5
Forgot to link the original thread.

http://dpdk.org/ml/archives/dev/2016-January/030930.html

-BL

> -Original Message-
> From: long...@viettel.com.vn [mailto:long...@viettel.com.vn]
> Sent: Friday, March 2, 2018 2:19 PM
> To: dev@dpdk.org
> Cc: david.h...@intel.com; mh...@mhcomputing.net; helin.zh...@intel.com;
> long...@viettel.com.vn
> Subject: librte_power w/ intel_pstate cpufreq governor
> 
> Hi everybody,
> 
> I know this thread was from over 2 years ago but I ran into the same
problem
> with l3fwd-power today.
> 
> Any updates on this?
> 
> -BL




Re: [dpdk-dev] virtio with 2MB hugepages - bringing back single file segments

2018-03-01 Thread Maxime Coquelin



On 03/01/2018 11:40 PM, Stojaczyk, DariuszX wrote:

Hi,

I'm trying to make a vhost-user initiator built upon DPDK work with 2MB
hugepages. In the initiator we have to share all memory with the host
process, so it can perform DMA. DPDK currently enforces one descriptor per
hugepage, and there's an artificial limit on shared descriptors in the DPDK
vhost-user implementation (currently 8). Because of that, all DPDK
vhost-user initiators are practically limited to 1GB hugepages at the
moment. We can always increase the artificial descriptor limit, but then
we're limited by sendmsg() itself, which on Linux accepts no more than 253
descriptors. However, could we increase the vhost-user implementation limit
to - say - 128, and bring back "single file segments" [1]?


If you do something like this, you'll first have to update the
vhost-user spec, which I think should include a new protocol
feature bit.

Also, you will have to consider improving the translation functions
with a better search algorithm, else you'll have very poor performance.



Could I send a patch series that does this? The single file segments code would 
go through a cleanup - at least making it available via a runtime option rather 
than #ifdefs.

I know there's an ongoing rework of the memory allocator in DPDK [2] and it
includes similar single-file-segments functionality. However, it will
probably take quite some time before it is merged, and even then, the new
functionality would only be available in the *new* allocator. The old one is
kept unchanged. It could use single file segments as well.

Regards,
D.

[1] http://dpdk.org/dev/patchwork/patch/16042/
[2] http://dpdk.org/ml/archives/dev/2017-December/084302.html