[dpdk-dev] [PATCH 0/4] virtio support for container

2016-01-14 Thread Tan, Jianfeng
Hi Amit,

On 1/13/2016 11:00 PM, Amit Tomer wrote:
> Hello,
>
>> You can use below patch for l2fwd to send out an arp packet when it gets
>> started.
> I tried to send out an ARP packet using this patch, but buffer allocation
> for the ARP packet itself fails:
>
>   m = rte_pktmbuf_alloc(mp);
>
> This returns NULL.

Can you share the command line you use to start this l2fwd program?

Thanks,
Jianfeng


>
> Thanks,
> Amit.



[dpdk-dev] [PATCH v2 0/6] Support VxLAN & NVGRE checksum off-load on X550

2016-01-14 Thread Wenzhuo Lu
This patch set adds VxLAN & NVGRE checksum off-load support.
Both RX and TX checksum off-load can be used for VxLAN & NVGRE.
The VxLAN port can also be configured; that is implemented in this
patch set as well.

Wenzhuo Lu (6):
  lib/librte_ether: change function name of tunnel port config
  i40e: rename the tunnel port config functions
  ixgbe: support UDP tunnel port config
  ixgbe: support VxLAN &  NVGRE RX checksum off-load
  ixgbe: support VxLAN &  NVGRE TX checksum off-load
  doc: update release note for VxLAN & NVGRE checksum off-load support

 app/test-pmd/cmdline.c |  6 ++-
 doc/guides/rel_notes/release_2_3.rst   |  8 +++
 drivers/net/i40e/i40e_ethdev.c | 22 
 drivers/net/ixgbe/ixgbe_ethdev.c   | 95 ++
 drivers/net/ixgbe/ixgbe_rxtx.c | 63 ++
 drivers/net/ixgbe/ixgbe_rxtx.h |  6 ++-
 examples/tep_termination/vxlan_setup.c |  2 +-
 lib/librte_ether/rte_ethdev.c  | 45 
 lib/librte_ether/rte_ethdev.h  | 18 +++
 lib/librte_mbuf/rte_mbuf.c |  1 +
 lib/librte_mbuf/rte_mbuf.h |  3 ++
 11 files changed, 244 insertions(+), 25 deletions(-)

-- 
1.9.3



[dpdk-dev] [PATCH v2 1/6] lib/librte_ether: change function name of tunnel port config

2016-01-14 Thread Wenzhuo Lu
The names of the functions for tunnel port configuration are not
accurate. They are tunnel_add/del; tunnel_port_add/del is better.
Because renaming them directly may be an ABI change, the new
functions are added without removing the old ones. The old
ones will be removed in the next release after an ABI change
announcement.

Signed-off-by: Wenzhuo Lu 
---
 app/test-pmd/cmdline.c |  6 +++--
 examples/tep_termination/vxlan_setup.c |  2 +-
 lib/librte_ether/rte_ethdev.c  | 45 ++
 lib/librte_ether/rte_ethdev.h  | 18 ++
 4 files changed, 68 insertions(+), 3 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 73298c9..4e71e90 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -6780,9 +6780,11 @@ cmd_tunnel_udp_config_parsed(void *parsed_result,
tunnel_udp.prot_type = RTE_TUNNEL_TYPE_VXLAN;

if (!strcmp(res->what, "add"))
-   ret = rte_eth_dev_udp_tunnel_add(res->port_id, &tunnel_udp);
+   ret = rte_eth_dev_udp_tunnel_port_add(res->port_id,
+ &tunnel_udp);
else
-   ret = rte_eth_dev_udp_tunnel_delete(res->port_id, &tunnel_udp);
+   ret = rte_eth_dev_udp_tunnel_port_delete(res->port_id,
+&tunnel_udp);

if (ret < 0)
printf("udp tunneling add error: (%s)\n", strerror(-ret));
diff --git a/examples/tep_termination/vxlan_setup.c 
b/examples/tep_termination/vxlan_setup.c
index 51ad133..8836603 100644
--- a/examples/tep_termination/vxlan_setup.c
+++ b/examples/tep_termination/vxlan_setup.c
@@ -191,7 +191,7 @@ vxlan_port_init(uint8_t port, struct rte_mempool *mbuf_pool)
/* Configure UDP port for UDP tunneling */
tunnel_udp.udp_port = udp_port;
tunnel_udp.prot_type = RTE_TUNNEL_TYPE_VXLAN;
-   retval = rte_eth_dev_udp_tunnel_add(port, &tunnel_udp);
+   retval = rte_eth_dev_udp_tunnel_port_add(port, &tunnel_udp);
if (retval < 0)
return retval;
rte_eth_macaddr_get(port, &ports_eth_addr[port]);
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index ed971b4..74428f4 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1987,6 +1987,28 @@ rte_eth_dev_udp_tunnel_add(uint8_t port_id,
 }

 int
+rte_eth_dev_udp_tunnel_port_add(uint8_t port_id,
+   struct rte_eth_udp_tunnel *udp_tunnel)
+{
+   struct rte_eth_dev *dev;
+
+   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+   if (udp_tunnel == NULL) {
+   RTE_PMD_DEBUG_TRACE("Invalid udp_tunnel parameter\n");
+   return -EINVAL;
+   }
+
+   if (udp_tunnel->prot_type >= RTE_TUNNEL_TYPE_MAX) {
+   RTE_PMD_DEBUG_TRACE("Invalid tunnel type\n");
+   return -EINVAL;
+   }
+
+   dev = &rte_eth_devices[port_id];
+   RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->udp_tunnel_port_add, -ENOTSUP);
+   return (*dev->dev_ops->udp_tunnel_port_add)(dev, udp_tunnel);
+}
+
+int
 rte_eth_dev_udp_tunnel_delete(uint8_t port_id,
  struct rte_eth_udp_tunnel *udp_tunnel)
 {
@@ -2010,6 +2032,29 @@ rte_eth_dev_udp_tunnel_delete(uint8_t port_id,
 }

 int
+rte_eth_dev_udp_tunnel_port_delete(uint8_t port_id,
+  struct rte_eth_udp_tunnel *udp_tunnel)
+{
+   struct rte_eth_dev *dev;
+
+   RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+   dev = &rte_eth_devices[port_id];
+
+   if (udp_tunnel == NULL) {
+   RTE_PMD_DEBUG_TRACE("Invalid udp_tunnel parameter\n");
+   return -EINVAL;
+   }
+
+   if (udp_tunnel->prot_type >= RTE_TUNNEL_TYPE_MAX) {
+   RTE_PMD_DEBUG_TRACE("Invalid tunnel type\n");
+   return -EINVAL;
+   }
+
+   RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->udp_tunnel_port_del, -ENOTSUP);
+   return (*dev->dev_ops->udp_tunnel_port_del)(dev, udp_tunnel);
+}
+
+int
 rte_eth_led_on(uint8_t port_id)
 {
struct rte_eth_dev *dev;
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index bada8ad..2e064f4 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1261,6 +1261,14 @@ typedef int (*eth_set_eeprom_t)(struct rte_eth_dev *dev,
struct rte_dev_eeprom_info *info);
 /**< @internal Program eeprom data  */

+typedef int (*eth_udp_tunnel_port_add_t)(struct rte_eth_dev *dev,
+struct rte_eth_udp_tunnel *tunnel_udp);
+/**< @internal Add tunneling UDP port */
+
+typedef int (*eth_udp_tunnel_port_del_t)(struct rte_eth_dev *dev,
+struct rte_eth_udp_tunnel *tunnel_udp);
+/**< @internal Delete tunneling UDP port */
+
 #ifdef RTE_NIC_BYPASS

 enum {
@@ -1443

[dpdk-dev] [PATCH v2 2/6] i40e: rename the tunnel port config functions

2016-01-14 Thread Wenzhuo Lu
The names of the tunnel port config functions are not
accurate, so change them from tunnel_add/del to
tunnel_port_add/del.
Both the old and the new rte ops are supported.

Signed-off-by: Wenzhuo Lu 
---
 drivers/net/i40e/i40e_ethdev.c | 22 --
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index bf6220d..b0335f5 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -369,10 +369,10 @@ static int i40e_dev_rss_hash_update(struct rte_eth_dev 
*dev,
struct rte_eth_rss_conf *rss_conf);
 static int i40e_dev_rss_hash_conf_get(struct rte_eth_dev *dev,
  struct rte_eth_rss_conf *rss_conf);
-static int i40e_dev_udp_tunnel_add(struct rte_eth_dev *dev,
-   struct rte_eth_udp_tunnel *udp_tunnel);
-static int i40e_dev_udp_tunnel_del(struct rte_eth_dev *dev,
-   struct rte_eth_udp_tunnel *udp_tunnel);
+static int i40e_dev_udp_tunnel_port_add(struct rte_eth_dev *dev,
+   struct rte_eth_udp_tunnel *udp_tunnel);
+static int i40e_dev_udp_tunnel_port_del(struct rte_eth_dev *dev,
+   struct rte_eth_udp_tunnel *udp_tunnel);
 static int i40e_ethertype_filter_set(struct i40e_pf *pf,
struct rte_eth_ethertype_filter *filter,
bool add);
@@ -467,8 +467,10 @@ static const struct eth_dev_ops i40e_eth_dev_ops = {
.reta_query   = i40e_dev_rss_reta_query,
.rss_hash_update  = i40e_dev_rss_hash_update,
.rss_hash_conf_get= i40e_dev_rss_hash_conf_get,
-   .udp_tunnel_add   = i40e_dev_udp_tunnel_add,
-   .udp_tunnel_del   = i40e_dev_udp_tunnel_del,
+   .udp_tunnel_add   = i40e_dev_udp_tunnel_port_add,
+   .udp_tunnel_del   = i40e_dev_udp_tunnel_port_del,
+   .udp_tunnel_port_add  = i40e_dev_udp_tunnel_port_add,
+   .udp_tunnel_port_del  = i40e_dev_udp_tunnel_port_del,
.filter_ctrl  = i40e_dev_filter_ctrl,
.rxq_info_get = i40e_rxq_info_get,
.txq_info_get = i40e_txq_info_get,
@@ -5976,8 +5978,8 @@ i40e_del_vxlan_port(struct i40e_pf *pf, uint16_t port)

 /* Add UDP tunneling port */
 static int
-i40e_dev_udp_tunnel_add(struct rte_eth_dev *dev,
-   struct rte_eth_udp_tunnel *udp_tunnel)
+i40e_dev_udp_tunnel_port_add(struct rte_eth_dev *dev,
+struct rte_eth_udp_tunnel *udp_tunnel)
 {
int ret = 0;
struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private);
@@ -6007,8 +6009,8 @@ i40e_dev_udp_tunnel_add(struct rte_eth_dev *dev,

 /* Remove UDP tunneling port */
 static int
-i40e_dev_udp_tunnel_del(struct rte_eth_dev *dev,
-   struct rte_eth_udp_tunnel *udp_tunnel)
+i40e_dev_udp_tunnel_port_del(struct rte_eth_dev *dev,
+struct rte_eth_udp_tunnel *udp_tunnel)
 {
int ret = 0;
struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private);
-- 
1.9.3
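
Editor's sketch, not part of the patch: the point of wiring both the old
and the new ops entries to one driver function is that callers of either
name keep working during the transition. The code below is a minimal,
stand-alone model of that idea. Every demo_* name and the cut-down
structs are hypothetical stand-ins; none of them is the real
rte_ethdev API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-ins for rte_eth_udp_tunnel and eth_dev_ops. */
struct demo_udp_tunnel {
	uint16_t udp_port;
	uint8_t prot_type;
};

struct demo_dev;
typedef int (*demo_tunnel_op_t)(struct demo_dev *dev,
				struct demo_udp_tunnel *tunnel);

struct demo_dev_ops {
	demo_tunnel_op_t udp_tunnel_add;      /* old name, kept for compatibility */
	demo_tunnel_op_t udp_tunnel_port_add; /* new, more accurate name */
};

struct demo_dev {
	const struct demo_dev_ops *ops;
	uint16_t vxlan_port; /* last port programmed by the driver */
};

/* One driver implementation shared by both ops entries. */
int demo_drv_tunnel_port_add(struct demo_dev *dev,
			     struct demo_udp_tunnel *tunnel)
{
	dev->vxlan_port = tunnel->udp_port;
	return 0;
}

const struct demo_dev_ops demo_ops = {
	.udp_tunnel_add      = demo_drv_tunnel_port_add,
	.udp_tunnel_port_add = demo_drv_tunnel_port_add,
};

/* Common argument checks, loosely modelled on the ethdev wrappers. */
static int demo_dispatch(struct demo_dev *dev, demo_tunnel_op_t op,
			 struct demo_udp_tunnel *tunnel)
{
	if (tunnel == NULL)
		return -EINVAL;
	if (op == NULL)
		return -ENOTSUP;
	return op(dev, tunnel);
}

/* Old entry point: still works, reaches the same driver code. */
int demo_udp_tunnel_add(struct demo_dev *dev, struct demo_udp_tunnel *t)
{
	assert(dev != NULL && dev->ops != NULL);
	return demo_dispatch(dev, dev->ops->udp_tunnel_add, t);
}

/* New entry point. */
int demo_udp_tunnel_port_add(struct demo_dev *dev, struct demo_udp_tunnel *t)
{
	assert(dev != NULL && dev->ops != NULL);
	return demo_dispatch(dev, dev->ops->udp_tunnel_port_add, t);
}
```

With this shape, removing the old name later only means deleting one ops
entry and one wrapper; the driver function itself does not change.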



[dpdk-dev] [PATCH v2 4/6] ixgbe: support VxLAN & NVGRE RX checksum off-load

2016-01-14 Thread Wenzhuo Lu
X550 will do VxLAN & NVGRE RX checksum off-load automatically.
This patch exposes the result of the checksum off-load.

Signed-off-by: Wenzhuo Lu 
---
 drivers/net/ixgbe/ixgbe_rxtx.c | 11 ++-
 lib/librte_mbuf/rte_mbuf.c |  1 +
 lib/librte_mbuf/rte_mbuf.h |  1 +
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index 52a263c..512ac3a 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -1003,6 +1003,8 @@ rx_desc_status_to_pkt_flags(uint32_t rx_status)
 static inline uint64_t
 rx_desc_error_to_pkt_flags(uint32_t rx_status)
 {
+   uint64_t pkt_flags;
+
/*
 * Bit 31: IPE, IPv4 checksum error
 * Bit 30: L4I, L4I integrity error
@@ -1011,8 +1013,15 @@ rx_desc_error_to_pkt_flags(uint32_t rx_status)
0,  PKT_RX_L4_CKSUM_BAD, PKT_RX_IP_CKSUM_BAD,
PKT_RX_IP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD
};
-   return error_to_pkt_flags_map[(rx_status >>
+   pkt_flags = error_to_pkt_flags_map[(rx_status >>
IXGBE_RXDADV_ERR_CKSUM_BIT) & IXGBE_RXDADV_ERR_CKSUM_MSK];
+
+   if ((rx_status & IXGBE_RXD_STAT_OUTERIPCS) &&
+   (rx_status & IXGBE_RXDADV_ERR_OUTERIPER)) {
+   pkt_flags |= PKT_RX_OUTER_IP_CKSUM_BAD;
+   }
+
+   return pkt_flags;
 }

 /*
diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index c18b438..5d4af39 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -260,6 +260,7 @@ const char *rte_get_rx_ol_flag_name(uint64_t mask)
/* case PKT_RX_MAC_ERR: return "PKT_RX_MAC_ERR"; */
case PKT_RX_IEEE1588_PTP: return "PKT_RX_IEEE1588_PTP";
case PKT_RX_IEEE1588_TMST: return "PKT_RX_IEEE1588_TMST";
+   case PKT_RX_OUTER_IP_CKSUM_BAD: return "PKT_RX_OUTER_IP_CKSUM_BAD";
default: return NULL;
}
 }
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index f234ac9..5ad5e59 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -98,6 +98,7 @@ extern "C" {
 #define PKT_RX_FDIR_ID   (1ULL << 13) /**< FD id reported if FDIR match. */
 #define PKT_RX_FDIR_FLX  (1ULL << 14) /**< Flexible bytes reported if FDIR 
match. */
 #define PKT_RX_QINQ_PKT  (1ULL << 15)  /**< RX packet with double VLAN 
stripped. */
+#define PKT_RX_OUTER_IP_CKSUM_BAD (1ULL << 16)  /**< Outer IP cksum of RX pkt. 
is not OK. */
 /* add new RX flags here */

 /* add new TX flags here */
-- 
1.9.3
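
Editor's sketch, not part of the patch: the RX change keeps the existing
table lookup for the inner IP/L4 error bits and then ORs in one extra
flag when the outer IP checksum was checked and found bad. The
stand-alone model below shows that combination. The DEMO_* names are
hypothetical; the outer-checksum bit positions and the inner flag values
are assumptions chosen for illustration, and only the (1ULL << 16) flag
value and the bit 30/31 layout come from the patch text.

```c
#include <stdint.h>

/* Flag values: the outer flag is from the patch, the other two are assumed. */
#define DEMO_RX_L4_CKSUM_BAD        (1ULL << 3)
#define DEMO_RX_IP_CKSUM_BAD        (1ULL << 4)
#define DEMO_RX_OUTER_IP_CKSUM_BAD  (1ULL << 16)

#define DEMO_ERR_CKSUM_BIT   30          /* bit 30: L4 error, bit 31: IPv4 error */
#define DEMO_ERR_CKSUM_MSK   3
#define DEMO_STAT_OUTERIPCS  0x00000100u /* assumed: outer IP checksum was checked */
#define DEMO_ERR_OUTERIPER   0x04000000u /* assumed: outer IP checksum error */

/* Translate a combined status/error word into packet flags. */
uint64_t demo_rx_error_to_pkt_flags(uint32_t rx_status)
{
	static const uint64_t error_to_pkt_flags_map[4] = {
		0,
		DEMO_RX_L4_CKSUM_BAD,
		DEMO_RX_IP_CKSUM_BAD,
		DEMO_RX_IP_CKSUM_BAD | DEMO_RX_L4_CKSUM_BAD
	};
	uint64_t pkt_flags;

	/* Inner checksum errors: two bits index a four-entry table. */
	pkt_flags = error_to_pkt_flags_map[(rx_status >> DEMO_ERR_CKSUM_BIT) &
					   DEMO_ERR_CKSUM_MSK];

	/* Outer IP error only counts if the outer checksum was checked. */
	if ((rx_status & DEMO_STAT_OUTERIPCS) &&
	    (rx_status & DEMO_ERR_OUTERIPER))
		pkt_flags |= DEMO_RX_OUTER_IP_CKSUM_BAD;

	return pkt_flags;
}
```

An application would then test the outer flag on received packets in
the same way it already tests the inner IP and L4 flags.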



[dpdk-dev] [PATCH v2 3/6] ixgbe: support UDP tunnel port config

2016-01-14 Thread Wenzhuo Lu
Add UDP tunnel port add/del support on ixgbe. Only VxLAN
port configuration is supported for now.
Although the VxLAN port has a default value of 4789, it can be
changed. VxLAN port configuration is supported so that such a
change can be followed.
Note that the default value of the VxLAN port in ixgbe NICs is 0, so
please set it when using VxLAN off-load.

Signed-off-by: Wenzhuo Lu 
---
 drivers/net/ixgbe/ixgbe_ethdev.c | 95 
 1 file changed, 95 insertions(+)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 4c4c6df..c04edde 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -337,6 +337,10 @@ static int ixgbe_timesync_read_time(struct rte_eth_dev 
*dev,
   struct timespec *timestamp);
 static int ixgbe_timesync_write_time(struct rte_eth_dev *dev,
   const struct timespec *timestamp);
+static int ixgbe_dev_udp_tunnel_port_add(struct rte_eth_dev *dev,
+struct rte_eth_udp_tunnel *udp_tunnel);
+static int ixgbe_dev_udp_tunnel_port_del(struct rte_eth_dev *dev,
+struct rte_eth_udp_tunnel *udp_tunnel);

 /*
  * Define VF Stats MACRO for Non "cleared on read" register
@@ -495,6 +499,10 @@ static const struct eth_dev_ops ixgbe_eth_dev_ops = {
.timesync_adjust_time = ixgbe_timesync_adjust_time,
.timesync_read_time   = ixgbe_timesync_read_time,
.timesync_write_time  = ixgbe_timesync_write_time,
+   .udp_tunnel_add   = ixgbe_dev_udp_tunnel_port_add,
+   .udp_tunnel_del   = ixgbe_dev_udp_tunnel_port_del,
+   .udp_tunnel_port_add  = ixgbe_dev_udp_tunnel_port_add,
+   .udp_tunnel_port_del  = ixgbe_dev_udp_tunnel_port_del,
 };

 /*
@@ -6191,6 +6199,93 @@ ixgbe_dev_get_dcb_info(struct rte_eth_dev *dev,
return 0;
 }

+#define DEFAULT_VXLAN_PORT 4789
+
+/* on x550, there's only one register for VxLAN UDP port.
+ * So, we cannot add or del the port. We only update it.
+ */
+static int
+ixgbe_update_vxlan_port(struct ixgbe_hw *hw,
+   uint16_t port)
+{
+   IXGBE_WRITE_REG(hw, IXGBE_VXLANCTRL, port);
+   IXGBE_WRITE_FLUSH(hw);
+
+   return 0;
+}
+
+/* Add UDP tunneling port */
+static int
+ixgbe_dev_udp_tunnel_port_add(struct rte_eth_dev *dev,
+ struct rte_eth_udp_tunnel *udp_tunnel)
+{
+   int ret = 0;
+   struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+
+   if (hw->mac.type != ixgbe_mac_X550 &&
+   hw->mac.type != ixgbe_mac_X550EM_x) {
+   return -ENOTSUP;
+   }
+
+   if (udp_tunnel == NULL)
+   return -EINVAL;
+
+   switch (udp_tunnel->prot_type) {
+   case RTE_TUNNEL_TYPE_VXLAN:
+   /* cannot add a port, update the port value */
+   ret = ixgbe_update_vxlan_port(hw, udp_tunnel->udp_port);
+   break;
+
+   case RTE_TUNNEL_TYPE_GENEVE:
+   case RTE_TUNNEL_TYPE_TEREDO:
+   PMD_DRV_LOG(ERR, "Tunnel type is not supported now.");
+   ret = -1;
+   break;
+
+   default:
+   PMD_DRV_LOG(ERR, "Invalid tunnel type");
+   ret = -1;
+   break;
+   }
+
+   return ret;
+}
+
+/* Remove UDP tunneling port */
+static int
+ixgbe_dev_udp_tunnel_port_del(struct rte_eth_dev *dev,
+ struct rte_eth_udp_tunnel *udp_tunnel)
+{
+   int ret = 0;
+   struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+
+   if (hw->mac.type != ixgbe_mac_X550 &&
+   hw->mac.type != ixgbe_mac_X550EM_x) {
+   return -ENOTSUP;
+   }
+
+   if (udp_tunnel == NULL)
+   return -EINVAL;
+
+   switch (udp_tunnel->prot_type) {
+   case RTE_TUNNEL_TYPE_VXLAN:
+   /* cannot del the port, reset it to default */
+   ret = ixgbe_update_vxlan_port(hw, DEFAULT_VXLAN_PORT);
+   break;
+   case RTE_TUNNEL_TYPE_GENEVE:
+   case RTE_TUNNEL_TYPE_TEREDO:
+   PMD_DRV_LOG(ERR, "Tunnel type is not supported now.");
+   ret = -1;
+   break;
+   default:
+   PMD_DRV_LOG(ERR, "Invalid tunnel type");
+   ret = -1;
+   break;
+   }
+
+   return ret;
+}
+
 static struct rte_driver rte_ixgbe_driver = {
.type = PMD_PDEV,
.init = rte_ixgbe_pmd_init,
-- 
1.9.3



[dpdk-dev] [PATCH v2 5/6] ixgbe: support VxLAN & NVGRE TX checksum off-load

2016-01-14 Thread Wenzhuo Lu
This patch adds VxLAN & NVGRE TX checksum off-load. When the outer
IP header checksum off-load flag is set, the context descriptor is
set up to enable this checksum off-load.

Signed-off-by: Wenzhuo Lu 
---
 drivers/net/ixgbe/ixgbe_rxtx.c | 52 ++
 drivers/net/ixgbe/ixgbe_rxtx.h |  6 -
 lib/librte_mbuf/rte_mbuf.h |  2 ++
 3 files changed, 49 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index 512ac3a..fea2495 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -85,7 +85,8 @@
PKT_TX_VLAN_PKT |\
PKT_TX_IP_CKSUM |\
PKT_TX_L4_MASK | \
-   PKT_TX_TCP_SEG)
+   PKT_TX_TCP_SEG | \
+   PKT_TX_OUTER_IP_CKSUM)

 static inline struct rte_mbuf *
 rte_rxmbuf_alloc(struct rte_mempool *mp)
@@ -364,9 +365,11 @@ ixgbe_set_xmit_ctx(struct ixgbe_tx_queue *txq,
uint32_t ctx_idx;
uint32_t vlan_macip_lens;
union ixgbe_tx_offload tx_offload_mask;
+   uint32_t seqnum_seed = 0;

ctx_idx = txq->ctx_curr;
-   tx_offload_mask.data = 0;
+   tx_offload_mask.data[0] = 0;
+   tx_offload_mask.data[1] = 0;
type_tucmd_mlhl = 0;

/* Specify which HW CTX to upload. */
@@ -430,9 +433,20 @@ ixgbe_set_xmit_ctx(struct ixgbe_tx_queue *txq,
}
}

+   if (ol_flags & PKT_TX_OUTER_IP_CKSUM) {
+   tx_offload_mask.outer_l3_len |= ~0;
+   tx_offload_mask.outer_l2_len |= ~0;
+   seqnum_seed |= tx_offload.outer_l3_len
+  << IXGBE_ADVTXD_OUTER_IPLEN;
+   seqnum_seed |= tx_offload.outer_l2_len
+  << IXGBE_ADVTXD_TUNNEL_LEN;
+   }
+
txq->ctx_cache[ctx_idx].flags = ol_flags;
-   txq->ctx_cache[ctx_idx].tx_offload.data  =
-   tx_offload_mask.data & tx_offload.data;
+   txq->ctx_cache[ctx_idx].tx_offload.data[0]  =
+   tx_offload_mask.data[0] & tx_offload.data[0];
+   txq->ctx_cache[ctx_idx].tx_offload.data[1]  =
+   tx_offload_mask.data[1] & tx_offload.data[1];
txq->ctx_cache[ctx_idx].tx_offload_mask= tx_offload_mask;

ctx_txd->type_tucmd_mlhl = rte_cpu_to_le_32(type_tucmd_mlhl);
@@ -441,7 +455,7 @@ ixgbe_set_xmit_ctx(struct ixgbe_tx_queue *txq,
vlan_macip_lens |= ((uint32_t)tx_offload.vlan_tci << 
IXGBE_ADVTXD_VLAN_SHIFT);
ctx_txd->vlan_macip_lens = rte_cpu_to_le_32(vlan_macip_lens);
ctx_txd->mss_l4len_idx   = rte_cpu_to_le_32(mss_l4len_idx);
-   ctx_txd->seqnum_seed = 0;
+   ctx_txd->seqnum_seed = seqnum_seed;
 }

 /*
@@ -454,16 +468,24 @@ what_advctx_update(struct ixgbe_tx_queue *txq, uint64_t 
flags,
 {
/* If match with the current used context */
if (likely((txq->ctx_cache[txq->ctx_curr].flags == flags) &&
-   (txq->ctx_cache[txq->ctx_curr].tx_offload.data ==
-   (txq->ctx_cache[txq->ctx_curr].tx_offload_mask.data & 
tx_offload.data)))) {
+   (txq->ctx_cache[txq->ctx_curr].tx_offload.data[0] ==
+   (txq->ctx_cache[txq->ctx_curr].tx_offload_mask.data[0]
+& tx_offload.data[0])) &&
+   (txq->ctx_cache[txq->ctx_curr].tx_offload.data[1] ==
+   (txq->ctx_cache[txq->ctx_curr].tx_offload_mask.data[1]
+& tx_offload.data[1])))) {
return txq->ctx_curr;
}

/* What if match with the next context  */
txq->ctx_curr ^= 1;
if (likely((txq->ctx_cache[txq->ctx_curr].flags == flags) &&
-   (txq->ctx_cache[txq->ctx_curr].tx_offload.data ==
-   (txq->ctx_cache[txq->ctx_curr].tx_offload_mask.data & 
tx_offload.data)))) {
+   (txq->ctx_cache[txq->ctx_curr].tx_offload.data[0] ==
+   (txq->ctx_cache[txq->ctx_curr].tx_offload_mask.data[0]
+& tx_offload.data[0])) &&
+   (txq->ctx_cache[txq->ctx_curr].tx_offload.data[1] ==
+   (txq->ctx_cache[txq->ctx_curr].tx_offload_mask.data[1]
+& tx_offload.data[1])))) {
return txq->ctx_curr;
}

@@ -492,6 +514,12 @@ tx_desc_ol_flags_to_cmdtype(uint64_t ol_flags)
cmdtype |= IXGBE_ADVTXD_DCMD_VLE;
if (ol_flags & PKT_TX_TCP_SEG)
cmdtype |= IXGBE_ADVTXD_DCMD_TSE;
+   if (ol_flags & PKT_TX_OUTER_IP_CKSUM)
+   cmdtype |= (1 << IXGBE_ADVTXD_OUTERIPCS_SHIFT);
+   if (ol_flags & PKT_TX_VXLAN_PKT)
+   cmdtype &= ~(1 << IXGBE_ADVTXD_TUNNEL_TYPE_NVGRE);
+   else
+   cmdtype |= (1 << IXGBE_ADVTXD_TUNNEL_TYPE_NVGRE);
return cmdtype;
 }

@@ -588,8 +616,10 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
uint64_t tx_ol_req;
uint32_t ctx = 0;
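
Editor's sketch, not part of the patch: in ixgbe_set_xmit_ctx() the
outer header lengths are packed into seqnum_seed only when the outer IP
checksum flag is set; otherwise the field stays 0 as before. The
stand-alone function below models that packing. The demo_* names are
hypothetical, and the shift values (16 and 24) and the 8-bit field
widths are assumptions, since the definitions of
IXGBE_ADVTXD_OUTER_IPLEN and IXGBE_ADVTXD_TUNNEL_LEN are not shown in
this message.

```c
#include <stdint.h>

#define DEMO_OUTER_IPLEN_SHIFT 16 /* assumed position of the outer L3 length */
#define DEMO_TUNNEL_LEN_SHIFT  24 /* assumed position of the outer L2 length */

/*
 * Build the seqnum_seed word of a context descriptor.
 * Each length is masked to 8 bits so the two fields cannot overlap
 * (the field width is an assumption of this sketch).
 */
uint32_t demo_build_seqnum_seed(int outer_ip_cksum, uint16_t outer_l3_len,
				uint8_t outer_l2_len)
{
	uint32_t seqnum_seed = 0;

	if (outer_ip_cksum) {
		seqnum_seed |= ((uint32_t)outer_l3_len & 0xFFu)
			       << DEMO_OUTER_IPLEN_SHIFT;
		seqnum_seed |= ((uint32_t)outer_l2_len & 0xFFu)
			       << DEMO_TUNNEL_LEN_SHIFT;
	}

	return seqnum_seed;
}
```
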

[dpdk-dev] [PATCH v2 6/6] doc: update release note for VxLAN & NVGRE checksum off-load support

2016-01-14 Thread Wenzhuo Lu
Signed-off-by: Wenzhuo Lu 
---
 doc/guides/rel_notes/release_2_3.rst | 8 
 1 file changed, 8 insertions(+)

diff --git a/doc/guides/rel_notes/release_2_3.rst 
b/doc/guides/rel_notes/release_2_3.rst
index 99de186..5dce7fb 100644
--- a/doc/guides/rel_notes/release_2_3.rst
+++ b/doc/guides/rel_notes/release_2_3.rst
@@ -4,6 +4,14 @@ DPDK Release 2.3
 New Features
 

+* **Added support for VxLAN & NVGRE checksum off-load on X550.**
+
+  * Added support for VxLAN & NVGRE RX/TX checksum off-load on
+X550. RX/TX checksum off-load is provided on both inner and
+outer IP header and TCP header.
+  * Added functions to support VxLAN port configuration. The
+default VxLAN port number is 4789 but this can be updated
+programmatically.

 Resolved Issues
 ---
-- 
1.9.3



[dpdk-dev] Getting error while running DPDK test app on X-Gene1

2016-01-14 Thread Qiu, Michael
Could you show what exists in

/sys/bus/pci/devices/:01:00.0/


Thanks, Michael


On 1/13/2016 6:23 PM, Ankit Jindal wrote:
> Hi,
>
> We are trying to run dpdk on our arm64 based SOC having Intel 10G
> ixgbe PCIe card plugged. While running any test app, we are getting
> following error.
>
> EAL: PCI device :01:00.0 on NUMA socket 0
> EAL:   probe driver: 8086:10fb rte_ixgbe_pmd
> EAL: Cannot open /sys/bus/pci/devices/:01:00.0/resource0: No such
> file or directory
> EAL: Error - exiting with code: 1
>   Cause: Requested device :01:00.0 cannot be used
>
> Below are the details on modules, hugepages and device binding.
> root@arm64:~# lsmod
> Module  Size  Used by
> rte_kni   292795  0
> igb_uio 4338  0
> ixgbe 184456  0
>
> root@arm64:~/dpdk# cat 
> /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
> 2048
>
> root@arm64:~/dpdk# ./tools/dpdk_nic_bind.py --status
>
> Network devices using DPDK-compatible driver
> 
> :01:00.0 '82599ES 10-Gigabit SFI/SFP+ Network Connection'
> drv=igb_uio unused=
> :01:00.1 '82599ES 10-Gigabit SFI/SFP+ Network Connection'
> drv=igb_uio unused=
>
> Network devices using kernel driver
> ===
> 
>
> Other network devices
> =
> 
> root@arm64:~/dpdk#
>
> Thanks,
> Ankit
>



[dpdk-dev] Getting error while running DPDK test app on X-Gene1

2016-01-14 Thread Jerin Jacob
On Wed, Jan 13, 2016 at 03:52:01PM +0530, Ankit Jindal wrote:
> Hi,
> 
> We are trying to run dpdk on our arm64 based SOC having Intel 10G
> ixgbe PCIe card plugged. While running any test app, we are getting
> following error.
> 
> EAL: PCI device :01:00.0 on NUMA socket 0
> EAL:   probe driver: 8086:10fb rte_ixgbe_pmd
> EAL: Cannot open /sys/bus/pci/devices/:01:00.0/resource0: No such
> file or directory
> EAL: Error - exiting with code: 1
>   Cause: Requested device :01:00.0 cannot be used


The PCI resource creation patch is not yet part of the mainline arm64 kernel.
The following patch should fix the problem.

http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/358906.html

Jerin

> 
> Below are the details on modules, hugepages and device binding.
> root@arm64:~# lsmod
> Module  Size  Used by
> rte_kni   292795  0
> igb_uio 4338  0
> ixgbe 184456  0
> 
> root@arm64:~/dpdk# cat 
> /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
> 2048
> 
> root@arm64:~/dpdk# ./tools/dpdk_nic_bind.py --status
> 
> Network devices using DPDK-compatible driver
> 
> :01:00.0 '82599ES 10-Gigabit SFI/SFP+ Network Connection'
> drv=igb_uio unused=
> :01:00.1 '82599ES 10-Gigabit SFI/SFP+ Network Connection'
> drv=igb_uio unused=
> 
> Network devices using kernel driver
> ===
> 
> 
> Other network devices
> =
> 
> root@arm64:~/dpdk#
> 
> Thanks,
> Ankit


[dpdk-dev] [PATCH v2 0/7] virtio 1.0 enabling for virtio pmd driver

2016-01-14 Thread Tetsuya Mukawa
On 2016/01/12 15:58, Yuanhan Liu wrote:
> v2: - fix a data corruption reported by Qian, due to hdr size mismatch.
>   check details in patch 5.
>
> - Add missing config_irq and isr reading support from v1.
>
> - fix comments from v1.
>
> Almost all of the differences in virtio 1.0 come from the PCI layout
> change: the major configuration structures are stored in BAR space, and
> their location is given by the corresponding PCI capability structure.
> Reading/parsing them is one of the major tasks of patch 7.
>
> To make handling virtio v1.0 and v0.95 co-exist well, this patch set
> introduces a virtio_pci_ops structure, to add another layer so that
> we could keep those vtpci_foo_bar "APIs". With that, we could do the
> minimum change to add virtio 1.0 support.
>
>
> ---
> Yuanhan Liu (7):
>   virtio: don't set vring address again at queue startup
>   virtio: introduce struct virtio_pci_ops
>   virtio: move left pci stuff to virtio_pci.c
>   viritio: switch to 64 bit features
>   virtio: retrieve hdr_size from hw->vtnet_hdr_size
>   eal: pci: export pci_map_device
>   virtio: add 1.0 support
>
>  doc/guides/rel_notes/release_2_3.rst|   3 +
>  drivers/net/virtio/virtio_ethdev.c  | 301 +-
>  drivers/net/virtio/virtio_ethdev.h  |   3 +-
>  drivers/net/virtio/virtio_pci.c | 768 
> +++-
>  drivers/net/virtio/virtio_pci.h | 102 +++-
>  drivers/net/virtio/virtio_rxtx.c|  21 +-
>  drivers/net/virtio/virtqueue.h  |   4 +-
>  lib/librte_eal/bsdapp/eal/eal_pci.c |   2 +-
>  lib/librte_eal/bsdapp/eal/rte_eal_version.map   |   6 +
>  lib/librte_eal/common/eal_common_pci.c  |   2 +-
>  lib/librte_eal/common/eal_private.h |  11 -
>  lib/librte_eal/common/include/rte_pci.h |  11 +
>  lib/librte_eal/linuxapp/eal/eal_pci.c   |   2 +-
>  lib/librte_eal/linuxapp/eal/rte_eal_version.map |   6 +
>  14 files changed, 899 insertions(+), 343 deletions(-)
>

Hi Yuanhan and Jianfeng,

Thanks for the great patches.
I want to use the VIRTIO-1.0 feature for my virtio container patch, because
it will solve the 44 bit memory address limitation.
(So far, a legacy virtio-net device only accepts queue addresses below (1
<< (32 + 12)).)

I have a few comments about rebasing the virtio container patches on these patches.

1. VIRTIO_READ_REG_X

So far, VIRTIO_READ_REG_1/2/4 are defined in virtio_pci.h.
But these macros are only referenced by virtio_pci.c.
How about moving the macros to virtio_pci.c?

2. Abstraction of read/write accesses.

It may be difficult to cleanly rebase my patches on these patches,
because virtio_read_caps() is not abstracted.
Let me describe this in more detail.
So far, we need to handle the 3 virtio-net devices below.
 - physical virtio-net device.
 - virtual virtio-net device in virtio-net PMD. (Jianfeng's patch)
 - virtual virtio-net device in QEMU. (my patch)

Almost all code of the virtio-net PMD can be shared between the
cases above.
Probably the big difference is how to access the configuration space.

Yuanhan's patch introduces an abstraction layer to hide the configuration
space layout and how to access it.
Is it possible to separate the two?
I guess it would be nice to abstract the "access method" separately from
the "configuration space layout".
Probably the access method will be defined by "eth_dev->dev_type" and the
PMD name like "eth_cvio".
And the "configuration space layout" will be defined by the capability
list of the PCI configuration layout.

For example, if access methods like the ones below are abstracted
separately and the current "virtio_pci.c" is implemented on top of this
abstraction, we can easily re-use virtio_read_caps().
 - how to read/write virtio configuration space.
 - how to mmap PCI configuration space.
 - how to read/(write) PCI configuration space.

Thanks,
Tetsuya


[dpdk-dev] [PATCH v2 0/7] virtio 1.0 enabling for virtio pmd driver

2016-01-14 Thread Yuanhan Liu
On Thu, Jan 14, 2016 at 01:27:37PM +0900, Tetsuya Mukawa wrote:
> On 2016/01/12 15:58, Yuanhan Liu wrote:
> > v2: - fix a data corruption reported by Qian, due to hdr size mismatch.
> >   check details in patch 5.
> >
> > - Add missing config_irq and isr reading support from v1.
> >
> > - fix comments from v1.
> >
> > Almost all of the differences in virtio 1.0 come from the PCI layout
> > change: the major configuration structures are stored in BAR space, and
> > their location is given by the corresponding PCI capability structure.
> > Reading/parsing them is one of the major tasks of patch 7.
> >
> > To make handling virtio v1.0 and v0.95 co-exist well, this patch set
> > introduces a virtio_pci_ops structure, to add another layer so that
> > we could keep those vtpci_foo_bar "APIs". With that, we could do the
> > minimum change to add virtio 1.0 support.
> >
> >
> 
> Hi Yuanhan and Jianfeng,
> 
> Thanks for the great patches.
> I want to use the VIRTIO-1.0 feature for my virtio container patch, because
> it will solve the 44 bit memory address limitation.
> (So far, a legacy virtio-net device only accepts queue addresses below (1
> << (32 + 12)).)
> 
> I have a few comments about rebasing the virtio container patches on these patches.
> 
> 1. VIRTIO_READ_REG_X
> 
> So far, VIRTIO_READ_REG_1/2/4 are defined in virtio_pci.h.
> But these macros are only referenced by virtio_pci.c.
> How about moving the macros to virtio_pci.c?

Jianfeng had the same suggestion. I could do that in the next version then.

> 2. Abstraction of read/write accesses.
> 
> It may be difficult to cleanly rebase my patches on these patches,
> because virtio_read_caps() is not abstracted.

I don't think we can (or need to) abstract virtio_read_caps() here, as it
detects whether the device is a legacy or a modern (virtio 1.0) virtio
device.

If virtio_read_caps() fails, which could be due either to a failed PCI
mapping or to a malformed PCI layout, we fall back to legacy (virtio 0.95)
handling, using io port read/write to do configuration.

> Let me describe this in more detail.
> So far, we need to handle the 3 virtio-net devices below.
>  - physical virtio-net device.
>  - virtual virtio-net device in virtio-net PMD. (Jianfeng's patch)
>  - virtual virtio-net device in QEMU. (my patch)
> 
> Almost all code of the virtio-net PMD can be shared between the
> cases above.
> Probably the big difference is how to access the configuration space.
> 
> Yuanhan's patch introduces an abstraction layer to hide configuration
> space layout and how to access it.

Actually, I didn't introduce an abstraction for pci device access. It's
just a simple "if ... else ..." case here: for legacy virtio, use io port
read/write (the VIRTIO_READ/WRITE_REG_X macros); for modern virtio, use
directly mapped memory read/write access (modern_read/writex).

> Is it possible to separate?

As stated, there is no mix, and therefore no need for separation. But
you could add another access abstraction layer, and assign it properly
later, say, by checking eth_dev->dev_type as you suggested below.

> I guess "access method" will be nice to be abstracted separately from
> "configuration space layout".
> Probably access method will be defined by "eth_dev->dev_type" and the
> PMD name like "eth_cvio".
> And "configuration space layout" will be defined by capability list of
> PCI configuration layout.
> 
> For example, if access method like below are abstracted separately and
> current "virtio_pci.c" is implemented on this abstraction, we can easily
> re-use virtio_read_caps().
>  - how to read/write virtio configuration space.

It's abstracted by virtio_pci_ops.

>  - how to mmap PCI configuration space.
>  - how to read/(write) PCI configuration space.

For now, it's actually done by EAL, or by functions provided by EAL.
I haven't read your (or Jianfeng's) code yet, but it seems that
you need to implement another set of functions for the above needs for
your virtio device.

If so, I'd suggest that you (or Jianfeng) do the abstraction based on
my patchset: what I am fairly sure of is that I should not add that
abstraction here, simply because it has nothing to do with virtio 1.0
enabling.

--yliu
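
Editor's sketch, not part of the thread: the separation Tetsuya asks for
and Yuanhan suggests layering on top of his patch set can be pictured as
two pieces. One is an "access method" (how bytes reach the device) and
the other is "layout" code (which offset means what). The model below
shows layout code running unchanged over two different access backends.
Every demo_* name is hypothetical, the status offset is illustrative,
and nothing here is taken from virtio_pci.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical access-method interface: only says how bytes are moved. */
struct demo_cfg_access {
	uint8_t (*read8)(void *ctx, size_t off);
	void (*write8)(void *ctx, size_t off, uint8_t val);
	void *ctx;
};

/* Backend A: directly mapped memory, e.g. a region mapped by EAL. */
uint8_t demo_mem_read8(void *ctx, size_t off)
{
	return ((volatile uint8_t *)ctx)[off];
}

void demo_mem_write8(void *ctx, size_t off, uint8_t val)
{
	((volatile uint8_t *)ctx)[off] = val;
}

/* Backend B: a virtual device; every access is counted as one request. */
struct demo_virt_dev {
	uint8_t regs[32];
	unsigned int requests;
};

uint8_t demo_virt_read8(void *ctx, size_t off)
{
	struct demo_virt_dev *vd = ctx;

	assert(off < sizeof(vd->regs));
	vd->requests++;
	return vd->regs[off];
}

void demo_virt_write8(void *ctx, size_t off, uint8_t val)
{
	struct demo_virt_dev *vd = ctx;

	assert(off < sizeof(vd->regs));
	vd->requests++;
	vd->regs[off] = val;
}

/* Layout knowledge: where the status byte lives (illustrative offset). */
#define DEMO_STATUS_OFF 18

uint8_t demo_get_status(const struct demo_cfg_access *a)
{
	return a->read8(a->ctx, DEMO_STATUS_OFF);
}

void demo_set_status(const struct demo_cfg_access *a, uint8_t status)
{
	a->write8(a->ctx, DEMO_STATUS_OFF, status);
}
```

The backend could then be chosen once at probe time, for example from
eth_dev->dev_type as suggested above, while demo_get_status() and
demo_set_status() stay the same for all three kinds of device.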


[dpdk-dev] VFIO no-iommu

2016-01-14 Thread Jike Song
On Wed, Dec 16, 2015 at 12:38 PM, Alex Williamson
 wrote:
>
> So it works.  Is it acceptable?  Useful?  Sufficiently complete?  Does
> it imply deprecating the uio interface?  I believe the feature that
> started this discussion was support for MSI/X interrupts so that VFs
> can support some kind of interrupt (uio only supports INTx since it
> doesn't allow DMA).  Implementing that would be the ultimate test of
> whether this provides dpdk with not only a more consistent interface,
> but the feature dpdk wants that's missing in uio. Thanks,
>
Hi Alex,

Sorry for jumping in.  Just curious: how does VFIO No-IOMMU mode
support DMA from userspace drivers?  If I understand correctly, due to
the absence of an IOMMU, the pcidev has to use a physaddr to start a DMA
transaction, but how is it supposed to get the physaddr from userspace
drivers: /proc//pagemap or something else?


-- 
Thanks,
Jike


[dpdk-dev] [PATCH v2 0/7] virtio 1.0 enabling for virtio pmd driver

2016-01-14 Thread Tan, Jianfeng

Hi Tetsuya,

On 1/14/2016 12:27 PM, Tetsuya Mukawa wrote:
> On 2016/01/12 15:58, Yuanhan Liu wrote:
> Hi Yuanhan and Jianfeng,
>
> Thanks for great patches.
> I want to use VIRTIO-1.0 feature for my virtio container patch, because
> it will solve 44 bit memory address limitation.
> (So far, legacy virtio-net device only receives queue address under (1
> << (32 + 12)).)

I suppose you are referring to the code below:
    /*
     * Virtio PCI device VIRTIO_PCI_QUEUE_PF register is 32bit,
     * and only accepts 32 bit page frame number.
     * Check if the allocated physical memory exceeds 16TB.
     */
    if ((mz->phys_addr + vq->vq_ring_size - 1) >>
            (VIRTIO_PCI_QUEUE_ADDR_SHIFT + 32)) {
        PMD_INIT_LOG(ERR, "vring address shouldn't be above 16TB!");
        rte_free(vq);
        return -ENOMEM;
    }

So you don't need to add an extra command-line option, right?

>
> I have a few comments to rebase virtio container patches on this patches.
>
> 1. VIRTIO_READ_REG_X
>
> So far, VIRTIO_READ_REG_1/2/4 are defined in virtio_pci.h.
> But these macros are only referred by virtio_pci.c.
> How about moving the macros to virtio_pci.c?

+1 for this.

>
> 2. Abstraction of read/write accesses.
>
> It may be difficult to cleanly rebase my patches on this patches,
> because virtio_read_caps() is not abstracted.
> Let me describe it more.
> So far, we need to handle below 3 virtio-net devices..
>   - physical virtio-net device.
>   - virtual virtio-net device in virtio-net PMD. (Jianfeng's patch)
>   - virtual virtio-net device in QEMU. (my patch)
>
> Almost all code of the virtio-net PMD can be shared between above
> different cases.
> Probably big difference is how to access to configuration space.
>
> Yuanhan's patch introduces an abstraction layer to hide configuration
> space layout and how to access it.
> Is it possible to separate?
> I guess "access method" will be nice to be abstracted separately from
> "configuration space layout".
> Probably access method will be defined by "eth_dev->dev_type" and the
> PMD name like "eth_cvio".
> And "configuration space layout" will be defined by capability list of
> PCI configuration layout.
>
> For example, if access method like below are abstracted separately and
> current "virtio_pci.c" is implemented on this abstraction, we can easily
> re-use virtio_read_caps().
>   - how to read/write virtio configuration space.
>   - how to mmap PCI configuration space.
>   - how to read/(write) PCI configuration space.


I basically agree with you. We have two dimensions here:

                                   legacy / modern
physical virtio device:            Use virtio_read_caps_phys() to distinguish
virtual virtio device (Tetsuya):   Use virtio_read_caps_virt() to distinguish
virtual virtio device (Jianfeng):  does not need a "configuration space
                                   layout", no need to distinguish

So in vtpci_init(), we need to test "eth_dev->dev_type" first:

vtpci_init() {
    if (eth_dev->dev_type == RTE_ETH_DEV_PCI) {
        if (virtio_read_caps_phys()) {
            // modern
        } else {
            // legacy
        }
    } else {
        if (Tetsuya's way) {
            if (virtio_read_caps_virt()) {
                // modern
            } else {
                // legacy
            }
        } else {
            // Jianfeng's way
        }
    }
}

And from Yuanhan's angle, I think he does not need to address this
problem. What do you think?

Thanks,
Jianfeng


>
> Thanks,
> Tetsuya



[dpdk-dev] [PATCH v2 0/7] virtio 1.0 enabling for virtio pmd driver

2016-01-14 Thread Tetsuya Mukawa
On 2016/01/14 15:09, Tan, Jianfeng wrote:
>
> Hi Tetsuya,
>
> On 1/14/2016 12:27 PM, Tetsuya Mukawa wrote:
>> On 2016/01/12 15:58, Yuanhan Liu wrote:
>> Hi Yuanhan and Jianfeng,
>>
>> Thanks for great patches.
>> I want to use VIRTIO-1.0 feature for my virtio container patch, because
>> it will solve 44 bit memory address limitation.
>> (So far, legacy virtio-net device only receives queue address under (1
>> << (32 + 12)).)
>
> I suppose you are specifying the code below:
> /*
>  * Virtio PCI device VIRTIO_PCI_QUEUE_PF register is 32bit,
>  * and only accepts 32 bit page frame number.
>  * Check if the allocated physical memory exceeds 16TB.
>  */
> if ((mz->phys_addr + vq->vq_ring_size - 1) >>
> (VIRTIO_PCI_QUEUE_ADDR_SHIFT + 32)) {
> PMD_INIT_LOG(ERR, "vring address shouldn't be above
> 16TB!");
> rte_free(vq);
> return -ENOMEM;
> }
>
> So you don't need to add extra cmd option, right?

Yes, this is the code.
In our case, a virtual address will be used instead of a physical
address, right?
The problem is that the virtual address will be above (1 << 44) unless
some mmap options are specified.
We would probably need to pass the "MAP_FIXED" option when mmapping EAL
memory to get a lower address.

But if we can use VIRTIO-1.0, we can specify a 64-bit address, so we
don't need any tricky mmapping.
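
To make the 44-bit limit concrete, here is a minimal sketch of the same check
as the quoted driver code, written as a standalone function. The macro and
function names are hypothetical; only the shift of 12 and the 32-bit page
frame number come from the quoted code.

```c
#include <assert.h>
#include <stdint.h>

#define QUEUE_ADDR_SHIFT 12 /* mirrors VIRTIO_PCI_QUEUE_ADDR_SHIFT (4 KiB pages) */
#define QUEUE_PFN_BITS   32 /* the legacy queue address register is 32 bits wide */

/* Hypothetical helper: returns 1 if the last byte of the ring
 * [addr, addr + size) is still addressable through a 32-bit page
 * frame number, i.e. lies below 1 << (32 + 12), and 0 otherwise. */
static int
legacy_ring_addr_ok(uint64_t addr, uint64_t size)
{
	assert(size != 0);
	return ((addr + size - 1) >> (QUEUE_ADDR_SHIFT + QUEUE_PFN_BITS)) == 0;
}

/* Hypothetical helper: the value a legacy driver would write to the
 * queue address register; call only after legacy_ring_addr_ok(). */
static uint32_t
legacy_ring_pfn(uint64_t addr)
{
	return (uint32_t)(addr >> QUEUE_ADDR_SHIFT);
}
```

For example, a ring at a process virtual address such as 0x7f0000000000
(an assumed, typical-looking mmap address) fails the check, which is why the
legacy layout would force a low mapping when virtual addresses are used.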

>
>>
>> I have a few comments to rebase virtio container patches on this
>> patches.
>>
>> 1. VIRTIO_READ_REG_X
>>
>> So far, VIRTIO_READ_REG_1/2/4 are defined in virtio_pci.h.
>> But these macros are only referred by virtio_pci.c.
>> How about moving the macros to virtio_pci.c?
>
> +1 for this.
>
>>
>> 2. Abstraction of read/write accesses.
>>
>> It may be difficult to cleanly rebase my patches on this patches,
>> because virtio_read_caps() is not abstracted.
>> Let me describe it more.
>> So far, we need to handle below 3 virtio-net devices..
>>   - physical virtio-net device.
>>   - virtual virtio-net device in virtio-net PMD. (Jianfeng's patch)
>>   - virtual virtio-net device in QEMU. (my patch)
>>
>> Almost all code of the virtio-net PMD can be shared between above
>> different cases.
>> Probably big difference is how to access to configuration space.
>>
>> Yuanhan's patch introduces an abstraction layer to hide configuration
>> space layout and how to access it.
>> Is it possible to separate?
>> I guess "access method" will be nice to be abstracted separately from
>> "configuration space layout".
>> Probably access method will be defined by "eth_dev->dev_type" and the
>> PMD name like "eth_cvio".
>> And "configuration space layout" will be defined by capability list of
>> PCI configuration layout.
>>
>> For example, if access method like below are abstracted separately and
>> current "virtio_pci.c" is implemented on this abstraction, we can easily
>> re-use virtio_read_caps().
>>   - how to read/write virtio configuration space.
>>   - how to mmap PCI configuration space.
>>   - how to read/(write) PCI configuration space.
>
>
> I basically agree with you. We have two dimensions here:
>
> legacy modern
> physical virtio device: Use
> virtio_read_caps_phys() to distinguish
> virtual virtio device (Tetsuya):   Use virtio_read_caps_virt() to
> distinguish
> virtual virtio device (Jianfeng):does not need a "configuration
> space layout", no need to distinguish
>
> So in vtpci_init(), we needs to test "eth_dev->dev_type" firstly
>
> vtpci_init() {
> if (eth_dev->dev_type == RTE_ETH_DEV_PCI) {
> if (virtio_read_caps_phys()) {
> // modern
> } else {
> // legacy
> }
> } else {
> if (Tetsuya's way) {
> if (virtio_read_caps_virt()) {
> // modern
> } else {
> // legacy
> }
> } else {
> // Jianfeng's way
> }
> }
> }
>
> And from Yuanhan's angle, I think he does not need to address this
> problem. How do you think?

Yes, I agree he doesn't need to.

At first I implemented it like the above, then I noticed that
virtio_read_caps_phys() and virtio_read_caps_virt() are the same except
for the access method.
Anyway, I guess abstracting the access method is not so difficult.
If you are OK with it, I want to send an RFC on top of Yuanhan's patch.
Is that OK?

Thanks,
Tetsuya
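
The idea of sharing one capability parser across access methods can be
sketched as follows. This is only an illustration under stated assumptions:
the struct and function names are hypothetical, and the in-memory backend
stands in for whatever a virtual device would provide. The offsets used
(capability pointer at 0x34, vendor-specific capability ID 0x09) are the
standard PCI ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PCI_CAPABILITY_LIST 0x34 /* offset of the first capability pointer */
#define PCI_CAP_ID_VNDR     0x09 /* vendor-specific capability ID */

/* Hypothetical access-method abstraction: how to read config space. */
struct cfg_access_ops {
	int (*read)(void *ctx, void *buf, size_t len, size_t off);
};

/* Shared walker: counts vendor-specific capabilities, whatever the
 * access method is. Returns -1 if a read fails. */
static int
count_vendor_caps(const struct cfg_access_ops *ops, void *ctx)
{
	uint8_t pos, hdr[2];
	int found = 0;
	int guard = 48; /* bound the walk in case the list loops */

	assert(ops != NULL && ops->read != NULL);
	if (ops->read(ctx, &pos, 1, PCI_CAPABILITY_LIST) < 0)
		return -1;
	while (pos != 0 && guard-- > 0) {
		/* hdr[0] = capability ID, hdr[1] = offset of next capability */
		if (ops->read(ctx, hdr, sizeof(hdr), pos) < 0)
			return -1;
		if (hdr[0] == PCI_CAP_ID_VNDR)
			found++;
		pos = hdr[1];
	}
	return found;
}

/* One possible backend: a 256-byte in-memory config space image. */
static int
mem_cfg_read(void *ctx, void *buf, size_t len, size_t off)
{
	if (off + len > 256)
		return -1;
	memcpy(buf, (const uint8_t *)ctx + off, len);
	return 0;
}

static const struct cfg_access_ops mem_cfg_ops = { .read = mem_cfg_read };
```

A physical device would supply a second ops instance backed by the EAL's PCI
config read, and the walker itself would stay unchanged.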


[dpdk-dev] [PATCH v2 0/7] virtio 1.0 enabling for virtio pmd driver

2016-01-14 Thread Yuanhan Liu
On Thu, Jan 14, 2016 at 02:09:18PM +0800, Tan, Jianfeng wrote:
...
> I basically agree with you. We have two dimensions here:
> 
> legacy modern
> physical virtio device: Use virtio_read_caps_phys() to
> distinguish
> virtual virtio device (Tetsuya):   Use virtio_read_caps_virt() to
> distinguish
> virtual virtio device (Jianfeng):does not need a "configuration space
> layout", no need to distinguish

I guess you meant to build a table or something, but it seems you failed :)

> 
> So in vtpci_init(), we needs to test "eth_dev->dev_type" firstly
> 
> vtpci_init() {
> if (eth_dev->dev_type == RTE_ETH_DEV_PCI) {
> if (virtio_read_caps_phys()) {
> // modern
> } else {
> // legacy
> }
> } else {
> if (Tetsuya's way) {
> if (virtio_read_caps_virt()) {
> // modern
> } else {
> // legacy
> }
> } else {
> // Jianfeng's way
> }
> }
> }

Normally, I'd like to hide the details inside virtio_read_caps(): I don't
want similar code to appear twice. And if it can be done simply
by "if (eth_dev->dev_type == ...)", I'd like to do it this way. If not,
introducing another set of operation abstractions, as suggested in my
other email, might be a better option.

> And from Yuanhan's angle, I think he does not need to address this problem.

Yep; it just has nothing to do with this patch set.

--yliu


[dpdk-dev] librte_power w/ intel_pstate cpufreq governor

2016-01-14 Thread Matthew Hall
On Tue, Jan 12, 2016 at 03:17:21PM +, Zhang, Helin wrote:
> Hi Matthew
> 
> Yes, you have indicated out the key, the power management module has changed 
> or upgraded.
> Could you help to try the legacy one to see if it still works, as indicated 
> in your link?

I can do this, but according to the documents I am reading, the old Power 
Management module is secretly stubbed out / no-opped inside of the Skylake CPU 
core, and the core manages its own clockrate internally every 1 msec instead 
of every 30 msec with input from the OS (Intel Speed Shift technology).

If this is true, then I suspect there is no point to getting it to work again 
with either the old frequency driver or the new driver, because the chip would 
not listen to it. So then it seems like it makes sense to skip the clock 
adjustment callbacks on Skylake and take extra stuff out of the fastpath code.

> Taking control of the governor from kernel to user space, might need one 
> more checks before that. But it is actually not a big issue, as user can 
> switch it back to anything via 'echo'.

I think it's a bit bigger issue, as it leaves the chip in full-power mode 
without really warning anybody, instead of the standard default adaptive mode. 

> Yes, it seems that librte_power is out of date for a while. It is not easy 
> to track all the kernel versions. Now we have good chance to do that, as you 
> have reported issues. Let's have a look on the new power management 
> mechanism and then see if we can do something.

Yes, let me know how I could help. I don't know very much yet. My machine is 
Skylake Core i7-6700k. Unfortunately I think I am in trouble here, because 
there is no whitepaper on the Intel website for Intel Speed Shift technology 
at all.

> Really thanks to your questions!

I am looking forward to getting some answers figured out together.

> Regards,
> Helin

Matthew.
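
As a starting point for the "one more check" discussed above, here is a
minimal sketch that reads the cpufreq driver name from a sysfs-style file
before any governor is touched. The function names are hypothetical, and the
path is passed in by the caller; on Linux it would typically be
/sys/devices/system/cpu/cpu0/cpufreq/scaling_driver, which is an assumption
about the target system rather than something librte_power does today.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read the first line of a small sysfs-style file
 * into buf and strip the trailing newline. Returns 0 on success and
 * -1 if the file cannot be opened or read. */
static int
read_sysfs_line(const char *path, char *buf, size_t len)
{
	FILE *f;

	assert(buf != NULL && len > 0);
	f = fopen(path, "r");
	if (f == NULL)
		return -1;
	if (fgets(buf, (int)len, f) == NULL) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}

/* Hypothetical helper: returns 1 if the driver named in the file is
 * intel_pstate, 0 for any other driver, -1 if the file is unreadable. */
static int
uses_intel_pstate(const char *scaling_driver_path)
{
	char name[64];

	if (read_sysfs_line(scaling_driver_path, name, sizeof(name)) < 0)
		return -1;
	return strcmp(name, "intel_pstate") == 0;
}
```

With such a check, the library could refuse to take over the governor, or at
least log a warning, when the driver is one it was not written for.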


[dpdk-dev] librte_power w/ intel_pstate cpufreq governor

2016-01-14 Thread Matthew Hall
On Thu, Jan 14, 2016 at 02:03:55AM -0500, Matthew Hall wrote:
> Yes, let me know how I could help. I don't know very much yet. My machine is 
> Skylake Core i7-6700k. Unfortunately I think I am in trouble here, because 
> there is no whitepaper on the Intel website for Intel Speed Shift technology 
> at all.

This is the closest thing I could find:

http://wccftech.com/idf15-intel-skylake-analysis-cpu-gpu-microarchitecture-ddr4-memory-impact/4/

Some copy of a presentation from Intel IDF15.

Can somebody at Intel help me to find more papers or the right instruction or 
architecture manuals for HWP (Hardware P-State) feature?

Matthew.


[dpdk-dev] librte_power w/ intel_pstate cpufreq governor

2016-01-14 Thread Zhang, Helin


> -Original Message-
> From: Matthew Hall [mailto:mhall at mhcomputing.net]
> Sent: Thursday, January 14, 2016 3:04 PM
> To: Zhang, Helin
> Cc: dev at dpdk.org; Liang, Cunming; Zhou, Danny
> Subject: Re: [dpdk-dev] librte_power w/ intel_pstate cpufreq governor
> 
> On Tue, Jan 12, 2016 at 03:17:21PM +, Zhang, Helin wrote:
> > Hi Matthew
> >
> > Yes, you have indicated out the key, the power management module has
> changed or upgraded.
> > Could you help to try the legacy one to see if it still works, as indicated 
> > in
> your link?
> 
> I can do this, but according to the documents I am reading, the old Power
> Management module is secretly stubbed out / no-opped inside of the
> Skylake CPU core, and the core manages its own clockrate internally every 1
> msec instead of every 30 msec with input from the OS (Intel Speed Shift
> technology).
> 
> If this is true, then I suspect there is no point to getting it to work again 
> with
> either the old frequency driver or the new driver, because the chip would
> not listen to it. So then it seems like it makes sense to skip the clock
> adjustment callbacks on Skylake and take extra stuff out of the fastpath code.
That's disappointing if Skylake is like that. Let's learn more about it first,
and then check whether we can fix that.
In addition, DPDK provides an interrupt-based packet receiving mechanism; could
that be an option for you?

For now, I am afraid that I don't have time for it, as we are all focusing on
the next release development.
If there is no objection, I will find time later (maybe in a month) to
investigate that.
Of course, please try to investigate it from your side as well.

> 
> > Taking control of the governor from kernel to user space, might need
> > one more checks before that. But it is actually not a big issue, as
> > user can switch it back to anything via 'echo'.
> 
> I think it's a bit bigger issue, as it leaves the chip in full-power mode 
> without
> really warning anybody, instead of the standard default adaptive mode.
That's always there; for example, DPDK can exit unexpectedly without
restoring anything.
Then you would hit a similar issue again.

> 
> > Yes, it seems that librte_power is out of date for a while. It is not
> > easy to track all the kernel versions. Now we have good chance to do
> > that, as you have reported issues. Let's have a look on the new power
> > management mechanism and then see if we can do something.
> 
> Yes, let me know how I could help. I don't know very much yet. My machine
> is Skylake Core i7-6700k. Unfortunately I think I am in trouble here, because
> there is no whitepaper on the Intel website for Intel Speed Shift technology
> at all.
It seems that you are so important to Intel. :) I don't have a Skylake at hand. :(
Anyway, I will try to find time for that, and hopefully will find something or
a solution.
Thank you very much for the great work!

Regards,
Helin

> 
> > Really thanks to your questions!
> 
> I am looking forward to getting some answers figured out together.
> 
> > Regards,
> > Helin
> 
> Matthew.


[dpdk-dev] [PATCH v3 1/8] virtio: don't set vring address again at queue startup

2016-01-14 Thread Yuanhan Liu
The vring address has already been set up in virtio_dev_queue_setup(),
and a vq restart will not reset that setting, so there is no need to
set it again at queue startup.

Signed-off-by: Yuanhan Liu 
---
 drivers/net/virtio/virtio_rxtx.c | 15 ---
 1 file changed, 15 deletions(-)

diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 74b39ef..b7267c0 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -339,11 +339,6 @@ virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
vq_update_avail_idx(vq);

PMD_INIT_LOG(DEBUG, "Allocated %d bufs", nbufs);
-
-   VIRTIO_WRITE_REG_2(vq->hw, VIRTIO_PCI_QUEUE_SEL,
-   vq->vq_queue_index);
-   VIRTIO_WRITE_REG_4(vq->hw, VIRTIO_PCI_QUEUE_PFN,
-   vq->mz->phys_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT);
} else if (queue_type == VTNET_TQ) {
if (use_simple_rxtx) {
int mid_idx  = vq->vq_nentries >> 1;
@@ -362,16 +357,6 @@ virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
for (i = mid_idx; i < vq->vq_nentries; i++)
vq->vq_ring.avail->ring[i] = i;
}
-
-   VIRTIO_WRITE_REG_2(vq->hw, VIRTIO_PCI_QUEUE_SEL,
-   vq->vq_queue_index);
-   VIRTIO_WRITE_REG_4(vq->hw, VIRTIO_PCI_QUEUE_PFN,
-   vq->mz->phys_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT);
-   } else {
-   VIRTIO_WRITE_REG_2(vq->hw, VIRTIO_PCI_QUEUE_SEL,
-   vq->vq_queue_index);
-   VIRTIO_WRITE_REG_4(vq->hw, VIRTIO_PCI_QUEUE_PFN,
-   vq->mz->phys_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT);
}
 }

-- 
1.9.0



[dpdk-dev] [PATCH v3 0/8] virtio 1.0 enabling for virtio pmd driver

2016-01-14 Thread Yuanhan Liu
v3: - export pci_unmap_device as well; and invoke it at virtio
  uninit stage.

- fixed same data corruption bug reported by Qian in simple
  rxtx code path.

- move VIRTIO_READ/WRITE_REG_X to virtio_pci.c

v2: - fix a data corruption reported by Qian, due to hdr size mismatch.
  check details at patch 5.

- Add missing config_irq and isr reading support from v1.

- fix comments from v1.

Almost all the differences that come with virtio 1.0 are PCI layout changes:
the major configuration structures are stored in bar space, and their
locations are stored in the corresponding pci cap structures. Reading/parsing
them is one of the major pieces of work in patch 7.

To make virtio v1.0 and v0.95 handling co-exist well, this patch set
introduces a virtio_pci_ops structure, adding another layer so that
we can keep those vtpci_foo_bar "APIs". With that, we can add virtio
1.0 support with minimal changes.
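
The layering described above can be illustrated with a small standalone
sketch. It is not the code from this patch set: the struct, field and
function names are hypothetical, and two plain struct fields stand in for
the legacy I/O port register and the modern BAR-mapped field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_hw;

/* Hypothetical ops table: one entry per pci action. */
struct sketch_pci_ops {
	uint8_t (*get_status)(struct sketch_hw *hw);
	void    (*set_status)(struct sketch_hw *hw, uint8_t status);
};

struct sketch_hw {
	const struct sketch_pci_ops *ops;
	uint8_t legacy_status; /* stands in for the legacy I/O port register */
	uint8_t modern_status; /* stands in for the BAR-mapped status field */
};

static uint8_t legacy_get(struct sketch_hw *hw) { return hw->legacy_status; }
static void legacy_set(struct sketch_hw *hw, uint8_t s) { hw->legacy_status = s; }
static uint8_t modern_get(struct sketch_hw *hw) { return hw->modern_status; }
static void modern_set(struct sketch_hw *hw, uint8_t s) { hw->modern_status = s; }

static const struct sketch_pci_ops legacy_ops = { legacy_get, legacy_set };
static const struct sketch_pci_ops modern_ops = { modern_get, modern_set };

/* Chosen once at init time; callers never look at the layout again. */
static void
sketch_init(struct sketch_hw *hw, int is_modern)
{
	hw->ops = is_modern ? &modern_ops : &legacy_ops;
	hw->legacy_status = 0;
	hw->modern_status = 0;
}

/* The exported wrappers keep one signature for both layouts. */
static uint8_t
sketch_get_status(struct sketch_hw *hw)
{
	assert(hw != NULL && hw->ops != NULL);
	return hw->ops->get_status(hw);
}

static void
sketch_set_status(struct sketch_hw *hw, uint8_t status)
{
	assert(hw != NULL && hw->ops != NULL);
	hw->ops->set_status(hw, status);
}
```

The design choice is that the decision between legacy and modern is made in
one place, so the rest of the driver calls the wrappers without branching.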


Rough test guide


First, you need to get a QEMU that supports virtio 1.0 (say, v2.5), then add
the option "disable-modern=false" to the qemu virtio-net-pci device to enable
virtio 1.0 (which is disabled by default).

And if you see something like following from 'lspci -v', it means virtio
1.0 is indeed enabled:

00:04.0 Ethernet controller: Red Hat, Inc Virtio network device
Subsystem: Red Hat, Inc Device 0001
Physical Slot: 4
Flags: bus master, fast devsel, latency 0, IRQ 11
I/O ports at c040 [size=64]
Memory at febf1000 (32-bit, non-prefetchable) [size=4K]
Memory at fe00 (64-bit, prefetchable) [size=8M]
Expansion ROM at feb8 [disabled] [size=256K]
Capabilities: [98] MSI-X: Enable+ Count=6 Masked-
==> Capabilities: [84] Vendor Specific Information: Len=14 
==> Capabilities: [70] Vendor Specific Information: Len=14 
==> Capabilities: [60] Vendor Specific Information: Len=10 
==> Capabilities: [50] Vendor Specific Information: Len=10 
==> Capabilities: [40] Vendor Specific Information: Len=10 
Kernel driver in use: virtio-pci
Kernel modules: virtio_pci

After that, there is nothing special compared to the old virtio
0.95 pmd driver.


---
Yuanhan Liu (8):
  virtio: don't set vring address again at queue startup
  virtio: introduce struct virtio_pci_ops
  virtio: move left pci stuff to virtio_pci.c
  viritio: switch to 64 bit features
  virtio: retrieve hdr_size from hw->vtnet_hdr_size
  eal: pci: export pci_[un]map_device
  virtio: add 1.0 support
  virtio: move VIRTIO_READ/WRITE_REG_X into virtio_pci.c

 doc/guides/rel_notes/release_2_3.rst|   3 +
 drivers/net/virtio/virtio_ethdev.c  | 302 +
 drivers/net/virtio/virtio_ethdev.h  |   3 +-
 drivers/net/virtio/virtio_pci.c | 787 +++-
 drivers/net/virtio/virtio_pci.h | 120 +++-
 drivers/net/virtio/virtio_rxtx.c|  21 +-
 drivers/net/virtio/virtio_rxtx_simple.c |  12 +-
 drivers/net/virtio/virtqueue.h  |   4 +-
 lib/librte_eal/bsdapp/eal/eal_pci.c |   4 +-
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |   7 +
 lib/librte_eal/common/eal_common_pci.c  |   4 +-
 lib/librte_eal/common/eal_private.h |  18 -
 lib/librte_eal/common/include/rte_pci.h |  27 +
 lib/librte_eal/linuxapp/eal/eal_pci.c   |   4 +-
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |   7 +
 15 files changed, 946 insertions(+), 377 deletions(-)

-- 
1.9.0



[dpdk-dev] [PATCH v3 3/8] virtio: move left pci stuff to virtio_pci.c

2016-01-14 Thread Yuanhan Liu
virtio_pci.c is a more proper place for pci stuff; virtio_ethdev.c is not.

Signed-off-by: Yuanhan Liu 
---
 drivers/net/virtio/virtio_ethdev.c | 265 +---
 drivers/net/virtio/virtio_pci.c| 270 -
 2 files changed, 270 insertions(+), 265 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index 6c1d3a0..b57224d 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -36,10 +36,6 @@
 #include 
 #include 
 #include 
-#ifdef RTE_EXEC_ENV_LINUXAPP
-#include 
-#include 
-#endif

 #include 
 #include 
@@ -955,260 +951,6 @@ virtio_negotiate_features(struct virtio_hw *hw)
hw->guest_features);
 }

-#ifdef RTE_EXEC_ENV_LINUXAPP
-static int
-parse_sysfs_value(const char *filename, unsigned long *val)
-{
-   FILE *f;
-   char buf[BUFSIZ];
-   char *end = NULL;
-
-   f = fopen(filename, "r");
-   if (f == NULL) {
-   PMD_INIT_LOG(ERR, "%s(): cannot open sysfs value %s",
-__func__, filename);
-   return -1;
-   }
-
-   if (fgets(buf, sizeof(buf), f) == NULL) {
-   PMD_INIT_LOG(ERR, "%s(): cannot read sysfs value %s",
-__func__, filename);
-   fclose(f);
-   return -1;
-   }
-   *val = strtoul(buf, &end, 0);
-   if ((buf[0] == '\0') || (end == NULL) || (*end != '\n')) {
-   PMD_INIT_LOG(ERR, "%s(): cannot parse sysfs value %s",
-__func__, filename);
-   fclose(f);
-   return -1;
-   }
-   fclose(f);
-   return 0;
-}
-
-static int get_uio_dev(struct rte_pci_addr *loc, char *buf, unsigned int buflen,
-   unsigned int *uio_num)
-{
-   struct dirent *e;
-   DIR *dir;
-   char dirname[PATH_MAX];
-
-   /* depending on kernel version, uio can be located in uio/uioX
-* or uio:uioX */
-   snprintf(dirname, sizeof(dirname),
-SYSFS_PCI_DEVICES "/" PCI_PRI_FMT "/uio",
-loc->domain, loc->bus, loc->devid, loc->function);
-   dir = opendir(dirname);
-   if (dir == NULL) {
-   /* retry with the parent directory */
-   snprintf(dirname, sizeof(dirname),
-SYSFS_PCI_DEVICES "/" PCI_PRI_FMT,
-loc->domain, loc->bus, loc->devid, loc->function);
-   dir = opendir(dirname);
-
-   if (dir == NULL) {
-   PMD_INIT_LOG(ERR, "Cannot opendir %s", dirname);
-   return -1;
-   }
-   }
-
-   /* take the first file starting with "uio" */
-   while ((e = readdir(dir)) != NULL) {
-   /* format could be uio%d ...*/
-   int shortprefix_len = sizeof("uio") - 1;
-   /* ... or uio:uio%d */
-   int longprefix_len = sizeof("uio:uio") - 1;
-   char *endptr;
-
-   if (strncmp(e->d_name, "uio", 3) != 0)
-   continue;
-
-   /* first try uio%d */
-   errno = 0;
-   *uio_num = strtoull(e->d_name + shortprefix_len, &endptr, 10);
-   if (errno == 0 && endptr != (e->d_name + shortprefix_len)) {
-   snprintf(buf, buflen, "%s/uio%u", dirname, *uio_num);
-   break;
-   }
-
-   /* then try uio:uio%d */
-   errno = 0;
-   *uio_num = strtoull(e->d_name + longprefix_len, &endptr, 10);
-   if (errno == 0 && endptr != (e->d_name + longprefix_len)) {
-   snprintf(buf, buflen, "%s/uio:uio%u", dirname,
-*uio_num);
-   break;
-   }
-   }
-   closedir(dir);
-
-   /* No uio resource found */
-   if (e == NULL) {
-   PMD_INIT_LOG(ERR, "Could not find uio resource");
-   return -1;
-   }
-
-   return 0;
-}
-
-static int
-virtio_has_msix(const struct rte_pci_addr *loc)
-{
-   DIR *d;
-   char dirname[PATH_MAX];
-
-   snprintf(dirname, sizeof(dirname),
-SYSFS_PCI_DEVICES "/" PCI_PRI_FMT "/msi_irqs",
-loc->domain, loc->bus, loc->devid, loc->function);
-
-   d = opendir(dirname);
-   if (d)
-   closedir(d);
-
-   return (d != NULL);
-}
-
-/* Extract I/O port numbers from sysfs */
-static int virtio_resource_init_by_uio(struct rte_pci_device *pci_dev)
-{
-   char dirname[PATH_MAX];
-   char filename[PATH_MAX];
-   unsigned long start, size;
-   unsigned int uio_num;
-
-   if (get_uio_dev(&pci_dev->addr, dirname, sizeof(dirname), &uio_num) < 0)
-   return -1;
-
-   /* get portio size */
-   snprintf(filename, sizeof(filename),
-"%s/portio

[dpdk-dev] [PATCH v3 2/8] virtio: introduce struct virtio_pci_ops

2016-01-14 Thread Yuanhan Liu
Introduce struct virtio_pci_ops, to let legacy virtio (v0.95) and
modern virtio (1.0) have different implementations of a specific
pci action, such as reading the host status.

With that, this patch reimplements all exported pci functions, in
a way like:

    vtpci_foo_bar(struct virtio_hw *hw)
    {
            hw->vtpci_ops->foo_bar(hw);
    }

So that we need to pay attention only to those pci related functions
while adding virtio 1.0 support.

This patch introduces a new vtpci function, vtpci_init(), to do
proper virtio pci settings. It's pretty simple so far: it just sets
hw->vtpci_ops to legacy_ops as we don't support 1.0 yet.

Signed-off-by: Yuanhan Liu 
---

v2: extra whitespace line removing, and comment on "reading status
after reset".

rename the badly taken op name "set_irq" to "set_config_irq".
---
 drivers/net/virtio/virtio_ethdev.c |  22 ++
 drivers/net/virtio/virtio_pci.c| 158 ++---
 drivers/net/virtio/virtio_pci.h|  27 +++
 drivers/net/virtio/virtqueue.h |   2 +-
 4 files changed, 166 insertions(+), 43 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index d928339..6c1d3a0 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -272,9 +272,7 @@ virtio_dev_queue_release(struct virtqueue *vq) {

if (vq) {
hw = vq->hw;
-   /* Select and deactivate the queue */
-   VIRTIO_WRITE_REG_2(hw, VIRTIO_PCI_QUEUE_SEL, vq->vq_queue_index);
-   VIRTIO_WRITE_REG_4(hw, VIRTIO_PCI_QUEUE_PFN, 0);
+   hw->vtpci_ops->del_queue(hw, vq);

rte_free(vq->sw_ring);
rte_free(vq);
@@ -295,15 +293,13 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
struct virtio_hw *hw = dev->data->dev_private;
struct virtqueue *vq = NULL;

-   /* Write the virtqueue index to the Queue Select Field */
-   VIRTIO_WRITE_REG_2(hw, VIRTIO_PCI_QUEUE_SEL, vtpci_queue_idx);
-   PMD_INIT_LOG(DEBUG, "selecting queue: %u", vtpci_queue_idx);
+   PMD_INIT_LOG(DEBUG, "setting up queue: %u", vtpci_queue_idx);

/*
 * Read the virtqueue size from the Queue Size field
 * Always power of 2 and if 0 virtqueue does not exist
 */
-   vq_size = VIRTIO_READ_REG_2(hw, VIRTIO_PCI_QUEUE_NUM);
+   vq_size = hw->vtpci_ops->get_queue_num(hw, vtpci_queue_idx);
PMD_INIT_LOG(DEBUG, "vq_size: %u nb_desc:%u", vq_size, nb_desc);
if (vq_size == 0) {
PMD_INIT_LOG(ERR, "%s: virtqueue does not exist", __func__);
@@ -436,12 +432,8 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
memset(vq->virtio_net_hdr_mz->addr, 0, PAGE_SIZE);
}

-   /*
-* Set guest physical address of the virtqueue
-* in VIRTIO_PCI_QUEUE_PFN config register of device
-*/
-   VIRTIO_WRITE_REG_4(hw, VIRTIO_PCI_QUEUE_PFN,
-   mz->phys_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT);
+   hw->vtpci_ops->setup_queue(hw, vq);
+
*pvq = vq;
return 0;
 }
@@ -950,7 +942,7 @@ virtio_negotiate_features(struct virtio_hw *hw)
hw->guest_features);

/* Read device(host) feature bits */
-   host_features = VIRTIO_READ_REG_4(hw, VIRTIO_PCI_HOST_FEATURES);
+   host_features = hw->vtpci_ops->get_features(hw);
PMD_INIT_LOG(DEBUG, "host_features before negotiate = %x",
host_features);

@@ -1287,6 +1279,8 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)

pci_dev = eth_dev->pci_dev;

+   vtpci_init(pci_dev, hw);
+
if (virtio_resource_init(pci_dev) < 0)
return -1;

diff --git a/drivers/net/virtio/virtio_pci.c b/drivers/net/virtio/virtio_pci.c
index 2245bec..9930efa 100644
--- a/drivers/net/virtio/virtio_pci.c
+++ b/drivers/net/virtio/virtio_pci.c
@@ -34,12 +34,11 @@

 #include "virtio_pci.h"
 #include "virtio_logs.h"
+#include "virtqueue.h"

-static uint8_t vtpci_get_status(struct virtio_hw *);
-
-void
-vtpci_read_dev_config(struct virtio_hw *hw, uint64_t offset,
-   void *dst, int length)
+static void
+legacy_read_dev_config(struct virtio_hw *hw, uint64_t offset,
+  void *dst, int length)
 {
uint64_t off;
uint8_t *d;
@@ -60,9 +59,9 @@ vtpci_read_dev_config(struct virtio_hw *hw, uint64_t offset,
}
 }

-void
-vtpci_write_dev_config(struct virtio_hw *hw, uint64_t offset,
-   void *src, int length)
+static void
+legacy_write_dev_config(struct virtio_hw *hw, uint64_t offset,
+   void *src, int length)
 {
uint64_t off;
uint8_t *s;
@@ -83,30 +82,133 @@ vtpci_write_dev_config(struct virtio_hw *hw, uint64_t offset,
}
 }

+static uint32_t
+legacy_get_features(struct virtio_hw *hw)
+{
+   return VIRTIO_READ_REG_4(hw, VIRTIO_PCI_HOST_FEATURES);
+}
+
+static void
+legacy_set_fea

[dpdk-dev] [PATCH v3 4/8] viritio: switch to 64 bit features

2016-01-14 Thread Yuanhan Liu
Switch to 64 bit features, which virtio 1.0 supports.

As legacy virtio only supports 32 bit features, we complain
loudly and quit when trying to set features above 32 bits for a legacy
device.

Signed-off-by: Yuanhan Liu 
---
 drivers/net/virtio/virtio_ethdev.c |  8 
 drivers/net/virtio/virtio_pci.c| 15 ++-
 drivers/net/virtio/virtio_pci.h| 12 ++--
 3 files changed, 20 insertions(+), 15 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index b57224d..94e0c4a 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -930,16 +930,16 @@ virtio_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on)
 static void
 virtio_negotiate_features(struct virtio_hw *hw)
 {
-   uint32_t host_features;
+   uint64_t host_features;

/* Prepare guest_features: feature that driver wants to support */
hw->guest_features = VIRTIO_PMD_GUEST_FEATURES;
-   PMD_INIT_LOG(DEBUG, "guest_features before negotiate = %x",
+   PMD_INIT_LOG(DEBUG, "guest_features before negotiate = %"PRIx64,
hw->guest_features);

/* Read device(host) feature bits */
host_features = hw->vtpci_ops->get_features(hw);
-   PMD_INIT_LOG(DEBUG, "host_features before negotiate = %x",
+   PMD_INIT_LOG(DEBUG, "host_features before negotiate = %"PRIx64,
host_features);

/*
@@ -947,7 +947,7 @@ virtio_negotiate_features(struct virtio_hw *hw)
 * guest feature bits.
 */
hw->guest_features = vtpci_negotiate_features(hw, host_features);
-   PMD_INIT_LOG(DEBUG, "features after negotiate = %x",
+   PMD_INIT_LOG(DEBUG, "features after negotiate = %"PRIx64,
hw->guest_features);
 }

diff --git a/drivers/net/virtio/virtio_pci.c b/drivers/net/virtio/virtio_pci.c
index 03d623b..5eed57e 100644
--- a/drivers/net/virtio/virtio_pci.c
+++ b/drivers/net/virtio/virtio_pci.c
@@ -87,15 +87,20 @@ legacy_write_dev_config(struct virtio_hw *hw, uint64_t offset,
}
 }

-static uint32_t
+static uint64_t
 legacy_get_features(struct virtio_hw *hw)
 {
return VIRTIO_READ_REG_4(hw, VIRTIO_PCI_HOST_FEATURES);
 }

 static void
-legacy_set_features(struct virtio_hw *hw, uint32_t features)
+legacy_set_features(struct virtio_hw *hw, uint64_t features)
 {
+   if ((features >> 32) != 0) {
+   PMD_DRV_LOG(ERR,
+   "only 32 bit features are allowed for legacy virtio!");
+   return;
+   }
VIRTIO_WRITE_REG_4(hw, VIRTIO_PCI_GUEST_FEATURES, features);
 }

@@ -451,10 +456,10 @@ vtpci_write_dev_config(struct virtio_hw *hw, uint64_t offset,
hw->vtpci_ops->write_dev_cfg(hw, offset, src, length);
 }

-uint32_t
-vtpci_negotiate_features(struct virtio_hw *hw, uint32_t host_features)
+uint64_t
+vtpci_negotiate_features(struct virtio_hw *hw, uint64_t host_features)
 {
-   uint32_t features;
+   uint64_t features;

/*
 * Limit negotiated features to what the driver, virtqueue, and
diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
index ee7d265..3fd86f6 100644
--- a/drivers/net/virtio/virtio_pci.h
+++ b/drivers/net/virtio/virtio_pci.h
@@ -175,8 +175,8 @@ struct virtio_pci_ops {
uint8_t (*get_status)(struct virtio_hw *hw);
void(*set_status)(struct virtio_hw *hw, uint8_t status);

-   uint32_t (*get_features)(struct virtio_hw *hw);
-   void (*set_features)(struct virtio_hw *hw, uint32_t features);
+   uint64_t (*get_features)(struct virtio_hw *hw);
+   void (*set_features)(struct virtio_hw *hw, uint64_t features);

uint8_t (*get_isr)(struct virtio_hw *hw);

@@ -191,7 +191,7 @@ struct virtio_pci_ops {
 struct virtio_hw {
struct virtqueue *cvq;
uint32_t    io_base;
-   uint32_t    guest_features;
+   uint64_t    guest_features;
uint32_t    max_tx_queues;
uint32_t    max_rx_queues;
uint16_t    vtnet_hdr_size;
@@ -271,9 +271,9 @@ outl_p(unsigned int data, unsigned int port)
outl_p((unsigned int)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg

 static inline int
-vtpci_with_feature(struct virtio_hw *hw, uint32_t bit)
+vtpci_with_feature(struct virtio_hw *hw, uint64_t bit)
 {
-   return (hw->guest_features & (1u << bit)) != 0;
+   return (hw->guest_features & (1ULL << bit)) != 0;
 }

 /*
@@ -286,7 +286,7 @@ void vtpci_reinit_complete(struct virtio_hw *);

 void vtpci_set_status(struct virtio_hw *, uint8_t);

-uint32_t vtpci_negotiate_features(struct virtio_hw *, uint32_t);
+uint64_t vtpci_negotiate_features(struct virtio_hw *, uint64_t);

 void vtpci_write_dev_config(struct virtio_hw *, uint64_t, void *, int);

-- 
1.9.0



[dpdk-dev] [PATCH v3 5/8] virtio: retrieve hdr_size from hw->vtnet_hdr_size

2016-01-14 Thread Yuanhan Liu
The mergeable virtio net hdr format has been the standard and the
only virtio net hdr format since virtio 1.0. Therefore, we can no
longer hardcode hdr_size to "sizeof(struct virtio_net_hdr)" at
virtio_recv_pkts(); otherwise, there would be a mismatch of hdr
size between rte_vhost_enqueue_burst() and virtio_recv_pkts(),
leading to packet corruption.

Instead, we should retrieve it from hw->vtnet_hdr_size; we will
do proper settings at eth_virtio_dev_init() in later patches.
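
To illustrate the mismatch described above, here is a minimal sketch,
not the driver code. The two structs are redeclared locally so the
example is self-contained, and pick_vtnet_hdr_size() and payload_len()
are hypothetical helpers named only for this illustration. It assumes
the header size is chosen once at init time from the negotiated
features, as the later patches in this series do.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the two virtio net header layouts (sketch only). */
struct virtio_net_hdr {
	uint8_t  flags;
	uint8_t  gso_type;
	uint16_t hdr_len;
	uint16_t gso_size;
	uint16_t csum_start;
	uint16_t csum_offset;
};

struct virtio_net_hdr_mrg_rxbuf {
	struct virtio_net_hdr hdr;
	uint16_t num_buffers;
};

static_assert(sizeof(struct virtio_net_hdr) == 10,
	      "non-mergeable header is 10 bytes");
static_assert(sizeof(struct virtio_net_hdr_mrg_rxbuf) == 12,
	      "mergeable header is 12 bytes");

/* Hypothetical helper: choose the header size once, from the features. */
static uint32_t
pick_vtnet_hdr_size(int mrg_rxbuf, int version_1)
{
	if (mrg_rxbuf || version_1)
		return sizeof(struct virtio_net_hdr_mrg_rxbuf);
	return sizeof(struct virtio_net_hdr);
}

/* Hypothetical helper: packet length once the header is stripped. */
static uint32_t
payload_len(uint32_t used_len, uint32_t hdr_size)
{
	return used_len - hdr_size;
}
```

If the sender prepends the 12-byte header while the receiver strips
only 10 bytes, every packet starts 2 bytes too early, which is the
corruption this patch avoids.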

Signed-off-by: Yuanhan Liu 
---

v3: retrieve hdr_size from hw->vtnet_hdr_size for simple rxtx
code path as well: it should not break anything, as simple
rx and mergeable rx still will not co-exist.
---
 drivers/net/virtio/virtio_rxtx.c|  6 --
 drivers/net/virtio/virtio_rxtx_simple.c | 12 ++--
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index b7267c0..41a1366 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -560,7 +560,7 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, 
uint16_t nb_pkts)
struct rte_mbuf *rcv_pkts[VIRTIO_MBUF_BURST_SZ];
int error;
uint32_t i, nb_enqueued;
-   const uint32_t hdr_size = sizeof(struct virtio_net_hdr);
+   uint32_t hdr_size;

nb_used = VIRTQUEUE_NUSED(rxvq);

@@ -580,6 +580,7 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, 
uint16_t nb_pkts)
hw = rxvq->hw;
nb_rx = 0;
nb_enqueued = 0;
+   hdr_size = hw->vtnet_hdr_size;

for (i = 0; i < num ; i++) {
rxm = rcv_pkts[i];
@@ -664,7 +665,7 @@ virtio_recv_mergeable_pkts(void *rx_queue,
uint32_t seg_num;
uint16_t extra_idx;
uint32_t seg_res;
-   const uint32_t hdr_size = sizeof(struct virtio_net_hdr_mrg_rxbuf);
+   uint32_t hdr_size;

nb_used = VIRTQUEUE_NUSED(rxvq);

@@ -682,6 +683,7 @@ virtio_recv_mergeable_pkts(void *rx_queue,
seg_num = 0;
extra_idx = 0;
seg_res = 0;
+   hdr_size = hw->vtnet_hdr_size;

while (i < nb_used) {
struct virtio_net_hdr_mrg_rxbuf *header;
diff --git a/drivers/net/virtio/virtio_rxtx_simple.c 
b/drivers/net/virtio/virtio_rxtx_simple.c
index ff3c11a..3e66e8b 100644
--- a/drivers/net/virtio/virtio_rxtx_simple.c
+++ b/drivers/net/virtio/virtio_rxtx_simple.c
@@ -81,9 +81,9 @@ virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq,

start_dp = vq->vq_ring.desc;
start_dp[desc_idx].addr = (uint64_t)((uintptr_t)cookie->buf_physaddr +
-   RTE_PKTMBUF_HEADROOM - sizeof(struct virtio_net_hdr));
+   RTE_PKTMBUF_HEADROOM - vq->hw->vtnet_hdr_size);
start_dp[desc_idx].len = cookie->buf_len -
-   RTE_PKTMBUF_HEADROOM + sizeof(struct virtio_net_hdr);
+   RTE_PKTMBUF_HEADROOM + vq->hw->vtnet_hdr_size;

vq->vq_free_cnt--;
vq->vq_avail_idx++;
@@ -120,9 +120,9 @@ virtio_rxq_rearm_vec(struct virtqueue *rxvq)

start_dp[i].addr =
(uint64_t)((uintptr_t)sw_ring[i]->buf_physaddr +
-   RTE_PKTMBUF_HEADROOM - sizeof(struct virtio_net_hdr));
+   RTE_PKTMBUF_HEADROOM - rxvq->hw->vtnet_hdr_size);
start_dp[i].len = sw_ring[i]->buf_len -
-   RTE_PKTMBUF_HEADROOM + sizeof(struct virtio_net_hdr);
+   RTE_PKTMBUF_HEADROOM + rxvq->hw->vtnet_hdr_size;
}

rxvq->vq_avail_idx += RTE_VIRTIO_VPMD_RX_REARM_THRESH;
@@ -175,8 +175,8 @@ virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf 
**rx_pkts,
len_adjust = _mm_set_epi16(
0, 0,
0,
-   (uint16_t) -sizeof(struct virtio_net_hdr),
-   0, (uint16_t) -sizeof(struct virtio_net_hdr),
+   (uint16_t) -rxvq->hw->vtnet_hdr_size,
+   0, (uint16_t) -rxvq->hw->vtnet_hdr_size,
0, 0);

if (unlikely(nb_pkts < RTE_VIRTIO_DESC_PER_LOOP))
-- 
1.9.0



[dpdk-dev] [PATCH v3 6/8] eal: pci: export pci_[un]map_device

2016-01-14 Thread Yuanhan Liu
Normally we could set the RTE_PCI_DRV_NEED_MAPPING flag so that eal will
invoke pci_map_device internally for us. From that point of view, there
is no need to export pci_map_device.

However, for the virtio pmd driver, which is designed to work without
first binding to UIO (or something similar), pci_map_device() will fail,
which ends up with the virtio pmd driver being skipped. Therefore, we can
not set RTE_PCI_DRV_NEED_MAPPING blindly in the virtio pmd driver.

Therefore, this patch exports pci_map_device, and lets the virtio pmd
call it when necessary.

Cc: David Marchand 
Signed-off-by: Yuanhan Liu 
---
v3: - export pci_unmap_device as well

- Add a few more comments about rte_eal_pci_map_device().
---
 lib/librte_eal/bsdapp/eal/eal_pci.c |  4 ++--
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  7 +++
 lib/librte_eal/common/eal_common_pci.c  |  4 ++--
 lib/librte_eal/common/eal_private.h | 18 -
 lib/librte_eal/common/include/rte_pci.h | 27 +
 lib/librte_eal/linuxapp/eal/eal_pci.c   |  4 ++--
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  7 +++
 7 files changed, 47 insertions(+), 24 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c 
b/lib/librte_eal/bsdapp/eal/eal_pci.c
index 6c21fbd..95c32c1 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -93,7 +93,7 @@ pci_unbind_kernel_driver(struct rte_pci_device *dev 
__rte_unused)

 /* Map pci device */
 int
-pci_map_device(struct rte_pci_device *dev)
+rte_eal_pci_map_device(struct rte_pci_device *dev)
 {
int ret = -1;

@@ -115,7 +115,7 @@ pci_map_device(struct rte_pci_device *dev)

 /* Unmap pci device */
 void
-pci_unmap_device(struct rte_pci_device *dev)
+rte_eal_pci_unmap_device(struct rte_pci_device *dev)
 {
/* try unmapping the NIC resources */
switch (dev->kdrv) {
diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map 
b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index 9d7adf1..1b28170 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -135,3 +135,10 @@ DPDK_2.2 {
rte_xen_dom0_supported;

 } DPDK_2.1;
+
+DPDK_2.3 {
+   global:
+
+   rte_eal_pci_map_device;
+   rte_eal_pci_unmap_device;
+} DPDK_2.2;
diff --git a/lib/librte_eal/common/eal_common_pci.c 
b/lib/librte_eal/common/eal_common_pci.c
index dcfe947..96d5113 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -188,7 +188,7 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, 
struct rte_pci_device *d
pci_config_space_set(dev);
 #endif
/* map resources for devices that use igb_uio */
-   ret = pci_map_device(dev);
+   ret = rte_eal_pci_map_device(dev);
if (ret != 0)
return ret;
} else if (dr->drv_flags & RTE_PCI_DRV_FORCE_UNBIND &&
@@ -254,7 +254,7 @@ rte_eal_pci_detach_dev(struct rte_pci_driver *dr,

if (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING)
/* unmap resources for devices that use igb_uio */
-   pci_unmap_device(dev);
+   rte_eal_pci_unmap_device(dev);

return 0;
}
diff --git a/lib/librte_eal/common/eal_private.h 
b/lib/librte_eal/common/eal_private.h
index 072e672..2342fa1 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -165,24 +165,6 @@ struct rte_pci_device;
 int pci_unbind_kernel_driver(struct rte_pci_device *dev);

 /**
- * Map this device
- *
- * This function is private to EAL.
- *
- * @return
- *   0 on success, negative on error and positive if no driver
- *   is found for the device.
- */
-int pci_map_device(struct rte_pci_device *dev);
-
-/**
- * Unmap this device
- *
- * This function is private to EAL.
- */
-void pci_unmap_device(struct rte_pci_device *dev);
-
-/**
  * Map the PCI resource of a PCI device in virtual memory
  *
  * This function is private to EAL.
diff --git a/lib/librte_eal/common/include/rte_pci.h 
b/lib/librte_eal/common/include/rte_pci.h
index 334c12e..2224109 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -485,6 +485,33 @@ int rte_eal_pci_read_config(const struct rte_pci_device 
*device,
  */
 int rte_eal_pci_write_config(const struct rte_pci_device *device,
 const void *buf, size_t len, off_t offset);
+/**
+ * Map the PCI device resources in user space virtual memory address
+ *
+ * Note that driver should not call this function when flag
+ * RTE_PCI_DRV_NEED_MAPPING is set, as EAL will do that for
+ * you when it's on.
+ *
+ * @param dev
+ *   A pointer to a rte_pci_device structure describing the device
+ *   to use
+ *
+ * @return
+ *   0 on success, negative on error and po

[dpdk-dev] [PATCH v3 7/8] virtio: add 1.0 support

2016-01-14 Thread Yuanhan Liu
Modern (v1.0) virtio pci devices define several pci capabilities.
Each cap has a configuration structure corresponding to it, and the
cap.bar and cap.offset fields tell us where to find it.

First, we map the pci resources with rte_eal_pci_map_device().
We can then easily locate a cfg structure by:

cfg_addr = dev->mem_resource[cap.bar].addr + cap.offset;

Therefore, the entry point for enabling modern (v1.0) pci device support
is to iterate over the pci capability list and locate the configs
we care about; they are:

- common cfg

  For generic virtio and virtqueue configuration, such as setting/getting
  features, enabling a specific queue, and so on.

- notify cfg

  Combined with `queue_notify_off' from common cfg, we can use it to
  notify a specific virtqueue.

- device cfg

  Where the virtio_net_config structure is located.

- isr cfg

  Where to read the isr (interrupt status).

If any of the above caps is not found, we fall back to the legacy virtio
handling.
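
The lookup described above can be sketched as follows. This is a
simplified, self-contained illustration rather than the code in the
patch: struct sketch_bar, struct sketch_cap and sketch_cfg_addr() are
hypothetical stand-ins for the real rte_pci_device resources and
virtio_pci_cap. A NULL return is what would trigger the fallback to
legacy handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_BARS 6	/* a PCI function exposes BAR0..BAR5 */

/* Hypothetical stand-in for one mapped BAR. */
struct sketch_bar {
	void     *addr;	/* mapped base address, NULL if not mapped */
	uint64_t  len;	/* size of the mapping in bytes */
};

/* Hypothetical stand-in for the fields read from a virtio pci cap. */
struct sketch_cap {
	uint8_t  bar;
	uint32_t offset;
	uint32_t length;
};

/* Return the address of the cfg structure, or NULL if the cap is bogus. */
static void *
sketch_cfg_addr(const struct sketch_bar *bars, const struct sketch_cap *cap)
{
	/* Sum in 64 bits so that offset + length cannot wrap around. */
	uint64_t end = (uint64_t)cap->offset + cap->length;

	if (cap->bar >= SKETCH_MAX_BARS)
		return NULL;
	if (bars[cap->bar].addr == NULL)
		return NULL;
	if (end > bars[cap->bar].len)
		return NULL;

	return (uint8_t *)bars[cap->bar].addr + cap->offset;
}
```

Doing the bounds check in 64-bit arithmetic is a choice made for this
sketch; it rejects a cap whose 32-bit offset + length would otherwise
wrap and appear to fit inside the BAR.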

On success, hw->vtpci_ops is assigned to modern_ops, where all
operations are implemented by reading/writing one (or a few) specific
configuration fields from the above 4 cfg structures. And that's basically
how this patch works.

Besides those changes, virtio 1.0 introduces a new status bit:
FEATURES_OK, which is set after feature negotiation is done.

Last, set the VIRTIO_F_VERSION_1 feature flag.
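
The FEATURES_OK step can be sketched against a simulated device as
below. This is only an illustration of the handshake order: struct
sk_dev and the sk_* functions are hypothetical and stand in for the
real vtpci_* calls. The constants assume the values from the virtio
1.0 spec (FEATURES_OK is status bit 0x08, VIRTIO_F_VERSION_1 is
feature bit 32).

```c
#include <assert.h>
#include <stdint.h>

#define SK_STATUS_FEATURES_OK	0x08	/* status bit, per virtio 1.0 */
#define SK_F_VERSION_1		32	/* feature bit, per virtio 1.0 */

/* Hypothetical simulated device, for this sketch only. */
struct sk_dev {
	uint64_t host_features;		/* what the device offers */
	uint64_t guest_features;	/* what was negotiated */
	uint8_t  status;
	int      rejects_features;	/* device refuses the feature subset */
};

static void
sk_set_status(struct sk_dev *d, uint8_t bit)
{
	/* A device may decline FEATURES_OK by simply not latching it. */
	if (bit == SK_STATUS_FEATURES_OK && d->rejects_features)
		return;
	d->status |= bit;
}

static uint8_t
sk_get_status(const struct sk_dev *d)
{
	return d->status;
}

/* Negotiate, then confirm FEATURES_OK by reading the status back. */
static int
sk_negotiate(struct sk_dev *d, uint64_t driver_features)
{
	d->guest_features = d->host_features & driver_features;

	/* 1ULL is required: bit 32 does not fit in a 32-bit shift. */
	if (!(d->guest_features & (1ULL << SK_F_VERSION_1)))
		return -1;

	sk_set_status(d, SK_STATUS_FEATURES_OK);
	if (!(sk_get_status(d) & SK_STATUS_FEATURES_OK))
		return -1;

	return 0;
}
```

Re-reading the status after the write matters because the device, not
the driver, decides whether the negotiated subset is acceptable.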

Signed-off-by: Yuanhan Liu 
---

v2: - re-read status after setting FEATURES_OK to make sure status is
  set correctly.

- Add isr reading and config irq setting support.

- Define some pci macros on our own to avoid a dependency on
  linux/pci_regs.h, as there should be no such file on non-Linux
  platforms

v3: - invoke rte_eal_pci_unmap_device() at uninit stage
---
 doc/guides/rel_notes/release_2_3.rst |   3 +
 drivers/net/virtio/virtio_ethdev.c   |  25 ++-
 drivers/net/virtio/virtio_ethdev.h   |   3 +-
 drivers/net/virtio/virtio_pci.c  | 335 ++-
 drivers/net/virtio/virtio_pci.h  |  67 +++
 drivers/net/virtio/virtqueue.h   |   2 +
 6 files changed, 430 insertions(+), 5 deletions(-)

diff --git a/doc/guides/rel_notes/release_2_3.rst 
b/doc/guides/rel_notes/release_2_3.rst
index 99de186..c390d97 100644
--- a/doc/guides/rel_notes/release_2_3.rst
+++ b/doc/guides/rel_notes/release_2_3.rst
@@ -4,6 +4,9 @@ DPDK Release 2.3
 New Features
 

+* **Virtio 1.0 support.**
+
+  Enabled virtio 1.0 support for virtio pmd driver.

 Resolved Issues
 ---
diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index 94e0c4a..deb0382 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -927,7 +927,7 @@ virtio_vlan_filter_set(struct rte_eth_dev *dev, uint16_t 
vlan_id, int on)
return virtio_send_command(hw->cvq, &ctrl, &len, 1);
 }

-static void
+static int
 virtio_negotiate_features(struct virtio_hw *hw)
 {
uint64_t host_features;
@@ -949,6 +949,22 @@ virtio_negotiate_features(struct virtio_hw *hw)
hw->guest_features = vtpci_negotiate_features(hw, host_features);
PMD_INIT_LOG(DEBUG, "features after negotiate = %"PRIx64,
hw->guest_features);
+
+   if (hw->modern) {
+   if (!vtpci_with_feature(hw, VIRTIO_F_VERSION_1)) {
+   PMD_INIT_LOG(ERR,
+   "VIRTIO_F_VERSION_1 features is not enabled.");
+   return -1;
+   }
+   vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_FEATURES_OK);
+   if (!(vtpci_get_status(hw) & VIRTIO_CONFIG_STATUS_FEATURES_OK)) 
{
+   PMD_INIT_LOG(ERR,
+   "failed to set FEATURES_OK status!");
+   return -1;
+   }
+   }
+
+   return 0;
 }

 /*
@@ -1032,7 +1048,8 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)

/* Tell the host we've known how to drive the device. */
vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER);
-   virtio_negotiate_features(hw);
+   if (virtio_negotiate_features(hw) < 0)
+   return -1;

/* If host does not support status then disable LSC */
if (!vtpci_with_feature(hw, VIRTIO_NET_F_STATUS))
@@ -1043,7 +1060,8 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
rx_func_get(eth_dev);

/* Setting up rx_header size for the device */
-   if (vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF))
+   if (vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF) ||
+   vtpci_with_feature(hw, VIRTIO_F_VERSION_1))
hw->vtnet_hdr_size = sizeof(struct virtio_net_hdr_mrg_rxbuf);
else
hw->vtnet_hdr_size = sizeof(struct virtio_net_hdr);
@@ -1159,6 +1177,7 @@ eth_virtio_dev_uninit(struct rte_eth_dev *eth_dev)
rte_intr_callback_unregister(&pci_dev->intr_handle,
virtio_interrupt_handle

[dpdk-dev] [PATCH v3 8/8] virtio: move VIRTIO_READ/WRITE_REG_X into virtio_pci.c

2016-01-14 Thread Yuanhan Liu
virtio_pci.c becomes the only file that references those macros; move them there.

Signed-off-by: Yuanhan Liu 
---
 drivers/net/virtio/virtio_pci.c | 19 +++
 drivers/net/virtio/virtio_pci.h | 18 --
 2 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/drivers/net/virtio/virtio_pci.c b/drivers/net/virtio/virtio_pci.c
index 9b62013..8e26f00 100644
--- a/drivers/net/virtio/virtio_pci.c
+++ b/drivers/net/virtio/virtio_pci.c
@@ -49,6 +49,25 @@
 #define PCI_CAPABILITY_LIST0x34
 #define PCI_CAP_ID_VNDR0x09

+
+#define VIRTIO_PCI_REG_ADDR(hw, reg) \
+   (unsigned short)((hw)->io_base + (reg))
+
+#define VIRTIO_READ_REG_1(hw, reg) \
+   inb((VIRTIO_PCI_REG_ADDR((hw), (reg))))
+#define VIRTIO_WRITE_REG_1(hw, reg, value) \
+   outb_p((unsigned char)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg))))
+
+#define VIRTIO_READ_REG_2(hw, reg) \
+   inw((VIRTIO_PCI_REG_ADDR((hw), (reg))))
+#define VIRTIO_WRITE_REG_2(hw, reg, value) \
+   outw_p((unsigned short)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg))))
+
+#define VIRTIO_READ_REG_4(hw, reg) \
+   inl((VIRTIO_PCI_REG_ADDR((hw), (reg))))
+#define VIRTIO_WRITE_REG_4(hw, reg, value) \
+   outl_p((unsigned int)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg))))
+
+
 static void
 legacy_read_dev_config(struct virtio_hw *hw, uint64_t offset,
   void *dst, int length)
diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
index 6ade642..5400ebd 100644
--- a/drivers/net/virtio/virtio_pci.h
+++ b/drivers/net/virtio/virtio_pci.h
@@ -318,24 +318,6 @@ outl_p(unsigned int data, unsigned int port)
 }
 #endif

-#define VIRTIO_PCI_REG_ADDR(hw, reg) \
-   (unsigned short)((hw)->io_base + (reg))
-
-#define VIRTIO_READ_REG_1(hw, reg) \
-   inb((VIRTIO_PCI_REG_ADDR((hw), (reg))))
-#define VIRTIO_WRITE_REG_1(hw, reg, value) \
-   outb_p((unsigned char)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg))))
-
-#define VIRTIO_READ_REG_2(hw, reg) \
-   inw((VIRTIO_PCI_REG_ADDR((hw), (reg))))
-#define VIRTIO_WRITE_REG_2(hw, reg, value) \
-   outw_p((unsigned short)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg))))
-
-#define VIRTIO_READ_REG_4(hw, reg) \
-   inl((VIRTIO_PCI_REG_ADDR((hw), (reg))))
-#define VIRTIO_WRITE_REG_4(hw, reg, value) \
-   outl_p((unsigned int)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg))))
-
-
 static inline int
 vtpci_with_feature(struct virtio_hw *hw, uint64_t bit)
 {
-- 
1.9.0



[dpdk-dev] [PATCH v3 6/8] eal: pci: export pci_[un]map_device

2016-01-14 Thread Yuanhan Liu
On Thu, Jan 14, 2016 at 03:42:50PM +0800, Yuanhan Liu wrote:
> Normally we could set RTE_PCI_DRV_NEED_MAPPING flag so that eal will
> invoke pci_map_device internally for us. From that point view, there
> is no need to export pci_map_device.
> 
> However, for virtio pmd driver, which is designed to work without
> binding UIO (or something similar first), pci_map_device() will fail,
> which ends up with virtio pmd driver being skipped. Therefore, we can
> not set RTE_PCI_DRV_NEED_MAPPING blindly at virtio pmd driver.
> 
> Therefore, this patch exports pci_map_device, and let virtio pmd
> call it when necessary.
> 
> Cc: David Marchand 
> Signed-off-by: Yuanhan Liu 

Oops, forgot to carry the tested-by from Santosh.

Tested-By: Santosh Shukla 


--yliu


[dpdk-dev] librte_power w/ intel_pstate cpufreq governor

2016-01-14 Thread Matthew Hall
On Thu, Jan 14, 2016 at 07:15:51AM +, Zhang, Helin wrote:
> That's disappointing if Skylake is like that. Let's have a learning first, 
> and then check if we can fix that. But in addition, DPDK provide interrupt 
> based packet receiving mechanism, can it be one of your choice?

Maybe I am wrong. But I could not disprove what the Linux p_state driver 
Documentation file and other places claimed, which is that the clockrate 
control is no-opped, because the white papers on Intel HWP are not findable in 
the Intel website, or by using Google with the operator "site:intel.com".

The IRQ based part is still enabled and works quite well in a very trivial 
test so far... but the clockrate callback handlers are null and the governor 
setting gets corrupted, both due to failed init of librte_power. So I will 
have to rebuild DPDK with the librte_power ACPI + KVM init commented out and 
the fastpath clockrate callback functions commented out of course. It is minor 
so I can do it to see what will happen.

> If no objection, I will find time later (may be in a month) to investigate 
> that. Of cause, please try to investigate that from your side.

Agreed.

> That's always there, for example, DPDK can exit accidently, without caring 
> anything. Then you can have the similar issue again.

Of course, it could. But if there was some kind of shutdown function, at least 
then I could call it from the signal handler I already have which closes the 
ports (this prevents nasty port lockups on virtio-net port DMA memory zones 
which can happen on future runs otherwise).

> It seems that you are so important for Intel. :) I don't have Skylake in 
> hand. :(

:) Hahaha... newegg.com to the rescue. I guess we need to be sure there is 
some program to test the stuff in DPDK for the new kernels and hardware. It 
appears we are pretty far behind now... I saw several threads about things 
that were behind just today.

> Regards,
> Helin

Matthew.


[dpdk-dev] [PATCH v2 7/7] virtio: add 1.0 support

2016-01-14 Thread Xie, Huawei
On 1/12/2016 2:58 PM, Yuanhan Liu wrote:
> Modern (v1.0) virtio pci device defines several pci capabilities.
[snip]
> +static void
> +modern_notify_queue(struct virtio_hw *hw __rte_unused, struct virtqueue *vq)
> +{
> + modern_write16(1, vq->notify_addr);
> +}

Does virtio 1.0 only support MMIO? MMIO has longer VMEXIT latency than
PORT IO.

[snip]


[dpdk-dev] [PATCH v2 7/7] virtio: add 1.0 support

2016-01-14 Thread Yuanhan Liu
On Thu, Jan 14, 2016 at 07:47:17AM +, Xie, Huawei wrote:
> On 1/12/2016 2:58 PM, Yuanhan Liu wrote:
> > Modern (v1.0) virtio pci device defines several pci capabilities.
> [snip]
> > +static void
> > +modern_notify_queue(struct virtio_hw *hw __rte_unused, struct virtqueue 
> > *vq)
> > +{
> > +   modern_write16(1, vq->notify_addr);
> > +}
> 
> Does virtio 1.0 only supports MMIO? MMIO has long VMEXIT latency than
> PORT IO.

Virtio 1.0 supports three transport layers, including MMIO and PCI, and
we use only PCI in our pmd driver.

--yliu


[dpdk-dev] [PATCH v2 7/7] virtio: add 1.0 support

2016-01-14 Thread Xie, Huawei
On 1/12/2016 2:58 PM, Yuanhan Liu wrote:
> Modern (v1.0) virtio pci device defines several pci capabilities.
> Each cap has a configure structure corresponding to it, and the
> cap.bar and cap.offset fields tell us where to find it.
>
> Firstly, we map the pci resources by rte_eal_pci_map_device().
> We then could easily locate to a cfg structure by:

s/locate to/locate/

>
> cfg_addr = dev->mem_resources[cap.bar].addr + cap.offset;
>
> Therefore, the entrance of enabling modern (v1.0) pci device support
> is to iterate the pci capability lists, and to locate some configs
> we care; and they are:
>
> - common cfg
>
>   For generic virtio and virtuqueu configuration, such as setting/getting

typo for virtqueue

>   features, enabling a specific queue, and so on.
>
> - nofity cfg
>
>   Combining with `queue_notify_off' from common cfg, we could use it to
>   notify a specific virt queue.
>
> - device cfg
>
>   Where virtio_net_config structure locates.
is located
> If any of above cap is not found, we fallback to the legacy virtio
>
[SNIP]
>  
>  
>  
> +#define MODERN_READ_DEF(nr_bits, type)   \
> +static inline type   \
> +modern_read##nr_bits(type *addr) \
> +{\
> + return *(volatile type *)addr;  \
> +}
> +
> +#define MODERN_WRITE_DEF(nr_bits, type)  \
> +static inline void   \
> +modern_write##nr_bits(type val, type *addr)  \
> +{\
> + *(volatile type *)addr = val;   \
> +}
> +
> +MODERN_READ_DEF (8, uint8_t)
> +MODERN_WRITE_DEF(8, uint8_t)
> +
> +MODERN_READ_DEF (16, uint16_t)
> +MODERN_WRITE_DEF(16, uint16_t)
> +
> +MODERN_READ_DEF (32, uint32_t)
> +MODERN_WRITE_DEF(32, uint32_t)
> +
> +static inline void
> +modern_write64_twopart(uint64_t val, uint32_t *lo, uint32_t *hi)
> +{
> + modern_write32((uint32_t)val, lo);
> + modern_write32(val >> 32, hi);
> +}
> +

These are normal mmio read/write operations. ioread8/16/32/64 or just
readxx would be a more meaningful name here.
> +static void
[SNIP]
> +
> +static void
> +modern_write_dev_config(struct virtio_hw *hw, uint64_t offset,
> + void *src, int length)

define src as const

[snip]
>  
> +static inline void *
> +get_cfg_addr(struct rte_pci_device *dev, struct virtio_pci_cap *cap)

No explicit inline for non performance critical functions.

> +{
> + uint8_t  bar= cap->bar;
> + uint32_t length = cap->length;
> + uint32_t offset = cap->offset;
> + uint8_t *base;
> +
> + if (unlikely(bar > 5)) {
Avoid hard-coded constant values whenever possible.

No likely/unlikely for non-performance-critical functions.

> + PMD_INIT_LOG(ERR, "invalid bar: %u", bar);
> + return NULL;
> + }
> +
> + if (unlikely(offset + length > dev->mem_resource[bar].len)) {
> + PMD_INIT_LOG(ERR,
> + "invalid cap: overflows bar space: %u > %"PRIu64,
> + offset + length, dev->mem_resource[bar].len);
> + return NULL;
> + }
> +
> + base = dev->mem_resource[bar].addr;
> + if (unlikely(base == NULL)) {
> + PMD_INIT_LOG(ERR, "bar %u base addr is NULL", bar);
> + return NULL;
> + }
> +
> + return base + offset;
> +}
> +
> +static int
> +virtio_read_caps(struct rte_pci_device *dev, struct virtio_hw *hw)
> +{
> + uint8_t pos;
> + struct virtio_pci_cap cap;
> + int ret;
> +
> + if (rte_eal_pci_map_device(dev) < 0) {
> + PMD_INIT_LOG(DEBUG, "failed to map pci device!");

s /DEBUG/ERR/
> + return -1;
> + }
> +
> + ret = rte_eal_pci_read_config(dev, &pos, 1, PCI_CAPABILITY_LIST);
> + 
[snip]



[dpdk-dev] [PATCH v2 7/7] virtio: add 1.0 support

2016-01-14 Thread Xie, Huawei
On 1/14/2016 3:49 PM, Yuanhan Liu wrote:
> On Thu, Jan 14, 2016 at 07:47:17AM +, Xie, Huawei wrote:
>> On 1/12/2016 2:58 PM, Yuanhan Liu wrote:
>>> Modern (v1.0) virtio pci device defines several pci capabilities.
>> [snip]
>>> +static void
>>> +modern_notify_queue(struct virtio_hw *hw __rte_unused, struct virtqueue 
>>> *vq)
>>> +{
>>> +   modern_write16(1, vq->notify_addr);
>>> +}
>> Does virtio 1.0 only supports MMIO? MMIO has long VMEXIT latency than
>> PORT IO.
> Virtio 1.0 supports three transport layer, including MMIO and PCI. And
> we use PCI only in our pmd driver.

I don't mean the MMIO transport, but the use of memory mapped IO for configuration.

>
>   --yliu
>



[dpdk-dev] [PATCH v2 7/7] virtio: add 1.0 support

2016-01-14 Thread Yuanhan Liu
On Thu, Jan 14, 2016 at 07:51:08AM +, Xie, Huawei wrote:
> On 1/14/2016 3:49 PM, Yuanhan Liu wrote:
> > On Thu, Jan 14, 2016 at 07:47:17AM +, Xie, Huawei wrote:
> >> On 1/12/2016 2:58 PM, Yuanhan Liu wrote:
> >>> Modern (v1.0) virtio pci device defines several pci capabilities.
> >> [snip]
> >>> +static void
> >>> +modern_notify_queue(struct virtio_hw *hw __rte_unused, struct virtqueue 
> >>> *vq)
> >>> +{
> >>> + modern_write16(1, vq->notify_addr);
> >>> +}
> >> Does virtio 1.0 only supports MMIO? MMIO has long VMEXIT latency than
> >> PORT IO.
> > Virtio 1.0 supports three transport layer, including MMIO and PCI. And
> > we use PCI only in our pmd driver.
> 
> I don't mean that MMIO but use memory mapped IO for configuration.

Then, yes.

--yliu


[dpdk-dev] [PATCH v2 7/7] virtio: add 1.0 support

2016-01-14 Thread Xie, Huawei
On 1/14/2016 3:58 PM, Yuanhan Liu wrote:
> On Thu, Jan 14, 2016 at 07:51:08AM +, Xie, Huawei wrote:
>> On 1/14/2016 3:49 PM, Yuanhan Liu wrote:
>>> On Thu, Jan 14, 2016 at 07:47:17AM +, Xie, Huawei wrote:
>>>> On 1/12/2016 2:58 PM, Yuanhan Liu wrote:
>>>>> Modern (v1.0) virtio pci device defines several pci capabilities.
>>>> [snip]
>>>>> +static void
>>>>> +modern_notify_queue(struct virtio_hw *hw __rte_unused, struct virtqueue 
>>>>> *vq)
>>>>> +{
>>>>> + modern_write16(1, vq->notify_addr);
>>>>> +}
>>>> Does virtio 1.0 only supports MMIO? MMIO has long VMEXIT latency than
>>>> PORT IO.
>>> Virtio 1.0 supports three transport layer, including MMIO and PCI. And
>>> we use PCI only in our pmd driver.
>> I don't mean that MMIO but use memory mapped IO for configuration.
> Then, yes.

00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
Subsystem: Red Hat, Inc Device 0001
Physical Slot: 3
Flags: bus master, fast devsel, latency 0, IRQ 10
I/O ports at c100 [size=32]
Memory at febd1000 (32-bit, non-prefetchable) [size=4K]
Memory at fe00 (64-bit, prefetchable) [size=8M]
Expansion ROM at feb4 [disabled] [size=256K]
Capabilities: [98] MSI-X: Enable+ Count=3 Masked-
Capabilities: [84] Vendor Specific Information: Len=14 
Capabilities: [70] Vendor Specific Information: Len=14 
Capabilities: [60] Vendor Specific Information: Len=10 
Capabilities: [50] Vendor Specific Information: Len=10 
Capabilities: [40] Vendor Specific Information: Len=10 
Kernel driver in use: igb_uio
Kernel modules: virtio_pci

c100 is still there. For the notification, try PORT IO if possible.

>
>   --yliu
>



[dpdk-dev] VFIO no-iommu

2016-01-14 Thread Jike Song
On Thu, Jan 14, 2016 at 2:52 PM, Alex Williamson
 wrote:
> On Thu, 2016-01-14 at 14:03 +0800, Jike Song wrote:
>> On Wed, Dec 16, 2015 at 12:38 PM, Alex Williamson
>>  wrote:
>> >
>> > So it works.  Is it acceptable?  Useful?  Sufficiently complete?  Does
>> > it imply deprecating the uio interface?  I believe the feature that
>> > started this discussion was support for MSI/X interrupts so that VFs
>> > can support some kind of interrupt (uio only supports INTx since it
>> > doesn't allow DMA).  Implementing that would be the ultimate test of
>> > whether this provides dpdk with not only a more consistent interface,
>> > but the feature dpdk wants that's missing in uio. Thanks,
>> >
>> Hi Alex,
>>
>> Sorry for jumping in.  Just being curious, how does VFIO No-IOMMU mode
>> support DMA from userspace drivers?  If I understand correctly, due to
>> the absence of IOMMU, pcidev has to use physaddr to start a DMA
>> transaction, but how it is supposed to get physaddr from userspace
>> drivers, /proc//pagemap or something else?
>
> Hi Jike,
>
> vfio no-iommu does nothing to facilitate DMA mappings, UIO didn't
> either and DPDK managed to work with that.  vfio no-iommu is meant to
> be an enabler and provide a consistent vfio device model (with MSI/X
> interrupts), but fundamentally the idea of a non-iommu protected, user
> owned device capable of DMA is unsupportable.  This is why vfio no-
> iommu taints the kernel.  With that in mind, one of the design goals is
> to introduce as little code as possible for vfio no-iommu.  A new vfio
> iommu backend with pinning and virt-to-bus translation goes against
> that design goal.  I don't know the details of how DPDK did this with
> UIO, but the same lack of DMA mapping facilities is present with vfio
> no-iommu.  It really just brings the vfio device model, nothing more.
> Thanks,
>
> Alex

Thanks! - that addressed my question :)

By the way, my previous assumption (consulting /proc//pagemap)
apparently doesn't work: one cannot assume a userspace buffer is
physically contiguous.
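
The contiguity concern can be sketched as below. This is a hedged
illustration, not how DPDK does it: it assumes the Linux pagemap entry
layout (one 64-bit entry per virtual page, PFN in bits 0-54, "present"
flag in bit 63), and the pm_* helpers are hypothetical names. It only
does the arithmetic on entries already read from the pagemap file;
reading real PFNs typically requires privileges.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PM_PRESENT	(1ULL << 63)		/* page is present in RAM */
#define PM_PFN_MASK	((1ULL << 55) - 1)	/* bits 0-54 hold the PFN */
#define PM_NO_PHYS	UINT64_MAX		/* sentinel: no translation */

/* Byte offset of the entry for vaddr within the pagemap file. */
static uint64_t
pm_offset(uint64_t vaddr, uint64_t page_size)
{
	return (vaddr / page_size) * sizeof(uint64_t);
}

/* Translate one virtual address using its pagemap entry. */
static uint64_t
pm_to_phys(uint64_t entry, uint64_t vaddr, uint64_t page_size)
{
	if (!(entry & PM_PRESENT))
		return PM_NO_PHYS;
	return (entry & PM_PFN_MASK) * page_size + vaddr % page_size;
}

/* 1 if the pages behind a buffer are present and physically adjacent. */
static int
pm_phys_contiguous(const uint64_t *entries, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (!(entries[i] & PM_PRESENT))
			return 0;
		if (i > 0 && (entries[i] & PM_PFN_MASK) !=
			     (entries[i - 1] & PM_PFN_MASK) + 1)
			return 0;
	}
	return 1;
}
```

A buffer spanning several pages can only be handed to the device as a
single DMA region when a check like pm_phys_contiguous() holds;
otherwise each page needs its own translation.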


-- 
Thanks,
Jike


[dpdk-dev] [PATCH v2 7/7] virtio: add 1.0 support

2016-01-14 Thread Yuanhan Liu
On Thu, Jan 14, 2016 at 08:08:28AM +, Xie, Huawei wrote:
> On 1/14/2016 3:58 PM, Yuanhan Liu wrote:
> > On Thu, Jan 14, 2016 at 07:51:08AM +, Xie, Huawei wrote:
> >> On 1/14/2016 3:49 PM, Yuanhan Liu wrote:
> >>> On Thu, Jan 14, 2016 at 07:47:17AM +, Xie, Huawei wrote:
> >>>> On 1/12/2016 2:58 PM, Yuanhan Liu wrote:
> >>>>> Modern (v1.0) virtio pci device defines several pci capabilities.
> >>>> [snip]
> >>>>> +static void
> >>>>> +modern_notify_queue(struct virtio_hw *hw __rte_unused, struct 
> >>>>> virtqueue *vq)
> >>>>> +{
> >>>>> +   modern_write16(1, vq->notify_addr);
> >>>>> +}
> >>>> Does virtio 1.0 only supports MMIO? MMIO has long VMEXIT latency than
> >>>> PORT IO.
> >>> Virtio 1.0 supports three transport layer, including MMIO and PCI. And
> >>> we use PCI only in our pmd driver.
> >> I don't mean that MMIO but use memory mapped IO for configuration.
> > Then, yes.
> 
> 00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
> Subsystem: Red Hat, Inc Device 0001
> Physical Slot: 3
> Flags: bus master, fast devsel, latency 0, IRQ 10
> I/O ports at c100 [size=32]
> Memory at febd1000 (32-bit, non-prefetchable) [size=4K]
> Memory at fe00 (64-bit, prefetchable) [size=8M]
> Expansion ROM at feb4 [disabled] [size=256K]
> Capabilities: [98] MSI-X: Enable+ Count=3 Masked-
> Capabilities: [84] Vendor Specific Information: Len=14 
> Capabilities: [70] Vendor Specific Information: Len=14 
> Capabilities: [60] Vendor Specific Information: Len=10 
> Capabilities: [50] Vendor Specific Information: Len=10 
> Capabilities: [40] Vendor Specific Information: Len=10 
> Kernel driver in use: igb_uio
> Kernel modules: virtio_pci
> 
> c100 is still there.

Yes,

> For the notification, try PORT IO if possible.

But it doesn't seem right to me to mix legacy registers in modern pci device.

--yliu


[dpdk-dev] Getting error while running DPDK test app on X-Gene1

2016-01-14 Thread Qiu, Michael
On 1/14/2016 12:15 PM, Jerin Jacob wrote:
> On Wed, Jan 13, 2016 at 03:52:01PM +0530, Ankit Jindal wrote:
>> Hi,
>>
>> We are trying to run dpdk on our arm64 based SOC having Intel 10G
>> ixgbe PCIe card plugged. While running any test app, we are getting
>> following error.
>>
>> EAL: PCI device :01:00.0 on NUMA socket 0
>> EAL:   probe driver: 8086:10fb rte_ixgbe_pmd
>> EAL: Cannot open /sys/bus/pci/devices/:01:00.0/resource0: No such
>> file or directory
>> EAL: Error - exiting with code: 1
>>   Cause: Requested device :01:00.0 cannot be used
>
> pci resource creation patch is not yet part of the arm64 mainline kernel.
> The following patch should fix the problem.
>
> http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/358906.html
>
> Jerin

What's the status of your arm kernel patch?

Thanks,
Michael
>> Below are the details on modules, hugepages and device binding.
>> root at arm64:~# lsmod
>> Module  Size  Used by
>> rte_kni   292795  0
>> igb_uio 4338  0
>> ixgbe 184456  0
>>
>> root at arm64:~/dpdk# cat 
>> /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
>> 2048
>>
>> root at arm64:~/dpdk# ./tools/dpdk_nic_bind.py --status
>>
>> Network devices using DPDK-compatible driver
>> 
>> :01:00.0 '82599ES 10-Gigabit SFI/SFP+ Network Connection'
>> drv=igb_uio unused=
>> :01:00.1 '82599ES 10-Gigabit SFI/SFP+ Network Connection'
>> drv=igb_uio unused=
>>
>> Network devices using kernel driver
>> ===
>> 
>>
>> Other network devices
>> =
>> 
>> root at arm64:~/dpdk#
>>
>> Thanks,
>> Ankit



[dpdk-dev] [PATCH v2 7/7] virtio: add 1.0 support

2016-01-14 Thread Xie, Huawei
On 1/14/2016 4:21 PM, Yuanhan Liu wrote:
> On Thu, Jan 14, 2016 at 08:08:28AM +, Xie, Huawei wrote:
>> On 1/14/2016 3:58 PM, Yuanhan Liu wrote:
>>> On Thu, Jan 14, 2016 at 07:51:08AM +, Xie, Huawei wrote:
>>>> On 1/14/2016 3:49 PM, Yuanhan Liu wrote:
>>>>> On Thu, Jan 14, 2016 at 07:47:17AM +, Xie, Huawei wrote:
>>>>>> On 1/12/2016 2:58 PM, Yuanhan Liu wrote:
>>>>>>> Modern (v1.0) virtio pci device defines several pci capabilities.
>>>>>> [snip]
>>>>>>> +static void
>>>>>>> +modern_notify_queue(struct virtio_hw *hw __rte_unused, struct 
>>>>>>> virtqueue *vq)
>>>>>>> +{
>>>>>>> +   modern_write16(1, vq->notify_addr);
>>>>>>> +}
>>>>>> Does virtio 1.0 only supports MMIO? MMIO has long VMEXIT latency than
>>>>>> PORT IO.
>>>>> Virtio 1.0 supports three transport layer, including MMIO and PCI. And
>>>>> we use PCI only in our pmd driver.
>>>> I don't mean that MMIO but use memory mapped IO for configuration.
>>> Then, yes.
>> 00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
>> Subsystem: Red Hat, Inc Device 0001
>> Physical Slot: 3
>> Flags: bus master, fast devsel, latency 0, IRQ 10
>> I/O ports at c100 [size=32]
>> Memory at febd1000 (32-bit, non-prefetchable) [size=4K]
>> Memory at fe00 (64-bit, prefetchable) [size=8M]
>> Expansion ROM at feb4 [disabled] [size=256K]
>> Capabilities: [98] MSI-X: Enable+ Count=3 Masked-
>> Capabilities: [84] Vendor Specific Information: Len=14 
>> Capabilities: [70] Vendor Specific Information: Len=14 
>> Capabilities: [60] Vendor Specific Information: Len=10 
>> Capabilities: [50] Vendor Specific Information: Len=10 
>> Capabilities: [40] Vendor Specific Information: Len=10 
>> Kernel driver in use: igb_uio
>> Kernel modules: virtio_pci
>>
>> c100 is still there.
> Yes,
>
>> For the notification, try PORT IO if possible.
> But it doesn't seem right to me to mix legacy registers in modern pci device.

On a TLB or cache miss, this could cost plenty of cycles. Considering
that our current focus is dpdk vhost, which doesn't need notification, let
us revisit this later.

>
>   --yliu
>



[dpdk-dev] [PATCH v2 7/7] virtio: add 1.0 support

2016-01-14 Thread Yuanhan Liu
Sigh... I have just sent out v3 ...

On Thu, Jan 14, 2016 at 07:50:00AM +, Xie, Huawei wrote:
> On 1/12/2016 2:58 PM, Yuanhan Liu wrote:
> > +static inline void
> > +modern_write64_twopart(uint64_t val, uint32_t *lo, uint32_t *hi)
> > +{
> > +   modern_write32((uint32_t)val, lo);
> > +   modern_write32(val >> 32, hi);
> > +}
> > +
> 
> This is normal mmio read/write operation. ioread8/16/32/64 or just
> readxx is more meaningful name here.

I just want to make them look modern-device related, which they
are.
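
For readers following the naming discussion, here is a minimal sketch of what a two-part 64-bit write does, assuming plain volatile stores stand in for the register accessors. The sketch_ names are hypothetical and are not the functions from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: one 32-bit store through a volatile pointer,
 * standing in for a memory-mapped register write. */
static void
sketch_write32(uint32_t val, volatile uint32_t *addr)
{
    *addr = val;
}

/* Write a 64-bit value as two 32-bit halves, low half first. */
static void
sketch_write64_twopart(uint64_t val, volatile uint32_t *lo,
                       volatile uint32_t *hi)
{
    assert(lo != NULL && hi != NULL);
    sketch_write32((uint32_t)val, lo);
    sketch_write32((uint32_t)(val >> 32), hi);
}
```

The explicit cast on the high half only makes the truncation visible; the value stored is the same as without it.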

> > +static void
> [SNIP]
> > +
> > +static void
> > +modern_write_dev_config(struct virtio_hw *hw, uint64_t offset,
> > +   void *src, int length)
> 
> define src as const

okay.

> 
> [snip]
> >  
> > +static inline void *
> > +get_cfg_addr(struct rte_pci_device *dev, struct virtio_pci_cap *cap)
> 
> No explicit inline for non performance critical functions.

okay.

> 
> > +{
> > +   uint8_t  bar= cap->bar;
> > +   uint32_t length = cap->length;
> > +   uint32_t offset = cap->offset;
> > +   uint8_t *base;
> > +
> > +   if (unlikely(bar > 5)) {
> Don't use constant value number whenever possible

I normally don't bother to define a macro for a number that is used only once,
especially for well-known ones. Say, I won't define

#define UINT8_MAX_VALUE 0xff
> 
> No likely/unlikely for non performance critical functions

makes sense.

> > +   if (rte_eal_pci_map_device(dev) < 0) {
> > +   PMD_INIT_LOG(DEBUG, "failed to map pci device!");
> 
> s /DEBUG/ERR/

It's not an error; it's expected, say, when no UIO is bound.

--yliu


[dpdk-dev] [RFC v2 1/2] ethdev: add packet filter flow and new behavior switch to fdir

2016-01-14 Thread Wu, Jingjing
Hi, Rahul

Just another thought, please consider it:

Add a new flow type like

#define RTE_ETH_FLOW_IPV6_UDP_EX 17
+#define RTE_ETH_FLOW_RAW_PKT 18

Then add a new item in rte_eth_fdir_flow
union rte_eth_fdir_flow {
struct rte_eth_l2_flow l2_flow;
struct rte_eth_udpv4_flow  udp4_flow;
struct rte_eth_tcpv4_flow  tcp4_flow;
struct rte_eth_sctpv4_flow sctp4_flow;
struct rte_eth_ipv4_flow   ip4_flow;
struct rte_eth_udpv6_flow  udp6_flow;
struct rte_eth_tcpv6_flow  tcp6_flow;
struct rte_eth_sctpv6_flow sctp6_flow;
struct rte_eth_ipv6_flow   ipv6_flow;
struct rte_eth_mac_vlan_flow mac_vlan_flow;
struct rte_eth_tunnel_flow   tunnel_flow;
+   uint8_t raw_pkt[80];
};

Then add mask item in rte_eth_fdir_input:

struct rte_eth_fdir_input {
uint16_t flow_type;
union rte_eth_fdir_flow flow;
+   union rte_eth_fdir_flow flow_mask;
/**< Flow fields to match, dependent on flow_type */
struct rte_eth_fdir_flow_ext flow_ext;
/**< Additional fields to match */
};

Then the filter can be added simply in the form of a raw packet. It looks generic, 
and even other NICs can use this too.
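
To illustrate the raw packet plus mask idea, here is a minimal sketch of how a byte-wise masked comparison could work in software. The function name, the constant, and the rule that a zero mask byte means "don't care" are assumptions made for illustration; how a given NIC interprets the mask is up to its driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_RAW_PKT_LEN 80 /* same size as the proposed raw_pkt[80] */

/* Return 1 when every masked byte of pkt equals the masked pattern byte. */
static int
sketch_raw_match(const uint8_t *pkt, size_t pkt_len,
                 const uint8_t pattern[SKETCH_RAW_PKT_LEN],
                 const uint8_t mask[SKETCH_RAW_PKT_LEN])
{
    size_t i;

    assert(pkt != NULL && pattern != NULL && mask != NULL);
    for (i = 0; i < SKETCH_RAW_PKT_LEN; i++) {
        if (mask[i] == 0)
            continue; /* byte is "don't care" */
        if (i >= pkt_len)
            return 0; /* packet too short to hold a byte we care about */
        if ((pkt[i] & mask[i]) != (pattern[i] & mask[i]))
            return 0;
    }
    return 1;
}
```

For example, setting mask bytes 12 and 13 to 0xff and pattern bytes 12 and 13 to 0x08 0x00 would match on the EtherType field of an untagged IPv4 frame and ignore everything else.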

Thanks
Jingjing
> -Original Message-
> From: Wu, Jingjing
> Sent: Wednesday, January 13, 2016 9:17 PM
> To: Rahul Lakkireddy
> Cc: dev at dpdk.org; Felix Marti; Kumar A S; Nirranjan Kirubaharan
> Subject: RE: [dpdk-dev] [RFC v2 1/2] ethdev: add packet filter flow and new
> behavior switch to fdir
> 


[dpdk-dev] [PATCH 0/4] virtio support for container

2016-01-14 Thread Amit Tomer
Hello,

> Can you send out how you start this l2fwd program?

This is how I run the l2fwd program.

CMD ["/usr/src/dpdk/examples/l2fwd/build/l2fwd", "-c", "0x3", "-n",
"4","--no-pci",
,"--no-huge","--vdev=eth_cvio0,queue_num=256,rx=1,tx=1,cq=0,path=/usr/src/dpdk/usvhost",
"--", "-p", "0x1"]

I tried passing "-m 1024" to it, but that causes l2fwd to be killed even before
it can connect to the usvhost socket.

Do I need to create hugepages from inside the Docker container to make use
of hugepages?

Thanks,
Amit.


[dpdk-dev] [PATCH v2] vfio: Support for no-IOMMU mode

2016-01-14 Thread Burakov, Anatoly
Hi Stephen,

> > +/* IOMMU types we support */
> > +static const struct vfio_iommu_type iommu_types[] = {
> > +   /* x86 IOMMU, otherwise known as type 1 */
> > +   { VFIO_TYPE1_IOMMU, "Type 1",
> &vfio_iommu_type1_dma_map},
> > +   /* IOMMU-less mode */
> > +   { VFIO_NOIOMMU_IOMMU, "No-IOMMU",
> &vfio_iommu_noiommu_dma_map},
> > +};
> > +
> 
> Nit.. Why full-tab indent here?

Readability mainly... at least it's more readable to me that way. I can change 
that if necessary.

Thanks,
Anatoly


[dpdk-dev] Proposal for a big eal / ethdev cleanup

2016-01-14 Thread David Marchand
Hello all,

Here is a proposal of a big cleanup in ethdev (cryptodev would have to
follow) and eal structures.
This is something I wanted to do for quite some time and the arrival of
a new bus makes me think we need it.

This is an alternative to what Jan proposed [1].

ABI is most likely broken with this, but I think this discussion can come later.


First some context on how dpdk is initialized at the moment :

Let's imagine a system with one ixgbe pci device and take some
part of ixgbe driver as an example.

static struct eth_driver rte_ixgbe_pmd = {
.pci_drv = {
.name = "rte_ixgbe_pmd",
.id_table = pci_id_ixgbe_map,
.drv_flags = RTE_PCI_DRV_NEED_MAPPING |
RTE_PCI_DRV_INTR_LSC | RTE_PCI_DRV_DETACHABLE,
},
.eth_dev_init = eth_ixgbe_dev_init,
.eth_dev_uninit = eth_ixgbe_dev_uninit,
.dev_private_size = sizeof(struct ixgbe_adapter),
};

static int
rte_ixgbe_pmd_init(const char *name __rte_unused, const char *params
__rte_unused)
{
PMD_INIT_FUNC_TRACE();
rte_eth_driver_register(&rte_ixgbe_pmd);
return 0;
}

static struct rte_driver rte_ixgbe_driver = {
.type = PMD_PDEV,
.init = rte_ixgbe_pmd_init,
};

PMD_REGISTER_DRIVER(rte_ixgbe_driver)


DPDK initialisation goes as follows (focusing on ixgbe driver):

PMD_REGISTER_DRIVER(rte_ixgbe_driver) which adds it to dev_driver_list

rte_eal_init()
 -> rte_eal_pci_init()
  -> rte_eal_pci_scan() which fills pci_device_list

 -> rte_eal_dev_init()
  -> for each rte_driver (first vdev, then pdev), call driver->init()
 so here rte_ixgbe_pmd_init(NULL, NULL)
   -> rte_eth_driver_register(&rte_ixgbe_pmd);
-> fills rte_ixgbe_pmd.pci_drv.devinit = rte_eth_dev_init
-> call rte_eal_pci_register() which adds it to pci_driver_list

 -> rte_eal_pci_probe()
  -> for each rte_pci_device found in rte_eal_pci_scan(), and for all
 rte_pci_driver registered, call devinit(dr, dev),
 so here rte_eth_dev_init(dr, dev)
   -> creates a new eth_dev (which is a rte_eth_dev), then adds
  reference to passed dev pointer (which is a rte_pci_device), to
  passed dr pointer (which is a rte_pci_driver cast as a eth_driver)
   -> call eth_drv->eth_dev_init(eth_dev)
  so here eth_ixgbe_dev_init(eth_dev)
-> fills other parts of eth_dev
-> rte_eth_copy_pci_info(eth_dev, pci_dev)


By the way, when invoking ethdev init, the pci-specific stuff is only
handled in functions called from the drivers themselves, which already
know that they are dealing with pci resources.


Later in the life of the application, ethdev api is called for hotplug.

int rte_eth_dev_attach(const char *devargs, uint8_t *port_id);

A devargs is used to identify a vdev/pdev driver and call it to create a
new rte_eth_dev.
Current code goes as far as parsing devargs to understand if this is a
pci device or a vdev.
This part should be moved to eal since this is where all the "bus" logic
is.



So now, what I had in mind is something like this.
It is far from perfect and I certainly did some shortcuts in my
reasoning.


Generic device/driver

- introduce a rte_device structure,
- a rte_device has a name, that identifies it in a unique way across
all buses, maybe something like pci:0000:00:01.0, and for vdev,
vdev:name
- a rte_device references a rte_driver,
- a rte_device references devargs
- a rte_device embeds a intr_handle
- rte_device objects are created by "buses"
- a function to retrieve rte_device objects based on name is added

- current rte_driver does not need to know about the pmd_type
(pdev/vdev), this is only a way to order drivers init in eal, we could
use the rte_driver names to order them or maybe remove this ordering
- init and uninit functions are changed to take a pointer to a
rte_device

rte_device and rte_driver structures are at the "bus" level.
Those are the basic structures we will build the other objects on.
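
A minimal sketch of the generic objects listed above, assuming a simple linked-list registry. All sketch_ names and field layouts are hypothetical; they only illustrate a device with a unique name, a driver reference, init/uninit taking the generic device, and a lookup by name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct sketch_driver;

/* Hypothetical generic device: a bus-qualified unique name plus a driver. */
struct sketch_device {
    const char *name;              /* e.g. "vdev:name" */
    struct sketch_driver *driver;  /* set once a driver is bound */
    struct sketch_device *next;    /* singly linked registry */
};

/* Hypothetical generic driver: init/uninit take the generic device. */
struct sketch_driver {
    const char *name;
    int (*init)(struct sketch_device *dev);
    int (*uninit)(struct sketch_device *dev);
};

static struct sketch_device *sketch_device_list;

/* Called by a "bus" when it creates a device object. */
static void
sketch_device_register(struct sketch_device *dev)
{
    assert(dev != NULL && dev->name != NULL);
    dev->next = sketch_device_list;
    sketch_device_list = dev;
}

/* Retrieve a device by its unique name, or NULL when not found. */
static struct sketch_device *
sketch_device_find(const char *name)
{
    struct sketch_device *dev;

    for (dev = sketch_device_list; dev != NULL; dev = dev->next)
        if (strcmp(dev->name, name) == 0)
            return dev;
    return NULL;
}
```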


Impact on PCI device/driver

- rte_pci_device is modified to embed a rte_device (embedding makes it
possible later to cast the rte_device and get the rte_pci_device in pci
specific functions)
- no need for a rte_pci_driver reference in rte_pci_device, since we
have the rte_device driver

- rte_pci_driver is modified to embed a rte_driver
- no more devinit and devuninit functions in rte_pci_driver, they can
be moved as init / uninit functions in rte_driver

- pci scan code creates rte_pci_device objects, associates them with
rte_pci_driver, fills the driver field of the rte_driver, then passes
them to the rte_driver init function.

rte_pci_device and rte_pci_driver are specific implementations of
rte_device and rte_driver.
They are there to maintain pci private methods, create upper layer
objects (ethdev / crypto), etc.
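
The embedding mentioned above can be sketched as follows, assuming a container_of-style macro is used for the cast back. The sketch_ structures are hypothetical and much smaller than the real ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Recover the enclosing structure from a pointer to one of its members. */
#define sketch_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct sketch_device {
    const char *name;
};

/* Hypothetical PCI device embedding the generic device. */
struct sketch_pci_device {
    uint16_t vendor_id;
    uint16_t device_id;
    struct sketch_device dev; /* embedded, not a pointer */
};

/* A generic init callback only sees the generic device... */
static uint16_t
sketch_pci_init(struct sketch_device *dev)
{
    /* ...and gets the PCI object back inside PCI-specific code. */
    struct sketch_pci_device *pci =
        sketch_container_of(dev, struct sketch_pci_device, dev);

    assert(&pci->dev == dev);
    return pci->device_id;
}
```

Because the generic object lives inside the PCI object, no back pointer is needed and the generic layer never has to know which bus it is dealing with.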


Impact on vdev

- introduce a rte_vdev_driver structure
- a rte_vdev_driver embeds a rte_driver
- a rte_vdev_driver has a priv size for vdev objects creation

- no need for a vdev device object, this is specific to vdev drivers

- eal

[dpdk-dev] Getting error while running DPDK test app on X-Gene1

2016-01-14 Thread Ankit Jindal
On Thu, Jan 14, 2016 at 9:45 AM, Jerin Jacob
 wrote:
> On Wed, Jan 13, 2016 at 03:52:01PM +0530, Ankit Jindal wrote:
>> Hi,
>>
>> We are trying to run dpdk on our arm64 based SOC having Intel 10G
>> ixgbe PCIe card plugged. While running any test app, we are getting
>> following error.
>>
>> EAL: PCI device 0000:01:00.0 on NUMA socket 0
>> EAL:   probe driver: 8086:10fb rte_ixgbe_pmd
>> EAL: Cannot open /sys/bus/pci/devices/0000:01:00.0/resource0: No such
>> file or directory
>> EAL: Error - exiting with code: 1
>>   Cause: Requested device 0000:01:00.0 cannot be used
>
>
> pci resource creation patch is not yet part of the arm64 mainline kernel.
> The following patch should fix the problem.
>
> http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/358906.html

Thanks, it fixed the problem.

Thanks,
Ankit
>
> Jerin
>
>>
>> Below are the details on modules, hugepages and device binding.
>> root at arm64:~# lsmod
>> Module  Size  Used by
>> rte_kni   292795  0
>> igb_uio 4338  0
>> ixgbe 184456  0
>>
>> root at arm64:~/dpdk# cat 
>> /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
>> 2048
>>
>> root at arm64:~/dpdk# ./tools/dpdk_nic_bind.py --status
>>
>> Network devices using DPDK-compatible driver
>> ============================================
>> 0000:01:00.0 '82599ES 10-Gigabit SFI/SFP+ Network Connection'
>> drv=igb_uio unused=
>> 0000:01:00.1 '82599ES 10-Gigabit SFI/SFP+ Network Connection'
>> drv=igb_uio unused=
>>
>> Network devices using kernel driver
>> ===================================
>> 
>>
>> Other network devices
>> =====================
>> 
>> root at arm64:~/dpdk#
>>
>> Thanks,
>> Ankit


[dpdk-dev] [PATCH 0/4] virtio support for container

2016-01-14 Thread Tan, Jianfeng
Hi Amit,

On 1/14/2016 5:34 PM, Amit Tomer wrote:
> Hello,
>
>> Can you send out how you start this l2fwd program?
> This is how, I run l2fwd program.
>
> CMD ["/usr/src/dpdk/examples/l2fwd/build/l2fwd", "-c", "0x3", "-n",
> "4","--no-pci",
> ,"--no-huge","--vdev=eth_cvio0,queue_num=256,rx=1,tx=1,cq=0,path=/usr/src/dpdk/usvhost",
> "--", "-p", "0x1"]

In this way, you can only get 64M memory. I believe it's too small to 
create a l2fwd_pktmbuf_pool in l2fwd.

> I tried passing "-m 1024" to it but It causes l2fwd killed even before
> it could connect to usvhost socket.

In my patch, when --no-huge is specified, I change the previous anonymous 
mmap into file-backed memory in /dev/shm. Docker usually mounts a 
64MB tmpfs there, so you cannot use -m 1024. If you want to do 
that, use -v to replace the 64MB tmpfs with a bigger one.


>
> Do I need to create Hugepages from Inside Docker container to make use
> of Hugepages?

Not necessary. But if you want to use hugepages inside Docker, use -v 
option to map a hugetlbfs into containers.

Most importantly, you have indeed uncovered a bug here. The current implementation 
cannot work with tmpfs, because it lacks an ftruncate() between open() and 
mmap(). It turns out that although mmap() succeeds, the memory cannot be 
touched. However, this is not a problem for hugetlbfs. I don't know why they 
differ in that way. In short, if you want to use --no-huge, please add 
ftruncate(); I'll fix this in the next version.
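
The open() / ftruncate() / mmap() ordering can be sketched as below. This is a minimal illustration and not the code from the patch; sketch_map_file and its parameters are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Map `len` bytes backed by the file at `path` (hypothetical helper).
 * Returns the mapping, or NULL on failure.
 */
static void *
sketch_map_file(const char *path, size_t len)
{
    void *addr;
    int fd;

    assert(path != NULL && len > 0);
    fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return NULL;

    /* Size the file first: a freshly created file is zero bytes long,
     * and touching mapped pages beyond its end raises SIGBUS. */
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return NULL;
    }

    addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after close() */
    return addr == MAP_FAILED ? NULL : addr;
}
```

Without the ftruncate() call, mmap() itself still succeeds, which matches the behaviour described above: the failure only shows up on first access.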

Thanks,
Jianfeng

>
> Thanks,
> Amit.



[dpdk-dev] Proposal for a big eal / ethdev cleanup

2016-01-14 Thread Jan Viktorin
Hello David,

nice to see that the things are moving... 

On Thu, 14 Jan 2016 11:38:16 +0100
David Marchand  wrote:

> Hello all,
> 
> Here is a proposal of a big cleanup in ethdev (cryptodev would have to
> follow) and eal structures.
> This is something I wanted to do for quite some time and the arrival of
> a new bus makes me think we need it.
> 
> This is an alternative to what Jan proposed [1].

As I need to extend DPDK by a non-PCI bus system, I would prefer any such
working solution :). By [1], you probably mean:

[1] http://comments.gmane.org/gmane.comp.networking.dpdk.devel/30973

(I didn't find it in the e-mail.)

> 
> ABI is most likely broken with this, but I think this discussion can come 
> later.

In my initial approach I was trying to stay as API and ABI backwards
compatible as possible, so that it would be acceptable upstream. As just a few
people have shown their interest in these changes, I consider the ABI
compatibility very important.

I can see that your approach does not care too much... Otherwise, it looks
good to me. It is closer to the Linux drivers infra, so it seems to be clearer
than the current one.

> 
> 
> First some context on how dpdk is initialized at the moment :
> 
> Let's imagine a system with one ixgbe pci device and take some
> part of ixgbe driver as an example.
> 
> static struct eth_driver rte_ixgbe_pmd = {
> .pci_drv = {
> .name = "rte_ixgbe_pmd",
> .id_table = pci_id_ixgbe_map,
> .drv_flags = RTE_PCI_DRV_NEED_MAPPING |
> RTE_PCI_DRV_INTR_LSC | RTE_PCI_DRV_DETACHABLE,
> },
> .eth_dev_init = eth_ixgbe_dev_init,
> .eth_dev_uninit = eth_ixgbe_dev_uninit,
> .dev_private_size = sizeof(struct ixgbe_adapter),
> };

Note that the biggest issue here is that the eth_driver has no way to
distinguish between PCI and other subsystems. There is no field that helps
the generic ethdev code (librte_ether) to decide what bus we are on
(and it needs to know it in the current DPDK).

Another point is that the ethdev infra should not know about the
underlying bus infra. The question is whether we do a big clean
up (extract PCI-bus code out) or a small clean up (give the ethdev
infra a hint which bus system it deals with).

> 
> static int
> rte_ixgbe_pmd_init(const char *name __rte_unused, const char *params
> __rte_unused)
> {
> PMD_INIT_FUNC_TRACE();
> rte_eth_driver_register(&rte_ixgbe_pmd);
> return 0;
> }
> 
> static struct rte_driver rte_ixgbe_driver = {
> .type = PMD_PDEV,
> .init = rte_ixgbe_pmd_init,
> };
> 
> PMD_REGISTER_DRIVER(rte_ixgbe_driver)
> 
> 
> DPDK initialisation goes as follows (focusing on ixgbe driver):
> 
> PMD_REGISTER_DRIVER(rte_ixgbe_driver) which adds it to dev_driver_list
> 
> rte_eal_init()
>  -> rte_eal_pci_init()
>   -> rte_eal_pci_scan() which fills pci_device_list  
> 
>  -> rte_eal_dev_init()
>   -> for each rte_driver (first vdev, then pdev), call driver->init()  
>  so here rte_ixgbe_pmd_init(NULL, NULL)
>-> rte_eth_driver_register(&rte_ixgbe_pmd);
> -> fills rte_ixgbe_pmd.pci_drv.devinit = rte_eth_dev_init
> -> call rte_eal_pci_register() which adds it to pci_driver_list  
> 
>  -> rte_eal_pci_probe()
>   -> for each rte_pci_device found in rte_eal_pci_scan(), and for all  
>  rte_pci_driver registered, call devinit(dr, dev),
>  so here rte_eth_dev_init(dr, dev)
>-> creates a new eth_dev (which is a rte_eth_dev), then adds  
>   reference to passed dev pointer (which is a rte_pci_device), to
>   passed dr pointer (which is a rte_pci_driver cast as a eth_driver)
>-> call eth_drv->eth_dev_init(eth_dev)  
>   so here eth_ixgbe_dev_init(eth_dev)
> -> fills other parts of eth_dev
> -> rte_eth_copy_pci_info(eth_dev, pci_dev)  
> 
> 
> By the way, when invoking ethdev init, the pci-specific stuff is only
> handled in functions called from the drivers themselves, which already
> know that they are dealing with pci resources.

This is an important note, as it allows us to (almost) avoid touching the
current drivers.

> 
> 
> Later in the life of the application, ethdev api is called for hotplug.
> 
> int rte_eth_dev_attach(const char *devargs, uint8_t *port_id);
> 
> A devargs is used to identify a vdev/pdev driver and call it to create a
> new rte_eth_dev.
> Current code goes as far as parsing devargs to understand if this is a
> pci device or a vdev.
> This part should be moved to eal since this is where all the "bus" logic
> is.

Parsing of devargs is quite weird - I mean whitelisting and
blacklisting. At the moment, it guesses whether the given argument is
a PCI device identification or an arbitrary string (vdev). It is not
easy to extend this reliably.

OK, I can see you are addressing this below.

> 
> 
> 
> So now, what I had in mind is something like this.
> It is far from perfect and I certainly did some shortcuts in my
> reasoning.
> 
> 
> Generic device/driver
> 
> - introduce 

[dpdk-dev] [PATCH 0/4] virtio support for container

2016-01-14 Thread Amit Tomer
Hello,

>
> Not necessary. But if you want to use hugepages inside Docker, use -v option
> to map a hugetlbfs into containers.

I modified Docker command line in order to make use of Hugetlbfs:

CMD ["/usr/src/dpdk/examples/l2fwd/build/l2fwd", "-c", "0x3", "-n",
"4","--no-pci", "--socket-mem","512",
"--vdev=eth_cvio0,queue_num=256,rx=1,tx=1,cq=0,path=/var/run/usvhost",
"--", "-p", "0x1"]

Then, I run docker :

 docker run -i -t --privileged  -v /dev/hugepages:/dev/hugepages  -v
/home/ubuntu/backup/usvhost:/var/run/usvhost  l6

But this is what I see:

EAL: Support maximum 128 logical core(s) by configuration.
EAL: Detected 48 lcore(s)
EAL: Setting up physically contiguous memory...
EAL: Failed to find phys addr for 2 MB pages
PANIC in rte_eal_init():
Cannot init memory
1: [/usr/src/dpdk/examples/l2fwd/build/l2fwd(rte_dump_stack+0x20) [0x48ea78]]

This is from Host:

# mount | grep hugetlbfs
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
none on /dev/hugepages type hugetlbfs (rw,relatime)

 #cat /proc/meminfo | grep Huge
AnonHugePages:548864 kB
HugePages_Total:4096
HugePages_Free: 1024
HugePages_Rsvd:0
HugePages_Surp:0
Hugepagesize:   2048 kB

What am I doing wrong here?

Thanks,
Amit


[dpdk-dev] UX Bug in Sphinx HTML Layout for Programming Guide (and maybe other guides?)

2016-01-14 Thread Mcnamara, John
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Matthew Hall
> Sent: Wednesday, January 13, 2016 5:26 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] UX Bug in Sphinx HTML Layout for Programming Guide
> (and maybe other guides?)
> 
> When you go to this link:
> 
> http://dpdk.org/doc/guides/prog_guide/perf_opt_guidelines.html
> 
> There is a bug in the Sphinx layout, where the subchapters of a chapter
> are invisible even after the chapter is clicked.

Hi Matthew,

It seems to be an issue with the way the documentation section headings are
structured and with the new "ReadTheDocs" theme that we introduced for the
DPDK 2.2 documentation.

Basically, because the heading underline formats are not consistent they
don't show up as subsections in the sidebar. The previous themes used for
the docs were more forgiving about this.

In theory the subsections should show up. See this simplified example:

http://imgur.com/qkgxGvX

I'll look into fixing it. 

In the meantime use the index.html pages of the various docs to navigate, e.g.

http://dpdk.org/doc/guides/prog_guide/index.html

John.
-- 



[dpdk-dev] [PATCH 00/17] Update ixgbe base code

2016-01-14 Thread Thomas Monjalon
2015-11-20 15:17, Wenzhuo Lu:
> Note:
> Release note is not updated for the target of this patch is R2.3.
> Send these patchs in case someone may hit related issues on new
> platforms.

Please update the release notes now.

> Wenzhuo Lu (17):
>   ixgbe/base: update README
>   ixgbe/base: avoid needless PHY access on copper phys
>   ixgbe/base: do not wait for signature rule addition
>   ixgbe/base: use mvals values instead of defines
>   ixgbe/base: add Single-port Sage Pond device ID
>   ixgbe/base: remove driver config of KX4 PHY
>   ixgbe/base: add Flow Control Ethertype to ETQF filter list
>   ixgbe/base: add KR mode support
>   ixgbe/base: add flow director drop queue
>   ixgbe/base: check mac type for iosf and phy
>   ixgbe/base: configure x550 MDIO clock speed
>   ixgbe/base: fill at least min credits to a TC credit refills
>   ixgbe/base: support new thermal alarm
>   ixgbe/base: add new iXFI configuration helper function
>   ixgbe/base: prevent KR PHY reset in init
>   ixgbe/base: new defines for FW
>   ixgbe: add new device X550T1

Some titles have been reworded. Please try to advertise the scope
and the bug in the title instead of the deep technical solution.
Example: "fill at least min credits to a TC credit refills"
is replaced by "fix Tx hang in CEE mode".

Applied, thanks


[dpdk-dev] [PATCH 0/4] Optimize memcpy for AVX512 platforms

2016-01-14 Thread Zhihong Wang
This patch set optimizes DPDK memcpy for AVX512 platforms, to make full
utilization of hardware resources and deliver high performance.

In current DPDK, memcpy holds a large proportion of execution time in
libs like Vhost, especially for large packets, and this patch can bring
considerable benefits.

The implementation is based on the current DPDK memcpy framework, some
background introduction can be found in these threads:
http://dpdk.org/ml/archives/dev/2014-November/008158.html
http://dpdk.org/ml/archives/dev/2015-January/011800.html

Code changes are:

  1. Read CPUID to check if AVX512 is supported by CPU

  2. Predefine AVX512 macro if AVX512 is enabled by compiler

  3. Implement AVX512 memcpy and choose the right implementation based on
 predefined macros

  4. Decide alignment unit for memcpy perf test based on predefined macros
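
As a rough illustration of items 2 to 4, the sketch below picks a copy unit at compile time from the compiler's own __AVX512F__ / __AVX2__ macros. SKETCH_COPY_UNIT and sketch_copy are hypothetical names, and the plain memcpy() chunks only stand in for the vector load/store intrinsics used in the actual patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Widest vector register, in bytes, the compiler was told it may use. */
#if defined(__AVX512F__)
#define SKETCH_COPY_UNIT 64
#elif defined(__AVX2__)
#define SKETCH_COPY_UNIT 32
#else
#define SKETCH_COPY_UNIT 16
#endif

/* Copy n bytes in SKETCH_COPY_UNIT-sized chunks, then the remainder. */
static void *
sketch_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(dst != NULL && src != NULL);
    while (n >= SKETCH_COPY_UNIT) {
        memcpy(d, s, SKETCH_COPY_UNIT); /* fixed-size chunk */
        d += SKETCH_COPY_UNIT;
        s += SKETCH_COPY_UNIT;
        n -= SKETCH_COPY_UNIT;
    }
    memcpy(d, s, n); /* tail shorter than one unit */
    return dst;
}
```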

Zhihong Wang (4):
  lib/librte_eal: Identify AVX512 CPU flag
  mk: Predefine AVX512 macro for compiler
  lib/librte_eal: Optimize memcpy for AVX512 platforms
  app/test: Adjust alignment unit for memcpy perf test

 app/test/test_memcpy_perf.c|   6 +
 .../common/include/arch/x86/rte_cpuflags.h |   2 +
 .../common/include/arch/x86/rte_memcpy.h   | 247 -
 mk/rte.cpuflags.mk |   4 +
 4 files changed, 255 insertions(+), 4 deletions(-)

-- 
2.5.0



[dpdk-dev] [PATCH 1/4] lib/librte_eal: Identify AVX512 CPU flag

2016-01-14 Thread Zhihong Wang
Read CPUID to check if AVX512 is supported by CPU.

Signed-off-by: Zhihong Wang 
---
 lib/librte_eal/common/include/arch/x86/rte_cpuflags.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/librte_eal/common/include/arch/x86/rte_cpuflags.h 
b/lib/librte_eal/common/include/arch/x86/rte_cpuflags.h
index dd56553..89c0d9d 100644
--- a/lib/librte_eal/common/include/arch/x86/rte_cpuflags.h
+++ b/lib/librte_eal/common/include/arch/x86/rte_cpuflags.h
@@ -131,6 +131,7 @@ enum rte_cpu_flag_t {
RTE_CPUFLAG_ERMS,   /**< ERMS */
RTE_CPUFLAG_INVPCID,/**< INVPCID */
RTE_CPUFLAG_RTM,/**< Transactional memory */
+   RTE_CPUFLAG_AVX512F,/**< AVX512F */

/* (EAX 80000001h) ECX features */
RTE_CPUFLAG_LAHF_SAHF,  /**< LAHF_SAHF */
@@ -238,6 +239,7 @@ static const struct feature_entry cpu_feature_table[] = {
FEAT_DEF(ERMS, 0x00000007, 0, RTE_REG_EBX,  8)
FEAT_DEF(INVPCID, 0x00000007, 0, RTE_REG_EBX, 10)
FEAT_DEF(RTM, 0x00000007, 0, RTE_REG_EBX, 11)
+   FEAT_DEF(AVX512F, 0x00000007, 0, RTE_REG_EBX, 16)

FEAT_DEF(LAHF_SAHF, 0x80000001, 0, RTE_REG_ECX,  0)
FEAT_DEF(LZCNT, 0x80000001, 0, RTE_REG_ECX,  4)
-- 
2.5.0
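
For reference, the table entry added above corresponds to CPUID leaf 7, subleaf 0, EBX bit 16. A standalone check could look like the sketch below, assuming GCC or clang with <cpuid.h>; the sketch_ names are hypothetical and this is not how the EAL itself queries flags.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

#define SKETCH_AVX512F_EBX_BIT 16 /* bit position from the FEAT_DEF entry */

/* Test the AVX512F bit in an EBX value read from leaf 7, subleaf 0. */
static int
sketch_ebx_has_avx512f(uint32_t ebx)
{
    return (int)((ebx >> SKETCH_AVX512F_EBX_BIT) & 1u);
}

/* Query the running CPU; returns 0 on non-x86 builds or if leaf 7 is absent. */
static int
sketch_cpu_has_avx512f(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return sketch_ebx_has_avx512f(ebx);
#else
    return 0;
#endif
}
```

Note that this bit only reports CPU support; whether the OS has enabled the wider register state is a separate question that the sketch does not cover.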



[dpdk-dev] [PATCH 2/4] mk: Predefine AVX512 macro for compiler

2016-01-14 Thread Zhihong Wang
Predefine AVX512 macro if AVX512 is enabled by compiler.

Signed-off-by: Zhihong Wang 
---
 mk/rte.cpuflags.mk | 4 
 1 file changed, 4 insertions(+)

diff --git a/mk/rte.cpuflags.mk b/mk/rte.cpuflags.mk
index 28f203b..19a3e7e 100644
--- a/mk/rte.cpuflags.mk
+++ b/mk/rte.cpuflags.mk
@@ -89,6 +89,10 @@ ifneq ($(filter $(AUTO_CPUFLAGS),__AVX2__),)
 CPUFLAGS += AVX2
 endif

+ifneq ($(filter $(AUTO_CPUFLAGS),__AVX512F__),)
+CPUFLAGS += AVX512F
+endif
+
 # IBM Power CPU flags
 ifneq ($(filter $(AUTO_CPUFLAGS),__PPC64__),)
 CPUFLAGS += PPC64
-- 
2.5.0



[dpdk-dev] [PATCH 3/4] lib/librte_eal: Optimize memcpy for AVX512 platforms

2016-01-14 Thread Zhihong Wang
Implement AVX512 memcpy and choose the right implementation based on
predefined macros, to make full utilization of hardware resources and
deliver high performance.

In current DPDK, memcpy holds a large proportion of execution time in
libs like Vhost, especially for large packets, and this patch can bring
considerable benefits for AVX512 platforms.

The implementation is based on the current DPDK memcpy framework, some
background introduction can be found in these threads:
http://dpdk.org/ml/archives/dev/2014-November/008158.html
http://dpdk.org/ml/archives/dev/2015-January/011800.html

Signed-off-by: Zhihong Wang 
---
 .../common/include/arch/x86/rte_memcpy.h   | 247 -
 1 file changed, 243 insertions(+), 4 deletions(-)

diff --git a/lib/librte_eal/common/include/arch/x86/rte_memcpy.h 
b/lib/librte_eal/common/include/arch/x86/rte_memcpy.h
index 6a57426..fee954a 100644
--- a/lib/librte_eal/common/include/arch/x86/rte_memcpy.h
+++ b/lib/librte_eal/common/include/arch/x86/rte_memcpy.h
@@ -37,7 +37,7 @@
 /**
  * @file
  *
- * Functions for SSE/AVX/AVX2 implementation of memcpy().
+ * Functions for SSE/AVX/AVX2/AVX512 implementation of memcpy().
  */

 #include 
@@ -67,7 +67,246 @@ extern "C" {
 static inline void *
 rte_memcpy(void *dst, const void *src, size_t n) 
__attribute__((always_inline));

-#ifdef RTE_MACHINE_CPUFLAG_AVX2
+#ifdef RTE_MACHINE_CPUFLAG_AVX512F
+
+/**
+ * AVX512 implementation below
+ */
+
+/**
+ * Copy 16 bytes from one location to another,
+ * locations should not overlap.
+ */
+static inline void
+rte_mov16(uint8_t *dst, const uint8_t *src)
+{
+   __m128i xmm0;
+
+   xmm0 = _mm_loadu_si128((const __m128i *)src);
+   _mm_storeu_si128((__m128i *)dst, xmm0);
+}
+
+/**
+ * Copy 32 bytes from one location to another,
+ * locations should not overlap.
+ */
+static inline void
+rte_mov32(uint8_t *dst, const uint8_t *src)
+{
+   __m256i ymm0;
+
+   ymm0 = _mm256_loadu_si256((const __m256i *)src);
+   _mm256_storeu_si256((__m256i *)dst, ymm0);
+}
+
+/**
+ * Copy 64 bytes from one location to another,
+ * locations should not overlap.
+ */
+static inline void
+rte_mov64(uint8_t *dst, const uint8_t *src)
+{
+   __m512i zmm0;
+
+   zmm0 = _mm512_loadu_si512((const void *)src);
+   _mm512_storeu_si512((void *)dst, zmm0);
+}
+
+/**
+ * Copy 128 bytes from one location to another,
+ * locations should not overlap.
+ */
+static inline void
+rte_mov128(uint8_t *dst, const uint8_t *src)
+{
+   rte_mov64(dst + 0 * 64, src + 0 * 64);
+   rte_mov64(dst + 1 * 64, src + 1 * 64);
+}
+
+/**
+ * Copy 256 bytes from one location to another,
+ * locations should not overlap.
+ */
+static inline void
+rte_mov256(uint8_t *dst, const uint8_t *src)
+{
+   rte_mov64(dst + 0 * 64, src + 0 * 64);
+   rte_mov64(dst + 1 * 64, src + 1 * 64);
+   rte_mov64(dst + 2 * 64, src + 2 * 64);
+   rte_mov64(dst + 3 * 64, src + 3 * 64);
+}
+
+/**
+ * Copy 128-byte blocks from one location to another,
+ * locations should not overlap.
+ */
+static inline void
+rte_mov128blocks(uint8_t *dst, const uint8_t *src, size_t n)
+{
+   __m512i zmm0, zmm1;
+
+   while (n >= 128) {
+   zmm0 = _mm512_loadu_si512((const void *)(src + 0 * 64));
+   n -= 128;
+   zmm1 = _mm512_loadu_si512((const void *)(src + 1 * 64));
+   src = src + 128;
+   _mm512_storeu_si512((void *)(dst + 0 * 64), zmm0);
+   _mm512_storeu_si512((void *)(dst + 1 * 64), zmm1);
+   dst = dst + 128;
+   }
+}
+
+/**
+ * Copy 512-byte blocks from one location to another,
+ * locations should not overlap.
+ */
+static inline void
+rte_mov512blocks(uint8_t *dst, const uint8_t *src, size_t n)
+{
+   __m512i zmm0, zmm1, zmm2, zmm3, zmm4, zmm5, zmm6, zmm7;
+
+   while (n >= 512) {
+   zmm0 = _mm512_loadu_si512((const void *)(src + 0 * 64));
+   n -= 512;
+   zmm1 = _mm512_loadu_si512((const void *)(src + 1 * 64));
+   zmm2 = _mm512_loadu_si512((const void *)(src + 2 * 64));
+   zmm3 = _mm512_loadu_si512((const void *)(src + 3 * 64));
+   zmm4 = _mm512_loadu_si512((const void *)(src + 4 * 64));
+   zmm5 = _mm512_loadu_si512((const void *)(src + 5 * 64));
+   zmm6 = _mm512_loadu_si512((const void *)(src + 6 * 64));
+   zmm7 = _mm512_loadu_si512((const void *)(src + 7 * 64));
+   src = src + 512;
+   _mm512_storeu_si512((void *)(dst + 0 * 64), zmm0);
+   _mm512_storeu_si512((void *)(dst + 1 * 64), zmm1);
+   _mm512_storeu_si512((void *)(dst + 2 * 64), zmm2);
+   _mm512_storeu_si512((void *)(dst + 3 * 64), zmm3);
+   _mm512_storeu_si512((void *)(dst + 4 * 64), zmm4);
+   _mm512_storeu_si512((void *)(dst + 5 * 64), zmm5);
+   _mm512_storeu_si512((void *)(dst + 6 * 64), zmm6);
+   _mm512_store

[dpdk-dev] [PATCH 4/4] app/test: Adjust alignment unit for memcpy perf test

2016-01-14 Thread Zhihong Wang
Decide alignment unit for memcpy perf test based on predefined macros.

Signed-off-by: Zhihong Wang 
---
 app/test/test_memcpy_perf.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/app/test/test_memcpy_perf.c b/app/test/test_memcpy_perf.c
index 754828e..73babec 100644
--- a/app/test/test_memcpy_perf.c
+++ b/app/test/test_memcpy_perf.c
@@ -79,7 +79,13 @@ static size_t buf_sizes[TEST_VALUE_RANGE];
 #define TEST_BATCH_SIZE 100

 /* Data is aligned on this many bytes (power of 2) */
+#ifdef RTE_MACHINE_CPUFLAG_AVX512F
+#define ALIGNMENT_UNIT  64
+#elif RTE_MACHINE_CPUFLAG_AVX2
 #define ALIGNMENT_UNIT  32
+#else /* RTE_MACHINE_CPUFLAG */
+#define ALIGNMENT_UNIT  16
+#endif /* RTE_MACHINE_CPUFLAG */

 /*
  * Pointers used in performance tests. The two large buffers are for uncached
-- 
2.5.0
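
For readers following along, the compile-time selection in this patch can be sketched outside DPDK roughly as below. This is only an illustration: the compiler-provided macros `__AVX512F__` and `__AVX2__` stand in for `RTE_MACHINE_CPUFLAG_AVX512F` and `RTE_MACHINE_CPUFLAG_AVX2`, and the `DEMO_`/`demo_` names are hypothetical helpers that are not part of the patch.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Assumption: compiler feature macros stand in for DPDK's
 * RTE_MACHINE_CPUFLAG_* build flags.
 */
#if defined(__AVX512F__)
#define DEMO_ALIGNMENT_UNIT 64 /* width of one ZMM register in bytes */
#elif defined(__AVX2__)
#define DEMO_ALIGNMENT_UNIT 32 /* width of one YMM register in bytes */
#else
#define DEMO_ALIGNMENT_UNIT 16 /* width of one XMM register in bytes */
#endif

/* Round a pointer up to the next multiple of the alignment unit. */
static inline uint8_t *demo_align_up(uint8_t *p)
{
    uintptr_t mask = (uintptr_t)DEMO_ALIGNMENT_UNIT - 1;

    return (uint8_t *)(((uintptr_t)p + mask) & ~mask);
}

/* Return 1 if the pointer sits on an alignment-unit boundary, else 0. */
static inline int demo_is_aligned(const void *p)
{
    return ((uintptr_t)p & ((uintptr_t)DEMO_ALIGNMENT_UNIT - 1)) == 0;
}
```

A perf test could pass a deliberately misaligned address such as `buf + 1` through `demo_align_up()` to obtain the "aligned" case and use the raw address for the "unaligned" case.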



[dpdk-dev] [RFC v2 1/2] ethdev: add packet filter flow and new behavior switch to fdir

2016-01-14 Thread Rahul Lakkireddy
Hi Jingjing,

On Thursday, January 01/14/16, 2016 at 00:48:17 -0800, Wu, Jingjing wrote:
> Hi, Rahul
> 
> Just another thought, please consider about it:
> 
> Add a new flow type like
> 
> #define RTE_ETH_FLOW_IPV6_UDP_EX 17
> +#define RTE_ETH_FLOW_RAW_PKT 18
> 
> Then add a new item in rte_eth_fdir_flow
> union rte_eth_fdir_flow {
>   struct rte_eth_l2_flow l2_flow;
>   struct rte_eth_udpv4_flow  udp4_flow;
>   struct rte_eth_tcpv4_flow  tcp4_flow;
>   struct rte_eth_sctpv4_flow sctp4_flow;
>   struct rte_eth_ipv4_flow   ip4_flow;
>   struct rte_eth_udpv6_flow  udp6_flow;
>   struct rte_eth_tcpv6_flow  tcp6_flow;
>   struct rte_eth_sctpv6_flow sctp6_flow;
>   struct rte_eth_ipv6_flow   ipv6_flow;
>   struct rte_eth_mac_vlan_flow mac_vlan_flow;
>   struct rte_eth_tunnel_flow   tunnel_flow;
> + uint8_t raw_pkt[80];
> };
> 
> Then add mask item in rte_eth_fdir_input:
> 
> struct rte_eth_fdir_input {
>   uint16_t flow_type;
>   union rte_eth_fdir_flow flow;
> + union rte_eth_fdir_flow flow_mask;
>   /**< Flow fields to match, dependent on flow_type */
>   struct rte_eth_fdir_flow_ext flow_ext;
>   /**< Additional fields to match */
> };
> 
> Then the filter can be added just in a format of raw packet, it looks 
> generic, and even other NIC can use this too.
> 
> Thanks
> Jingjing

This approach seems generic enough to allow any vendor specific data
to be passed in filter as well.  However, 80 seems to be too low for
multiple flow types that can be combined in the same filter rule.
I think size of 256 seems reasonable.

Could the same thing be done for action arguments as well? Can we add
the same generic info to rte_eth_fdir_action too?

struct rte_eth_fdir_action {
uint16_t rx_queue;
enum rte_eth_fdir_behavior behavior;
enum rte_eth_fdir_status report_status;
uint8_t flex_off;
+   uint8_t behavior_arg[256];
};

This way, we can pass vendor specific action arguments too. What do
you think?

Also, if we take this approach, would every vendor then need to
document its own vendor-specific format for the filter match and
filter action arguments?

And would each vendor probably also need to come up with its own example
application showing how to apply filters via dpdk on its card?

Thanks,
Rahul
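
To make the raw-packet idea in this thread concrete, here is a small software model of how a pattern plus mask pair could be evaluated. It is a sketch under assumptions: the struct and function names are hypothetical, the 256-byte size is the value suggested above rather than anything agreed, and a real NIC would do this matching in hardware using its own vendor-specific layout.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_RAW_PKT_LEN 256 /* size suggested in the discussion, not final */

/* Hypothetical raw-packet filter: a byte pattern plus a bit mask. */
struct demo_raw_filter {
    uint8_t raw_pkt[DEMO_RAW_PKT_LEN];  /* bytes to compare against */
    uint8_t raw_mask[DEMO_RAW_PKT_LEN]; /* set bits select what must match */
};

/* Start from an empty filter that matches every packet. */
static void demo_raw_filter_init(struct demo_raw_filter *f)
{
    memset(f, 0, sizeof(*f));
}

/*
 * Return 1 if every masked bit of pkt equals the filter pattern, else 0.
 * A packet too short to cover a masked byte does not match.
 */
static int demo_raw_filter_match(const struct demo_raw_filter *f,
                                 const uint8_t *pkt, size_t len)
{
    size_t i;

    for (i = 0; i < DEMO_RAW_PKT_LEN; i++) {
        uint8_t m = f->raw_mask[i];

        if (m == 0)
            continue; /* byte not part of the rule */
        if (i >= len)
            return 0; /* rule needs bytes the packet lacks */
        if ((pkt[i] ^ f->raw_pkt[i]) & m)
            return 0; /* a masked bit differs */
    }
    return 1;
}
```

For example, setting bytes 12 and 13 of `raw_pkt` to 0x08 0x00 with a mask of 0xff on both would match on an IPv4 EtherType in an untagged Ethernet frame.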


[dpdk-dev] [PATCH v4 00/14] Add virtio support for arm/arm64

2016-01-14 Thread Santosh Shukla
Hi,

This v4 patch uses vfio-noiommu-way to access virtio-net pci interface.
Tested for arm64 thunderX platform. Patch builds for
x86/i386/arm/armv8/thunderX. Tested with testpmd application.

Refer to the v3 [1] cover letter for the dependency description.
Steps to enable vfio-noiommu mode:
- modprobe vfio-pci
echo 1 > /sys/module/vfio/parameters/enable_unsafe_*
- then bind 
./tools/dpdk_nic_bind.py -b vfio-pci :00:03.0

- Testpmd application to try out for:
./app/testpmd -c 0x3 -n 4 -- -i --portmask=0x0  --nb-cores=1 
--port-topology=chained

On host side ping to tapX interface and observe pkt_cnt on guest side.

For patch history from v1-->v3 pl. refer v3 cover letter [1]

v3--> v4:
- Incorporated v3 review comments, Thanks to Stephen, Yuan, Bruce for comment!
- Tested for Huawei patch series titled "[PATCH v2 0/4] fix the issue that DPDK
  takes over virtio device blindly". 
- Patch no 11 and 13 are testonly patches used for this patch series [Anatoly/
  Yuan]

Major change in series:
- Introducing vfio interface parse api in virtio pmd driver
- Added vfio device specific private header in struct virtio_hw{}
- Dummy in/out x86-style api, just to avoid build errors on non-x86 archs in vfio
  mode.
- VIRTIO_REG_RD/WR API(s) are now able to do rd/wr for both interfaces i.e. for
  vfio and igb_uio/ioport bar. Tested for both mode for x86 and also only for
  vfio for arm64 (non-x86) archs.

To try out the complete patch set without cut-and-paste pain, clone [2].

Thanks!

[1] http://permalink.gmane.org/gmane.comp.networking.dpdk.devel/31117
[2] https://github.com/sshukla82/dpdk.git branch master-virtio-vfio-v4

Anatoly Burakov (1):
  vfio: Support for no-IOMMU mode

Santosh Shukla (12):
  virtio: Introduce config RTE_VIRTIO_INC_VECTOR
  config: i686: set RTE_VIRTIO_INC_VECTOR=n
  linuxapp: eal: arm: Always return 0 for rte_eal_iopl_init()
  linuxapp/vfio: ignore mapping for ioport region
  virtio_pci.h: build fix for sys/io.h for non-x86 arch
  eal: pci: vfio: add rd/wr func for pci bar space
  virtio: vfio: add api support to rd/wr ioport bar
  virtio: pci: extend virtio pci rw api for vfio interface
  virtio: ethdev: check for vfio interface
  virtio: pci: add dummy func definition for in/outb for non-x86 arch
  config: armv7/v8: Enable RTE_LIBRTE_VIRTIO_PMD
  virtio: enable vfio in pmd driver

Yuanhan Liu (1):
  eal: pci: export pci_[un]map_device

 config/common_linuxapp  |1 +
 config/defconfig_arm-armv7a-linuxapp-gcc|4 +-
 config/defconfig_arm64-armv8a-linuxapp-gcc  |4 +-
 config/defconfig_i686-native-linuxapp-gcc   |1 +
 config/defconfig_i686-native-linuxapp-icc   |1 +
 drivers/net/virtio/Makefile |2 +-
 drivers/net/virtio/virtio_ethdev.c  |  124 ++-
 drivers/net/virtio/virtio_pci.h |  128 +--
 drivers/net/virtio/virtio_rxtx.c|7 +
 drivers/net/virtio/virtio_vfio_rw.h |  107 +
 lib/librte_eal/bsdapp/eal/eal_pci.c |4 +-
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |7 +
 lib/librte_eal/common/eal_common_pci.c  |4 +-
 lib/librte_eal/common/eal_private.h |   18 ---
 lib/librte_eal/common/include/rte_pci.h |   65 
 lib/librte_eal/linuxapp/eal/Makefile|1 +
 lib/librte_eal/linuxapp/eal/eal.c   |2 +
 lib/librte_eal/linuxapp/eal/eal_pci.c   |   41 -
 lib/librte_eal/linuxapp/eal/eal_pci_init.h  |   28 
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c  |  191 ---
 lib/librte_eal/linuxapp/eal/eal_pci_vfio_dma.c  |   84 ++
 lib/librte_eal/linuxapp/eal/eal_vfio.h  |5 +
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |7 +
 23 files changed, 740 insertions(+), 96 deletions(-)
 create mode 100644 drivers/net/virtio/virtio_vfio_rw.h
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_vfio_dma.c

-- 
1.7.9.5



[dpdk-dev] [PATCH v4 01/14] virtio: Introduce config RTE_VIRTIO_INC_VECTOR

2016-01-14 Thread Santosh Shukla
virtio_recv_pkts_vec and the other virtio vector apis are written with sse/avx
instructions. For arm64 in particular, a virtio vector implementation does not
exist yet (todo).

So the virtio pmd driver won't build for targets like i686 and arm64. With
RTE_VIRTIO_INC_VECTOR=n, the driver can build for non-sse/avx targets and will
work in non-vectored virtio mode.

Signed-off-by: Santosh Shukla 
---
 config/common_linuxapp   |1 +
 drivers/net/virtio/Makefile  |2 +-
 drivers/net/virtio/virtio_rxtx.c |7 +++
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 74bc515..8677697 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -274,6 +274,7 @@ CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_RX=n
 CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_TX=n
 CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_DRIVER=n
 CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_DUMP=n
+CONFIG_RTE_VIRTIO_INC_VECTOR=y

 #
 # Compile burst-oriented VMXNET3 PMD driver
diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile
index 43835ba..25a842d 100644
--- a/drivers/net/virtio/Makefile
+++ b/drivers/net/virtio/Makefile
@@ -50,7 +50,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtqueue.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_pci.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_ethdev.c
-SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx_simple.c
+SRCS-$(CONFIG_RTE_VIRTIO_INC_VECTOR) += virtio_rxtx_simple.c

 # this lib depends upon:
 DEPDIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += lib/librte_eal lib/librte_ether
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 74b39ef..23be1ff 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -438,7 +438,9 @@ virtio_dev_rx_queue_setup(struct rte_eth_dev *dev,

dev->data->rx_queues[queue_idx] = vq;

+#ifdef RTE_VIRTIO_INC_VECTOR
virtio_rxq_vec_setup(vq);
+#endif

return 0;
 }
@@ -464,7 +466,10 @@ virtio_dev_tx_queue_setup(struct rte_eth_dev *dev,
const struct rte_eth_txconf *tx_conf)
 {
uint8_t vtpci_queue_idx = 2 * queue_idx + VTNET_SQ_TQ_QUEUE_IDX;
+
+#ifdef RTE_VIRTIO_INC_VECTOR
struct virtio_hw *hw = dev->data->dev_private;
+#endif
struct virtqueue *vq;
uint16_t tx_free_thresh;
int ret;
@@ -477,6 +482,7 @@ virtio_dev_tx_queue_setup(struct rte_eth_dev *dev,
return -EINVAL;
}

+#ifdef RTE_VIRTIO_INC_VECTOR
/* Use simple rx/tx func if single segment and no offloads */
if ((tx_conf->txq_flags & VIRTIO_SIMPLE_FLAGS) == VIRTIO_SIMPLE_FLAGS &&
 !vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF)) {
@@ -485,6 +491,7 @@ virtio_dev_tx_queue_setup(struct rte_eth_dev *dev,
dev->rx_pkt_burst = virtio_recv_pkts_vec;
use_simple_rxtx = 1;
}
+#endif

ret = virtio_dev_queue_setup(dev, VTNET_TQ, queue_idx, vtpci_queue_idx,
nb_desc, socket_id, &vq);
-- 
1.7.9.5
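
The gating pattern used in this patch can be sketched in isolation as follows. This is a simplified illustration, not the driver code: `DEMO_VIRTIO_INC_VECTOR` stands in for `RTE_VIRTIO_INC_VECTOR` (and is left undefined by default here), and the `demo_` functions are hypothetical placeholders for the scalar and vector receive paths.

```c
#include <stdint.h>

/* Hypothetical burst-receive signature, loosely modelled on a PMD rx hook. */
typedef uint16_t (*demo_rx_burst_fn)(void *rxq, void **pkts, uint16_t n);

/* Scalar path: always compiled, works on every architecture. */
static uint16_t demo_recv_pkts_scalar(void *rxq, void **pkts, uint16_t n)
{
    (void)rxq;
    (void)pkts;
    (void)n;
    return 0; /* placeholder body */
}

#ifdef DEMO_VIRTIO_INC_VECTOR
/* Vector path: only compiled when the build enables it. */
static uint16_t demo_recv_pkts_vec(void *rxq, void **pkts, uint16_t n)
{
    (void)rxq;
    (void)pkts;
    (void)n;
    return 0; /* placeholder body */
}
#endif

/*
 * Pick the receive function. simple_ok models the "single segment and
 * no offloads" condition checked in virtio_dev_tx_queue_setup().
 */
static demo_rx_burst_fn demo_select_rx_burst(int simple_ok)
{
#ifdef DEMO_VIRTIO_INC_VECTOR
    if (simple_ok)
        return demo_recv_pkts_vec;
#else
    (void)simple_ok; /* vector path not built: always fall back */
#endif
    return demo_recv_pkts_scalar;
}
```

With the macro undefined, no reference to the vector function survives preprocessing, which is what lets targets without sse/avx link cleanly.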



[dpdk-dev] [PATCH v4 02/14] config: i686: set RTE_VIRTIO_INC_VECTOR=n

2016-01-14 Thread Santosh Shukla
i686 target config example:
config/defconfig_i686-native-linuxapp-gcc says "Vectorized PMD is not supported
on 32-bit".

So setting RTE_VIRTIO_INC_VECTOR to 'n'.

Signed-off-by: Santosh Shukla 
---
 config/defconfig_i686-native-linuxapp-gcc |1 +
 config/defconfig_i686-native-linuxapp-icc |1 +
 2 files changed, 2 insertions(+)

diff --git a/config/defconfig_i686-native-linuxapp-gcc 
b/config/defconfig_i686-native-linuxapp-gcc
index a90de9b..a4b1c49 100644
--- a/config/defconfig_i686-native-linuxapp-gcc
+++ b/config/defconfig_i686-native-linuxapp-gcc
@@ -49,3 +49,4 @@ CONFIG_RTE_LIBRTE_KNI=n
 # Vectorized PMD is not supported on 32-bit
 #
 CONFIG_RTE_IXGBE_INC_VECTOR=n
+CONFIG_RTE_VIRTIO_INC_VECTOR=n
diff --git a/config/defconfig_i686-native-linuxapp-icc 
b/config/defconfig_i686-native-linuxapp-icc
index c021321..f8eb6ad 100644
--- a/config/defconfig_i686-native-linuxapp-icc
+++ b/config/defconfig_i686-native-linuxapp-icc
@@ -49,3 +49,4 @@ CONFIG_RTE_LIBRTE_KNI=n
 # Vectorized PMD is not supported on 32-bit
 #
 CONFIG_RTE_IXGBE_INC_VECTOR=n
+CONFIG_RTE_VIRTIO_INC_VECTOR=n
-- 
1.7.9.5



[dpdk-dev] [PATCH v4 03/14] linuxapp: eal: arm: Always return 0 for rte_eal_iopl_init()

2016-01-14 Thread Santosh Shukla
The iopl() syscall is not supported on linux-arm/arm64, so always return 0.

Signed-off-by: Santosh Shukla 
Suggested-by: Stephen Hemminger 
Acked-by: Jan Viktorin 
---
v3->v4:
- moved #ifdef to elseif -> suggested by Stephen.

lib/librte_eal/linuxapp/eal/eal.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c 
b/lib/librte_eal/linuxapp/eal/eal.c
index 635ec36..a2a3485 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -715,6 +715,8 @@ rte_eal_iopl_init(void)
if (iopl(3) != 0)
return -1;
return 0;
+#elif defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64)
+   return 0; /* iopl syscall not supported for ARM/ARM64 */
 #else
return -1;
 #endif
-- 
1.7.9.5



[dpdk-dev] [PATCH v4 04/14] linuxapp/vfio: ignore mapping for ioport region

2016-01-14 Thread Santosh Shukla
vfio_pci_mmap() tries to map all pci bars. ioport regions are not mapped in
vfio/kernel, so skip mmapping for ioport bars.

Signed-off-by: Santosh Shukla 
---
v3->v4: per review comment from Stephen and Yuan.
- moved ioport_bar var declaration from within the func body to top of func
- rearranged log message to fit within 80 columns

lib/librte_eal/linuxapp/eal/eal_pci_vfio.c |   20 
 1 file changed, 20 insertions(+)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c 
b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
index 74f91ba..abde779 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -573,6 +573,7 @@ pci_vfio_map_resource(struct rte_pci_device *dev)
struct pci_map *maps;
uint32_t msix_table_offset = 0;
uint32_t msix_table_size = 0;
+   uint32_t ioport_bar;

dev->intr_handle.fd = -1;
dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
@@ -760,6 +761,25 @@ pci_vfio_map_resource(struct rte_pci_device *dev)
return -1;
}

+   /* chk for io port region */
+   ret = pread64(vfio_dev_fd, &ioport_bar, sizeof(ioport_bar),
+ VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX)
+ + PCI_BASE_ADDRESS_0 + i*4);
+
+   if (ret != sizeof(ioport_bar)) {
+   RTE_LOG(ERR, EAL,
+   "Cannot read command (%x) from config space!\n",
+   PCI_BASE_ADDRESS_0 + i*4);
+   return -1;
+   }
+
+   if (ioport_bar & PCI_BASE_ADDRESS_SPACE_IO) {
+   RTE_LOG(INFO, EAL,
+   "Ignore mapping IO port bar(%d) addr: %x\n",
+i, ioport_bar);
+   continue;
+   }
+
/* skip non-mmapable BARs */
if ((reg.flags & VFIO_REGION_INFO_FLAG_MMAP) == 0)
continue;
-- 
1.7.9.5
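
The check added by this patch boils down to reading a BAR register from config space and testing its I/O-space bit. A stand-alone sketch is given below, with assumptions stated up front: `cfg_base` stands in for `VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX)`, the two `DEMO_` constants mirror the values of `PCI_BASE_ADDRESS_0` and `PCI_BASE_ADDRESS_SPACE_IO` from the Linux PCI register header, and the `demo_` helpers are hypothetical names.

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define DEMO_PCI_BASE_ADDRESS_0        0x10 /* config offset of BAR0 */
#define DEMO_PCI_BASE_ADDRESS_SPACE_IO 0x01 /* bit 0 set: I/O port BAR */

/*
 * Read the 32-bit BAR register bar_idx from a config-space file
 * descriptor. cfg_base is the offset of config space within fd.
 * Returns 0 on success, -1 on a short read or error.
 */
static int demo_read_bar_reg(int fd, off_t cfg_base, int bar_idx,
                             uint32_t *bar)
{
    off_t off = cfg_base + DEMO_PCI_BASE_ADDRESS_0 + bar_idx * 4;
    ssize_t n = pread(fd, bar, sizeof(*bar), off);

    if (n != (ssize_t)sizeof(*bar)) {
        fprintf(stderr, "Cannot read BAR%d at config offset 0x%x\n",
                bar_idx, DEMO_PCI_BASE_ADDRESS_0 + bar_idx * 4);
        return -1;
    }
    return 0;
}

/* Return 1 if the BAR value describes an I/O port region, else 0. */
static int demo_bar_is_ioport(uint32_t bar)
{
    return (bar & DEMO_PCI_BASE_ADDRESS_SPACE_IO) != 0;
}
```

A mapping loop would call `demo_read_bar_reg()` for each index and `continue` past any BAR for which `demo_bar_is_ioport()` returns 1, as the hunk above does.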



[dpdk-dev] [PATCH v4 05/14] virtio_pci.h: build fix for sys/io.h for non-x86 arch

2016-01-14 Thread Santosh Shukla
Make sure sys/io.h is used only for x86 archs. This fixes the build error in the
arm64/arm case.

Signed-off-by: Santosh Shukla 
---
 drivers/net/virtio/virtio_pci.h |2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
index 47f722a..8b5b031 100644
--- a/drivers/net/virtio/virtio_pci.h
+++ b/drivers/net/virtio/virtio_pci.h
@@ -40,8 +40,10 @@
 #include 
 #include 
 #else
+#if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_I686)
 #include 
 #endif
+#endif

 #include 

-- 
1.7.9.5



[dpdk-dev] [PATCH v4 06/14] eal: pci: vfio: add rd/wr func for pci bar space

2016-01-14 Thread Santosh Shukla
Introduce the apis below for pci bar space rd/wr. They are currently used for
pci iobar rd/wr.

The apis are:
- rte_eal_pci_read_bar
- rte_eal_pci_write_bar

When virtio is used in vfio mode, the virtio driver will use these apis
to do rd/wr operations on the ioport pci bar.

Signed-off-by: Santosh Shukla 
---
v3->v4:
- Using RTE_SET_USED(_var_) for unused variable for !VFIO_PRESENT case.
  As per v3 review comment from Bruce. 

lib/librte_eal/common/include/rte_pci.h|   38 
 lib/librte_eal/linuxapp/eal/eal_pci.c  |   37 +++
 lib/librte_eal/linuxapp/eal/eal_pci_init.h |6 +
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c |   28 
 4 files changed, 109 insertions(+)

diff --git a/lib/librte_eal/common/include/rte_pci.h 
b/lib/librte_eal/common/include/rte_pci.h
index 334c12e..53437cc 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -471,6 +471,44 @@ int rte_eal_pci_read_config(const struct rte_pci_device 
*device,
void *buf, size_t len, off_t offset);

 /**
+ * Read PCI bar space.
+ *
+ * @param device
+ *   A pointer to a rte_pci_device structure describing the device
+ *   to use
+ * @param buf
+ *   A data buffer where the bytes should be read into
+ * @param len
+ *   The length of the data buffer.
+ * @param offset
+ *   The offset into PCI bar space
+ * @param bar_idx
+ *   The pci bar index (valid range is 0..5)
+ */
+int rte_eal_pci_read_bar(const struct rte_pci_device *device,
+void *buf, size_t len, off_t offset, int bar_idx);
+
+/**
+ * Write PCI bar space.
+ *
+ * @param device
+ *   A pointer to a rte_pci_device structure describing the device
+ *   to use
+ * @param buf
+ *   A data buffer containing the bytes that should be written
+ * @param len
+ *   The length of the data buffer.
+ * @param offset
+ *   The offset into PCI bar space
+ * @param bar_idx
+ *   The pci bar index (valid range is 0..5)
+ */
+int rte_eal_pci_write_bar(const struct rte_pci_device *device,
+ const void *buf, size_t len, off_t offset,
+ int bar_idx);
+
+
+/**
  * Write PCI config space.
  *
  * @param device
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c 
b/lib/librte_eal/linuxapp/eal/eal_pci.c
index bc5b5be..8c1a49d 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -621,6 +621,43 @@ int rte_eal_pci_write_config(const struct rte_pci_device 
*device,
}
 }

+int rte_eal_pci_read_bar(const struct rte_pci_device *device,
+void *buf, size_t len, off_t offset,
+int bar_idx)
+
+{
+#ifdef VFIO_PRESENT
+   const struct rte_intr_handle *intr_handle = &device->intr_handle;
+   return pci_vfio_read_bar(intr_handle, buf, len, offset, bar_idx);
+#else
+   /* UIO's not applicable */
+   RTE_SET_USED(device);
+   RTE_SET_USED(buf);
+   RTE_SET_USED(len);
+   RTE_SET_USED(offset);
+   RTE_SET_USED(bar_idx);
+   return 0;
+#endif
+}
+
+int rte_eal_pci_write_bar(const struct rte_pci_device *device,
+ const void *buf, size_t len, off_t offset,
+ int bar_idx)
+{
+#ifdef VFIO_PRESENT
+   const struct rte_intr_handle *intr_handle = &device->intr_handle;
+   return pci_vfio_write_bar(intr_handle, buf, len, offset, bar_idx);
+#else
+   /* UIO's not applicable */
+   RTE_SET_USED(device);
+   RTE_SET_USED(buf);
+   RTE_SET_USED(len);
+   RTE_SET_USED(offset);
+   RTE_SET_USED(bar_idx);
+   return 0;
+#endif
+}
+
 /* Init the PCI EAL subsystem */
 int
 rte_eal_pci_init(void)
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_init.h 
b/lib/librte_eal/linuxapp/eal/eal_pci_init.h
index a17c708..3bc592b 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_init.h
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_init.h
@@ -68,6 +68,12 @@ int pci_vfio_read_config(const struct rte_intr_handle 
*intr_handle,
 int pci_vfio_write_config(const struct rte_intr_handle *intr_handle,
  const void *buf, size_t len, off_t offs);

+int pci_vfio_read_bar(const struct rte_intr_handle *intr_handle,
+ void *buf, size_t len, off_t offs, int bar_idx);
+
+int pci_vfio_write_bar(const struct rte_intr_handle *intr_handle,
+  const void *buf, size_t len, off_t offs, int bar_idx);
+
 /* map VFIO resource prototype */
 int pci_vfio_map_resource(struct rte_pci_device *dev);
 int pci_vfio_get_group_fd(int iommu_group_fd);
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c 
b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
index abde779..df407ef 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -93,6 +93,34 @@ pci_vfio_write_config(const struct rte_intr_handle 
*intr_handle,
   VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX) 

[dpdk-dev] [PATCH v4 07/14] virtio: vfio: add api support to rd/wr ioport bar

2016-01-14 Thread Santosh Shukla
For vfio case - Use pread/pwrite api to access virtio
ioport space.

Signed-off-by: Santosh Shukla 
Signed-off-by: Rizwan Ansari 
Signed-off-by: Rakesh Krishnamurthy 
---
v3->v4:
- Corrected debug error message for oub_ class of apis
- renamed file from virtio_vfio.h to virtio_vfio_rw.h
- Removed #ifdef, #else clutter, so there is no #else condition to handle now;
  this file will work for all the archs that use the vfio interface.

drivers/net/virtio/virtio_vfio_rw.h |  107 +++
 1 file changed, 107 insertions(+)
 create mode 100644 drivers/net/virtio/virtio_vfio_rw.h

diff --git a/drivers/net/virtio/virtio_vfio_rw.h 
b/drivers/net/virtio/virtio_vfio_rw.h
new file mode 100644
index 000..80a67f4
--- /dev/null
+++ b/drivers/net/virtio/virtio_vfio_rw.h
@@ -0,0 +1,107 @@
+/*
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Cavium Networks. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *   * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ *   * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ *   * Neither the name of Intel Corporation nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ *THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+#ifndef _VIRTIO_VFIO_RW_H_
+#define _VIRTIO_VFIO_RW_H_
+
+#if defined(RTE_EAL_VFIO) && defined(RTE_LIBRTE_EAL_LINUXAPP)
+
+#include 
+#include 
+#include "virtio_logs.h"
+
+/* vfio rd/rw virtio apis */
+static inline void ioport_inb(const struct rte_pci_device *pci_dev,
+ uint8_t reg, uint8_t *val)
+{
+   if (rte_eal_pci_read_bar(pci_dev, (uint8_t *)val, sizeof(uint8_t), reg,
+0) <= 0) {
+   PMD_DRV_LOG(ERR, "Can't read from PCI bar space");
+   return;
+   }
+}
+
+static inline void ioport_inw(const struct rte_pci_device *pci_dev,
+ uint16_t reg, uint16_t *val)
+{
+   if (rte_eal_pci_read_bar(pci_dev, (uint16_t *)val, sizeof(uint16_t),
+reg, 0) <= 0) {
+   PMD_DRV_LOG(ERR, "Can't read from PCI bar space");
+   return;
+   }
+}
+
+static inline void ioport_inl(const struct rte_pci_device *pci_dev,
+ uint32_t reg, uint32_t *val)
+{
+   if (rte_eal_pci_read_bar(pci_dev, (uint32_t *)val, sizeof(uint32_t),
+reg, 0) <= 0) {
+   PMD_DRV_LOG(ERR, "Can't read from PCI bar space");
+   return;
+   }
+}
+
+static inline void ioport_outb_p(const struct rte_pci_device *pci_dev,
+uint8_t reg, uint8_t val)
+{
+   if (rte_eal_pci_write_bar(pci_dev, (uint8_t *)&val, sizeof(uint8_t),
+ reg, 0) <= 0) {
+   PMD_DRV_LOG(ERR, "Can't write to PCI bar space");
+   return;
+   }
+}
+
+
+static inline void ioport_outw_p(const struct rte_pci_device *pci_dev,
+uint16_t reg, uint16_t val)
+{
+   if (rte_eal_pci_write_bar(pci_dev, (uint16_t *)&val, sizeof(uint16_t),
+ reg, 0) <= 0) {
+   PMD_DRV_LOG(ERR, "Can't write to PCI bar space");
+   return;
+   }
+}
+
+
+static inline void ioport_outl_p(const struct rte_pci_device *pci_dev,
+uint32_t reg, uint32_t val)
+{
+   if (rte_eal_pci_write_bar(pci_dev, (uint32_t *)&val, sizeof(uint32_t),
+ reg, 0) <= 0) {
+   PMD_DRV_LOG(ERR, "Can't write to PCI bar space");
+   return;
+   }
+}
+
+#endif /* RTE_EAL_VFIO && RTE_XX_EAL_LINUXAPP */
+#endif /* _VIRTIO_VFI

[dpdk-dev] [PATCH v4 08/14] virtio: pci: extend virtio pci rw api for vfio interface

2016-01-14 Thread Santosh Shukla
So far virtio handles rw access for the uio / ioport interface. This patch extends
the support to the vfio interface. For that, introduce the private struct
virtio_vfio_dev{
- is_vfio
- pci_dev
};
Signed-off-by: Santosh Shukla 
---
v3->v4:
- Removed #ifdef RTE_EAL_VFIO and made it arch agnostic, such that the virtio_pci
  rd/wr api now handles both vfio and igb_uio/ioport interfaces, depending upon
  whether the is_vfio flag is set or unset.
- Tested on x86 for igb_uio and vfio interfaces, also tested on arm64 for the vfio
  interface.

drivers/net/virtio/virtio_pci.h |   84 ---
 1 file changed, 70 insertions(+), 14 deletions(-)

diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
index 8b5b031..8526c07 100644
--- a/drivers/net/virtio/virtio_pci.h
+++ b/drivers/net/virtio/virtio_pci.h
@@ -46,6 +46,8 @@
 #endif

 #include 
+#include 
+#include "virtio_vfio_rw.h"

 struct virtqueue;

@@ -165,6 +167,14 @@ struct virtqueue;
  */
 #define VIRTIO_MAX_VIRTQUEUES 8

+/* For vfio only */
+struct virtio_vfio_dev {
+   bool is_vfio; /* True: vfio i/f,
+* False: not a vfio i/f
+*/
+   struct rte_pci_device *pci_dev; /* vfio dev */
+};
+
 struct virtio_hw {
struct virtqueue *cvq;
uint32_t io_base;
@@ -176,6 +186,7 @@ struct virtio_hw {
uint8_t use_msix;
uint8_t started;
uint8_t mac_addr[ETHER_ADDR_LEN];
+   struct virtio_vfio_dev dev;
 };

 /*
@@ -231,20 +242,65 @@ outl_p(unsigned int data, unsigned int port)
 #define VIRTIO_PCI_REG_ADDR(hw, reg) \
(unsigned short)((hw)->io_base + (reg))

-#define VIRTIO_READ_REG_1(hw, reg) \
-   inb((VIRTIO_PCI_REG_ADDR((hw), (reg))))
-#define VIRTIO_WRITE_REG_1(hw, reg, value) \
-   outb_p((unsigned char)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg))))
-
-#define VIRTIO_READ_REG_2(hw, reg) \
-   inw((VIRTIO_PCI_REG_ADDR((hw), (reg))))
-#define VIRTIO_WRITE_REG_2(hw, reg, value) \
-   outw_p((unsigned short)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg))))
-
-#define VIRTIO_READ_REG_4(hw, reg) \
-   inl((VIRTIO_PCI_REG_ADDR((hw), (reg))))
-#define VIRTIO_WRITE_REG_4(hw, reg, value) \
-   outl_p((unsigned int)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg))))
+#define VIRTIO_READ_REG_1(hw, reg) \
+({ \
+   uint8_t ret; \
+   struct virtio_vfio_dev *vdev; \
+   (vdev) = (&(hw)->dev); \
+   (((vdev)->is_vfio) ? \
+   (ioport_inb(((vdev)->pci_dev), reg, &ret)) : \
+   ((ret) = (inb((VIRTIO_PCI_REG_ADDR((hw), (reg))))))); \
+   ret; \
+})
+
+#define VIRTIO_WRITE_REG_1(hw, reg, value) \
+({ \
+   struct virtio_vfio_dev *vdev; \
+   (vdev) = (&(hw)->dev); \
+   (((vdev)->is_vfio) ? \
+   (ioport_outb_p(((vdev)->pci_dev), reg, (uint8_t)(value))) : \
+   (outb_p((unsigned char)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg)))))); \
+})
+
+#define VIRTIO_READ_REG_2(hw, reg) \
+({ \
+   uint16_t ret; \
+   struct virtio_vfio_dev *vdev; \
+   (vdev) = (&(hw)->dev); \
+   (((vdev)->is_vfio) ? \
+   (ioport_inw(((vdev)->pci_dev), reg, &ret)) : \
+   ((ret) = (inw((VIRTIO_PCI_REG_ADDR((hw), (reg))))))); \
+   ret; \
+})
+
+#define VIRTIO_WRITE_REG_2(hw, reg, value) \
+({ \
+   struct virtio_vfio_dev *vdev; \
+   (vdev) = (&(hw)->dev); \
+   (((vdev)->is_vfio) ? \
+   (ioport_outw_p(((vdev)->pci_dev), reg, (uint16_t)(value))) : \
+   (outw_p((unsigned short)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg)))))); \
+})
+
+#define VIRTIO_READ_REG_4(hw, reg) \
+({ \
+   uint32_t ret;   \
+   struct vi

[dpdk-dev] [PATCH v4 09/14] virtio: ethdev: check for vfio interface

2016-01-14 Thread Santosh Shukla
Introduce apis to check whether the interface type is vfio or not; if the
interface is vfio, then update struct virtio_vfio_dev {}.

Those two apis are:
- virtio_chk_for_vfio
- virtio_hw_init_by_vfio

Signed-off-by: Santosh Shukla 
---
v3->v4:
- Removed RTE_PCI_DRV_NEED_MAPPING drv flag (as per Review comment from Stephen
  and Suggested by Yuan)
- Introducing vfio interface parsing api which will set/unset is_vfio flag at
  runtime.

drivers/net/virtio/virtio_ethdev.c |  112 +++-
 1 file changed, 110 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index d928339..8f2260f 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -1202,6 +1202,105 @@ static int virtio_resource_init(struct rte_pci_device 
*pci_dev)
return virtio_resource_init_by_ioports(pci_dev);
 }

+static int virtio_chk_for_vfio(struct rte_pci_device *pci_dev)
+{
+   /*
+* 1. check whether vfio-noiommu mode is enabled
+* 2. verify pci device attached to vfio-noiommu driver
+* root@arm64:/sys/bus/pci/drivers/vfio-pci/:00:01.0/iommu_group#
+* > cat name
+* > vfio-noiommu
+*/
+
+   /* 1. Chk for vfio: noiommu mode set or not in kernel driver */
+   struct rte_pci_addr *loc;
+   FILE *fp;
+   const char *path = 
"/sys/module/vfio/parameters/enable_unsafe_noiommu_mode";
+   char filename[PATH_MAX] = {0};
+   char buf[PATH_MAX] = {0};
+
+   fp = fopen(path, "r");
+   if (fp == NULL) {
+   PMD_INIT_LOG(ERR, "can't open %s\n", path);
+   return -1;
+   }
+
+   if (fread(buf, sizeof(char), 1, fp) != 1) {
+   PMD_INIT_LOG(ERR, "can't read from file %s\n", path);
+   fclose(fp);
+   return -1;
+   }
+
+   if (strncmp(buf, "Y", 1) != 0) {
+   PMD_INIT_LOG(ERR, "[%s]: vfio: noiommu mode not set\n", path);
+   fclose(fp);
+   return -1;
+   }
+
+   fclose(fp);
+
+   /* 2. Verify pci device attached to vfio-noiommu driver */
+
+   /* 2.1 chk whether attached driver is vfio-noiommu or not */
+   loc = &pci_dev->addr;
+   snprintf(filename, sizeof(filename),
+SYSFS_PCI_DEVICES "/" PCI_PRI_FMT "/iommu_group/name",
+loc->domain, loc->bus, loc->devid, loc->function);
+
+   /* check for vfio-noiommu */
+   fp = fopen(filename, "r");
+   if (fp == NULL) {
+   PMD_INIT_LOG(ERR, "can't open %s\n", filename);
+   return -1;
+   }
+
+   if (fread(buf, sizeof(char), sizeof("vfio-noiommu"), fp) !=
+ sizeof("vfio-noiommu")) {
+   PMD_INIT_LOG(ERR, "can't read from file %s\n", filename);
+   fclose(fp);
+   return -1;
+   }
+
+   if (strncmp(buf, "vfio-noiommu", strlen("vfio-noiommu")) != 0) {
+   PMD_INIT_LOG(ERR, "not a vfio-noiommu driver\n");
+   fclose(fp);
+   return -1;
+   }
+
+   fclose(fp);
+
+   /* todo: vfio interrupt handling */
+   return 0;
+}
+
+/* Init virtio by vfio-way */
+static int virtio_hw_init_by_vfio(struct virtio_hw *hw,
+ struct rte_pci_device *pci_dev)
+{
+   struct virtio_vfio_dev *vdev;
+
+   vdev = &hw->dev;
+   if (virtio_chk_for_vfio(pci_dev) < 0) {
+   vdev->is_vfio = false;
+   vdev->pci_dev = NULL;
+   return -1;
+   }
+
+   /* .. So attached interface is vfio */
+   vdev->is_vfio = true;
+   vdev->pci_dev = pci_dev;
+
+   /* For debug use only */
+   const struct rte_intr_handle *intr_handle;
+   RTE_SET_USED(intr_handle); /* to keep compiler happy */
+   intr_handle = &pci_dev->intr_handle;
+   PMD_INIT_LOG(DEBUG, "vdev->pci_dev %p intr_handle %p vfio_dev_fd %d\n",
+vdev->pci_dev, intr_handle,
+intr_handle->vfio_dev_fd);
+
+   return 0;
+}
+
 #else
 static int
 virtio_has_msix(const struct rte_pci_addr *loc __rte_unused)
@@ -1215,6 +1314,13 @@ static int virtio_resource_init(struct rte_pci_device 
*pci_dev __rte_unused)
/* no setup required */
return 0;
 }
+
+static int virtio_hw_init_by_vfio(struct virtio_hw *hw __rte_unused,
+ struct rte_pci_device *pci_dev __rte_unused)
+{
+   /* NA */
+   return 0;
+}
 #endif

 /*
@@ -1287,8 +1393,10 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)

pci_dev = eth_dev->pci_dev;

-   if (virtio_resource_init(pci_dev) < 0)
-   return -1;
+   if (virtio_hw_init_by_vfio(hw, pci_dev) < 0) {
+   if (virtio_resource_init(pci_dev) < 0)
+   return -1;
+   }

hw->use_msix = virtio_has_msix(&pci_dev->addr);
hw->io_base = (uint32_t)(uintptr_t)pci_dev-

[dpdk-dev] [PATCH v4 10/14] virtio: pci: add dummy func definition for in/outb for non-x86 arch

2016-01-14 Thread Santosh Shukla
For non-x86 archs, the compiler will throw build errors for the in/out apis. Include
dummy api functions so that the build passes.

Note that: For virtio to work for non-x86 arch - RTE_EAL_VFIO is the only
supported method. RTE_EAL_IGB_UIO is not supported for non-x86 arch.

So, Virtio support for arch and supported interface by that arch:

ARCH   IGB_UIO  VFIO
x86 Y   Y
ARM64   N/A Y
PPC_64  N/A Y   (Not tested but likely should work, as vfio is
arch independent)

Signed-off-by: Santosh Shukla 
---
v4:
- dummy inb/outb function useful for non-x86 archs, Intent to get-rid of build
  error. 

drivers/net/virtio/virtio_pci.h |   42 +++
 1 file changed, 42 insertions(+)

diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
index 8526c07..600260a 100644
--- a/drivers/net/virtio/virtio_pci.h
+++ b/drivers/net/virtio/virtio_pci.h
@@ -239,6 +239,48 @@ outl_p(unsigned int data, unsigned int port)
 }
 #endif

+#if !defined(RTE_ARCH_X86_64) && !defined(RTE_ARCH_I686) && \
+   defined(RTE_EXEC_ENV_LINUXAPP)
+static inline uint8_t inb(unsigned long addr __rte_unused)
+{
+   PMD_INIT_LOG(ERR, "inb() not supported for this RTE_ARCH\n");
+   return 0;
+}
+
+static inline uint16_t inw(unsigned long addr __rte_unused)
+{
+   PMD_INIT_LOG(ERR, "inw() not supported for this RTE_ARCH\n");
+   return 0;
+}
+
+static inline uint32_t inl(unsigned long addr __rte_unused)
+{
+   PMD_INIT_LOG(ERR, "inl() not supported for this RTE_ARCH\n");
+   return 0;
+}
+
+static inline void
+outb_p(unsigned char data __rte_unused, unsigned int port __rte_unused)
+{
+   PMD_INIT_LOG(ERR, "outb_p() not supported for this RTE_ARCH\n");
+   return;
+}
+
+static inline void
+outw_p(unsigned short data __rte_unused, unsigned int port __rte_unused)
+{
+   PMD_INIT_LOG(ERR, "outw_p() not supported for this RTE_ARCH\n");
+   return;
+}
+
+static inline void
+outl_p(unsigned int data __rte_unused, unsigned int port __rte_unused)
+{
+   PMD_INIT_LOG(ERR, "outl_p() not supported for this RTE_ARCH\n");
+   return;
+}
+#endif
+
 #define VIRTIO_PCI_REG_ADDR(hw, reg) \
(unsigned short)((hw)->io_base + (reg))

-- 
1.7.9.5



[dpdk-dev] [PATCH v4 11/14] config: armv7/v8: Enable RTE_LIBRTE_VIRTIO_PMD

2016-01-14 Thread Santosh Shukla
Enable RTE_LIBRTE_VIRTIO_PMD for armv7/v8 and set RTE_VIRTIO_INC_VECTOR=n.
Builds successfully for armv7/v8.

Signed-off-by: Santosh Shukla 
---
v2->v4:
- Removed explicit setting of VIRTIO_PMD in configs (per review comment from
  Jiangbo)

config/defconfig_arm-armv7a-linuxapp-gcc   |4 +++-
 config/defconfig_arm64-armv8a-linuxapp-gcc |4 +++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc 
b/config/defconfig_arm-armv7a-linuxapp-gcc
index cbebd64..9f852ce 100644
--- a/config/defconfig_arm-armv7a-linuxapp-gcc
+++ b/config/defconfig_arm-armv7a-linuxapp-gcc
@@ -43,6 +43,9 @@ CONFIG_RTE_FORCE_INTRINSICS=y
 CONFIG_RTE_TOOLCHAIN="gcc"
 CONFIG_RTE_TOOLCHAIN_GCC=y

+# Disable VIRTIO VECTOR support
+CONFIG_RTE_VIRTIO_INC_VECTOR=n
+
 # ARM doesn't have support for vmware TSC map
 CONFIG_RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT=n

@@ -70,7 +73,6 @@ CONFIG_RTE_LIBRTE_I40E_PMD=n
 CONFIG_RTE_LIBRTE_IXGBE_PMD=n
 CONFIG_RTE_LIBRTE_MLX4_PMD=n
 CONFIG_RTE_LIBRTE_MPIPE_PMD=n
-CONFIG_RTE_LIBRTE_VIRTIO_PMD=n
 CONFIG_RTE_LIBRTE_VMXNET3_PMD=n
 CONFIG_RTE_LIBRTE_PMD_XENVIRT=n
 CONFIG_RTE_LIBRTE_PMD_BNX2X=n
diff --git a/config/defconfig_arm64-armv8a-linuxapp-gcc 
b/config/defconfig_arm64-armv8a-linuxapp-gcc
index 504f3ed..1a638b3 100644
--- a/config/defconfig_arm64-armv8a-linuxapp-gcc
+++ b/config/defconfig_arm64-armv8a-linuxapp-gcc
@@ -45,8 +45,10 @@ CONFIG_RTE_TOOLCHAIN_GCC=y

 CONFIG_RTE_CACHE_LINE_SIZE=64

+# Disable VIRTIO VECTOR support
+CONFIG_RTE_VIRTIO_INC_VECTOR=n
+
 CONFIG_RTE_IXGBE_INC_VECTOR=n
-CONFIG_RTE_LIBRTE_VIRTIO_PMD=n
 CONFIG_RTE_LIBRTE_IVSHMEM=n
 CONFIG_RTE_LIBRTE_FM10K_PMD=n
 CONFIG_RTE_LIBRTE_I40E_PMD=n
-- 
1.7.9.5



[dpdk-dev] [PATCH v4 12/14] eal: pci: export pci_[un]map_device

2016-01-14 Thread Santosh Shukla
From: Yuanhan Liu 

Normally we could set RTE_PCI_DRV_NEED_MAPPING flag so that eal will
invoke pci_map_device internally for us. From that point of view, there
is no need to export pci_map_device.

However, for virtio pmd driver, which is designed to work without
binding UIO (or something similar first), pci_map_device() will fail,
which ends up with virtio pmd driver being skipped. Therefore, we can
not set RTE_PCI_DRV_NEED_MAPPING blindly at virtio pmd driver.

Therefore, this patch exports pci_map_device and lets the virtio pmd
call it when necessary.

Cc: David Marchand 
Signed-off-by: Yuanhan Liu 
Tested-by: Santosh Shukla 
---
- Pulled Yuanhan's v3 patch so that other users can try out the complete set
  and test it on HW, i.e. both x86 and non-x86. It works for VFIO mode
  seamlessly.

lib/librte_eal/bsdapp/eal/eal_pci.c |4 ++--
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |7 ++
 lib/librte_eal/common/eal_common_pci.c  |4 ++--
 lib/librte_eal/common/eal_private.h |   18 ---
 lib/librte_eal/common/include/rte_pci.h |   27 +++
 lib/librte_eal/linuxapp/eal/eal_pci.c   |4 ++--
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |7 ++
 7 files changed, 47 insertions(+), 24 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c 
b/lib/librte_eal/bsdapp/eal/eal_pci.c
index 6c21fbd..95c32c1 100644
--- a/lib/librte_eal/bsdapp/eal/eal_pci.c
+++ b/lib/librte_eal/bsdapp/eal/eal_pci.c
@@ -93,7 +93,7 @@ pci_unbind_kernel_driver(struct rte_pci_device *dev 
__rte_unused)

 /* Map pci device */
 int
-pci_map_device(struct rte_pci_device *dev)
+rte_eal_pci_map_device(struct rte_pci_device *dev)
 {
int ret = -1;

@@ -115,7 +115,7 @@ pci_map_device(struct rte_pci_device *dev)

 /* Unmap pci device */
 void
-pci_unmap_device(struct rte_pci_device *dev)
+rte_eal_pci_unmap_device(struct rte_pci_device *dev)
 {
/* try unmapping the NIC resources */
switch (dev->kdrv) {
diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map 
b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index 9d7adf1..1b28170 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -135,3 +135,10 @@ DPDK_2.2 {
rte_xen_dom0_supported;

 } DPDK_2.1;
+
+DPDK_2.3 {
+   global:
+
+   rte_eal_pci_map_device;
+   rte_eal_pci_unmap_device;
+} DPDK_2.2;
diff --git a/lib/librte_eal/common/eal_common_pci.c 
b/lib/librte_eal/common/eal_common_pci.c
index dcfe947..96d5113 100644
--- a/lib/librte_eal/common/eal_common_pci.c
+++ b/lib/librte_eal/common/eal_common_pci.c
@@ -188,7 +188,7 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, 
struct rte_pci_device *d
pci_config_space_set(dev);
 #endif
/* map resources for devices that use igb_uio */
-   ret = pci_map_device(dev);
+   ret = rte_eal_pci_map_device(dev);
if (ret != 0)
return ret;
} else if (dr->drv_flags & RTE_PCI_DRV_FORCE_UNBIND &&
@@ -254,7 +254,7 @@ rte_eal_pci_detach_dev(struct rte_pci_driver *dr,

if (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING)
/* unmap resources for devices that use igb_uio */
-   pci_unmap_device(dev);
+   rte_eal_pci_unmap_device(dev);

return 0;
}
diff --git a/lib/librte_eal/common/eal_private.h 
b/lib/librte_eal/common/eal_private.h
index 072e672..2342fa1 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -165,24 +165,6 @@ struct rte_pci_device;
 int pci_unbind_kernel_driver(struct rte_pci_device *dev);

 /**
- * Map this device
- *
- * This function is private to EAL.
- *
- * @return
- *   0 on success, negative on error and positive if no driver
- *   is found for the device.
- */
-int pci_map_device(struct rte_pci_device *dev);
-
-/**
- * Unmap this device
- *
- * This function is private to EAL.
- */
-void pci_unmap_device(struct rte_pci_device *dev);
-
-/**
  * Map the PCI resource of a PCI device in virtual memory
  *
  * This function is private to EAL.
diff --git a/lib/librte_eal/common/include/rte_pci.h 
b/lib/librte_eal/common/include/rte_pci.h
index 53437cc..0c667ff 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -523,6 +523,33 @@ int rte_eal_pci_write_bar(const struct rte_pci_device 
*device,
  */
 int rte_eal_pci_write_config(const struct rte_pci_device *device,
 const void *buf, size_t len, off_t offset);
+/**
+ * Map the PCI device resources in user space virtual memory address
+ *
+ * Note that driver should not call this function when flag
+ * RTE_PCI_DRV_NEED_MAPPING is set, as EAL will do that for
+ * you when it's on.
+ *
+ * @param dev
+ *   A pointer to a rte_pc

[dpdk-dev] [PATCH v4 13/14] virtio: enable vfio in pmd driver

2016-01-14 Thread Santosh Shukla
Use the mapping API, i.e. rte_eal_pci_map_device(), to create the vfio
container and group id and to get the vfio-dev-fd for the virtio-net-pci
interface. The vfio_dev_fd is later used for virtio device rd/wr operations.

Signed-off-by: Santosh Shukla 
---
v3->v4:
- Per Stephen's comment, removed the static driver flag RTE_PCI_XXX_XX_NEED_MAPPING;
  the virtio pmd driver now initializes the vfio interface at runtime.

drivers/net/virtio/virtio_ethdev.c |   12 
 1 file changed, 12 insertions(+)

diff --git a/drivers/net/virtio/virtio_ethdev.c 
b/drivers/net/virtio/virtio_ethdev.c
index 8f2260f..ce03a24 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -497,6 +497,8 @@ virtio_dev_close(struct rte_eth_dev *dev)
hw->started = 0;
virtio_dev_free_mbufs(dev);
virtio_free_queues(dev);
+
+   /* For vfio case : hotunplug/unmap not supported (todo) */
 }

 static void
@@ -1286,6 +1288,16 @@ static int virtio_hw_init_by_vfio(struct virtio_hw *hw,
return -1;
}

+   /*
+* pci_map_device used not to actually map ioport region but
+* create vfio container/group and vfio-dev-fd for _this_
+* virtio interface.
+*/
+   if (rte_eal_pci_map_device(pci_dev) != 0) {
+   PMD_INIT_LOG(ERR, "vfio pci mapping failed for ioport bar\n");
+   return -1;
+   }
+
/* .. So attached interface is vfio */
vdev->is_vfio = true;
vdev->pci_dev = pci_dev;
-- 
1.7.9.5



[dpdk-dev] [PATCH v4 14/14] vfio: Support for no-IOMMU mode

2016-01-14 Thread Santosh Shukla
From: Anatoly Burakov 

This commit is adding a generic mechanism to support multiple IOMMU
types. For now, it's only type 1 (x86 IOMMU) and no-IOMMU (a special
VFIO mode that doesn't use IOMMU at all), but it's easily extended
by adding necessary definitions into eal_pci_init.h and a DMA
mapping function to eal_pci_vfio_dma.c.

Since type 1 IOMMU module is no longer necessary to have VFIO,
we fix the module check to check for vfio-pci instead. It's not
ideal and triggers VFIO checks more often (and thus produces more
error output, which was the reason behind the module check in the
first place), so we compensate for that by providing more verbose
logging, indicating whether VFIO initialization has succeeded or
failed.
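
A minimal sketch of the table-driven selection described above, not the
patch's actual code. try_set_iommu() is a hypothetical stand-in for the
VFIO_SET_IOMMU ioctl, the SKETCH_* ids are placeholders for the values in
<linux/vfio.h>, and the two mapping callbacks are empty stubs:

```c
#include <stddef.h>

/* Placeholder ids for this sketch; real values come from <linux/vfio.h>. */
enum { SKETCH_TYPE1 = 1, SKETCH_NOIOMMU = 8 };

/* DMA mapping callback, one per IOMMU type. */
typedef int (*dma_map_fn)(int container_fd);

struct iommu_type {
	int type_id;
	const char *name;
	dma_map_fn dma_map;
};

/* Stubs: a real type 1 callback would map the memory segments. */
static int type1_dma_map(int fd) { (void)fd; return 0; }
static int noiommu_dma_map(int fd) { (void)fd; return 0; }

static const struct iommu_type iommu_types[] = {
	{ SKETCH_TYPE1, "Type 1", type1_dma_map },
	{ SKETCH_NOIOMMU, "No-IOMMU", noiommu_dma_map },
};

/*
 * Hypothetical stand-in for ioctl(fd, VFIO_SET_IOMMU, type_id):
 * pretends the kernel accepts only the type given in `supported`.
 */
static int try_set_iommu(int supported, int type_id)
{
	return type_id == supported ? 0 : -1;
}

/* Try each known type in table order; NULL means none was accepted. */
static const struct iommu_type *pick_iommu_type(int supported)
{
	size_t i;

	for (i = 0; i < sizeof(iommu_types) / sizeof(iommu_types[0]); i++) {
		if (try_set_iommu(supported, iommu_types[i].type_id) == 0)
			return &iommu_types[i];
	}
	return NULL;
}
```

With this layout, supporting another IOMMU type means adding one table entry
and one mapping function; the caller only invokes the dma_map pointer of the
entry that pick_iommu_type() returns.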

Signed-off-by: Anatoly Burakov 
Signed-off-by: Santosh Shukla 
Tested-by: Santosh Shukla 
---
- Pulled Anatoly's patch so that other users can use the full patchset to
  try out / experiment with. This patchset works for me :).

lib/librte_eal/linuxapp/eal/Makefile   |1 +
 lib/librte_eal/linuxapp/eal/eal_pci_init.h |   22 
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c |  143 +++-
 lib/librte_eal/linuxapp/eal/eal_pci_vfio_dma.c |   84 ++
 lib/librte_eal/linuxapp/eal/eal_vfio.h |5 +
 5 files changed, 202 insertions(+), 53 deletions(-)
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_vfio_dma.c

diff --git a/lib/librte_eal/linuxapp/eal/Makefile 
b/lib/librte_eal/linuxapp/eal/Makefile
index 26eced5..5c9e9d9 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -59,6 +59,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_log.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_uio.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_vfio.c
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_vfio_dma.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_vfio_mp_sync.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_debug.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_lcore.c
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_init.h 
b/lib/librte_eal/linuxapp/eal/eal_pci_init.h
index 3bc592b..068800c 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_init.h
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_init.h
@@ -112,6 +112,28 @@ struct vfio_config {
struct vfio_group vfio_groups[VFIO_MAX_GROUPS];
 };

+/* function pointer typedef for DMA mapping functions */
+typedef  int (*vfio_dma_func_t)(int);
+
+/* Structure to hold supported IOMMU types */
+struct vfio_iommu_type {
+   int type_id;
+   const char *name;
+   vfio_dma_func_t dma_map_func;
+};
+
+/* function prototypes for different IOMMU types */
+int vfio_iommu_type1_dma_map(int container_fd);
+int vfio_iommu_noiommu_dma_map(int container_fd);
+
+/* IOMMU types we support */
+static const struct vfio_iommu_type iommu_types[] = {
+   /* x86 IOMMU, otherwise known as type 1 */
+   { VFIO_TYPE1_IOMMU, "Type 1", &vfio_iommu_type1_dma_map},
+   /* IOMMU-less mode */
+   { VFIO_NOIOMMU_IOMMU, "No-IOMMU", &vfio_iommu_noiommu_dma_map},
+};
+
 #endif

 #endif /* EAL_PCI_INIT_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c 
b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
index df407ef..5c5ccea 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -72,6 +72,7 @@ EAL_REGISTER_TAILQ(rte_vfio_tailq)
 #define VFIO_DIR "/dev/vfio"
 #define VFIO_CONTAINER_PATH "/dev/vfio/vfio"
 #define VFIO_GROUP_FMT "/dev/vfio/%u"
+#define VFIO_NOIOMMU_GROUP_FMT "/dev/vfio/noiommu-%u"
 #define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL)

 /* per-process VFIO config */
@@ -236,42 +237,58 @@ pci_vfio_set_bus_master(int dev_fd)
return 0;
 }

-/* set up DMA mappings */
-static int
-pci_vfio_setup_dma_maps(int vfio_container_fd)
-{
-   const struct rte_memseg *ms = rte_eal_get_physmem_layout();
-   int i, ret;
-
-   ret = ioctl(vfio_container_fd, VFIO_SET_IOMMU,
-   VFIO_TYPE1_IOMMU);
-   if (ret) {
-   RTE_LOG(ERR, EAL, "  cannot set IOMMU type, "
-   "error %i (%s)\n", errno, strerror(errno));
-   return -1;
+/* pick IOMMU type. returns a pointer to vfio_iommu_type or NULL for error */
+static const struct vfio_iommu_type *
+pci_vfio_set_iommu_type(int vfio_container_fd) {
+   unsigned idx;
+   for (idx = 0; idx < RTE_DIM(iommu_types); idx++) {
+   const struct vfio_iommu_type *t = &iommu_types[idx];
+
+   int ret = ioctl(vfio_container_fd, VFIO_SET_IOMMU,
+   t->type_id);
+   if (!ret) {
+   RTE_LOG(NOTICE, EAL, "  using IOMMU type %d (%s)\n",
+   t->type_id, t->name);
+   return t;
+   }
+   /* not an error, there may

[dpdk-dev] [PATCH v2] app/testpmd Fix max_socket detection

2016-01-14 Thread Bruce Richardson
On Wed, Jan 13, 2016 at 02:23:36PM -0800, Stephen Hurd wrote:
> Previously, max_socket was set to the highest numbered socket with
> an enabled lcore.  The intent is to set it to the highest socket
> regardless of it being enabled.
> 

Can you clarify why this change is necessary? Is it causing a bug somewhere?

thanks,
/Bruce



[dpdk-dev] [PATCH v2 0/5] virtio: Tx performance improvements

2016-01-14 Thread Xie, Huawei
On 1/6/2016 8:04 PM, Thomas Monjalon wrote:
> 2016-01-05 08:10, Xie, Huawei:
>> On 10/26/2015 10:06 PM, Xie, Huawei wrote:
>>> On 10/19/2015 1:16 PM, Stephen Hemminger wrote:
>>>> This is a tested version of the virtio Tx performance improvements
>>>> that I posted earlier on the list, and described at the DPDK Userspace
>>>> meeting in Dublin. Together they get a 25% performance improvement for
>>>> both small packet and large multi-segment packet case when testing
>>>> from DPDK guest application to Linux KVM host.
>>>>
>>>> Stephen Hemminger (5):
>>>>   virtio: clean up space checks on xmit
>>>>   virtio: don't use unlikely for normal tx stuff
>>>>   virtio: use indirect ring elements
>>>>   virtio: use any layout on transmit
>>>>   virtio: optimize transmit enqueue
>>> There is one open why merge-able header is used in tx path. Since old
>>> implementation is also using the merge-able header in tx path if this
>>> feature is negotiated, i choose to ack the patch and address this later
>>> if not now.
>>>
>>> Acked-by: Huawei Xie 
>> Thomas:
>> This patch isn't in the patchwork. Does Stephen need to send a new one?
> Yes please, I cannot find them in patchwork.

ping

>



[dpdk-dev] [PATCH 0/4] Optimize memcpy for AVX512 platforms

2016-01-14 Thread Stephen Hemminger
On Thu, 14 Jan 2016 01:13:18 -0500
Zhihong Wang  wrote:

> This patch set optimizes DPDK memcpy for AVX512 platforms, to make full
> utilization of hardware resources and deliver high performance.
> 
> In current DPDK, memcpy holds a large proportion of execution time in
> libs like Vhost, especially for large packets, and this patch can bring
> considerable benefits.
> 
> The implementation is based on the current DPDK memcpy framework, some
> background introduction can be found in these threads:
> http://dpdk.org/ml/archives/dev/2014-November/008158.html
> http://dpdk.org/ml/archives/dev/2015-January/011800.html
> 
> Code changes are:
> 
>   1. Read CPUID to check if AVX512 is supported by CPU
> 
>   2. Predefine AVX512 macro if AVX512 is enabled by compiler
> 
>   3. Implement AVX512 memcpy and choose the right implementation based on
>  predefined macros
> 
>   4. Decide alignment unit for memcpy perf test based on predefined macros
> 
> Zhihong Wang (4):
>   lib/librte_eal: Identify AVX512 CPU flag
>   mk: Predefine AVX512 macro for compiler
>   lib/librte_eal: Optimize memcpy for AVX512 platforms
>   app/test: Adjust alignment unit for memcpy perf test
> 
>  app/test/test_memcpy_perf.c|   6 +
>  .../common/include/arch/x86/rte_cpuflags.h |   2 +
>  .../common/include/arch/x86/rte_memcpy.h   | 247 
> -
>  mk/rte.cpuflags.mk |   4 +
>  4 files changed, 255 insertions(+), 4 deletions(-)
> 

This really looks like code that could benefit from GCC
function multiversioning. The current cpuflags model is useless/flawed
in real product deployment.
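
For reference, a minimal sketch of what function multiversioning can look
like in C, using GCC's target_clones attribute. sketch_copy() and the
SKETCH_MULTIVERSION macro are hypothetical names for illustration, not
rte_memcpy, and the guard assumes GCC 6 or later on x86-64 Linux with glibc
(for ifunc support); elsewhere the macro expands to nothing and the function
is built once:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Ask GCC to build one clone per listed target and pick one at load
 * time through an ifunc resolver. Assumption: only enabled where the
 * attribute and ifunc are known to be available.
 */
#if defined(__GNUC__) && !defined(__clang__) && (__GNUC__ >= 6) && \
	defined(__x86_64__) && defined(__linux__) && defined(__GLIBC__)
#define SKETCH_MULTIVERSION __attribute__((target_clones("avx2", "default")))
#else
#define SKETCH_MULTIVERSION
#endif

/* Plain byte copy; each clone is optimized for its own target. */
SKETCH_MULTIVERSION
void *sketch_copy(void *dst, const void *src, size_t n)
{
	uint8_t *d = dst;
	const uint8_t *s = src;
	size_t i;

	for (i = 0; i < n; i++)
		d[i] = s[i];
	return dst;
}
```

The point of the approach is that a single binary carries several variants
and selects one on the machine it runs on, instead of fixing the choice at
build time through compiler-predefined macros.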


[dpdk-dev] Problem with Intel i40e XL710 dpdk driver

2016-01-14 Thread Karthick, A.R.
Hi,
 I am seeing a "Failed to init adminq: -54" or admin queue timeouts
 while initializing the admin queue for i40e xl710 intel nic.
 (Intel server is a E5-2670)

 First things first.
 I am running the latest firmware.
 The kernel module is not loaded and yes, it works with the i40e kernel
driver. (latest or otherwise)
And this problem comes even with dpdk 2.0/2.1 or the latest stable. So
there's that.

 I have done a bunch of debugging and here are my findings.
 With the card configured in 2x40g or 4x10g mode, it _ALWAYS_ works with
 successfully initializing pci function 0 or port 0.
 It always fails to subsequently initialize the rest.
 Even if I unbind igb_uio for port 0 and bind only port 1 or port 2,3,4
in 4x10g mode,
 it fails.

 Since it works with the kernel driver, I tried to see if there were
differences in the way registers are setup for i40e driver in kernel and
dpdk.
 They look mostly to be the same but obviously there were subtle
differences.
 From what I could fathom, I couldn't see much and whatever little was
caught, I tried to keep the dpdk code in sync and it still failed.

 While stepping through gdb all the way from eal pci to pci uio map to
eth_i40e_dev_init,
 to the failure in obtaining the firmware revision for port1 during
i40e_init_adminq,
 I did confirm that the memory map was right for the pci.

 So the hw->hw_addr looks correct for port 1 correlating it to the uio1 map
or the physical address from lspci or kernel driver when using the kernel
driver which works.

 However the admin queue seems to be not processing any request for port 1.
 Note that port 0 always works and it's the same code for others with a
different eal dev/hw instance.

But for other ports like port1, after correctly setting up the adminq
registers and memory map,
 it always fails to obtain the firmware revision since the i40e_asq_done is
returning 0 for the
 head register at 0x80300 and doesn't match the next_in_use when starting
at 1.
 So it always returns pending or false in i40e_asq_done, which is retried a
certain number of times after resetting the aq by i40e_init_adminq but
ultimately gives up.
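
 To make the check concrete, here is a minimal sketch of the completion test
as I understand it. The struct and function names are hypothetical, not the
driver's, and the head register is simulated by a plain variable:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of the admin send queue state for this sketch. */
struct sketch_asq {
	volatile uint32_t *head_reg; /* head index, advanced by firmware */
	uint32_t next_to_use;        /* next slot the driver will fill */
};

/* Done when firmware has consumed everything the driver queued. */
static bool sketch_asq_done(const struct sketch_asq *q)
{
	return *q->head_reg == q->next_to_use;
}

/* Poll up to max_polls times; 0 on completion, -1 on timeout. */
static int sketch_asq_wait(const struct sketch_asq *q, unsigned int max_polls)
{
	unsigned int i;

	for (i = 0; i < max_polls; i++) {
		if (sketch_asq_done(q))
			return 0;
	}
	return -1;
}
```

 In the failing case the head stays at 0 while the driver index is already
1, so every poll reports "not done" and the wait times out.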

 Thoughts and wondering if you guys have seen this and have a fix or patch
that is not in upstream yet.

 Failure enclosed below as mentioned above in detail (with a 4x10g mode
for the card, but same failure with 2x40g mode as well. No difference. Port
0 always succeeds but subsequent ports fail).
 And same result even with port 0 not bound and starting with the
initialization of port 2,3,4 which always fails.

EAL: lcore 1 is ready (tid=6bd30700;cpuset=[1])
EAL: PCI device :01:00.0 on NUMA socket 0
EAL:   probe driver: 8086:1521 rte_igb_pmd
EAL:   Not managed by a supported kernel driver, skipped
EAL: PCI device :01:00.1 on NUMA socket 0
EAL:   probe driver: 8086:1521 rte_igb_pmd
EAL:   Not managed by a supported kernel driver, skipped
EAL: PCI device :83:00.0 on NUMA socket 1
EAL:   probe driver: 8086:1583 rte_i40e_pmd
EAL:   PCI memory mapped at 0x7f2f8000
EAL:   PCI memory mapped at 0x7f2f8080
PMD: eth_i40e_dev_init(): FW 4.40 API 1.4 NVM 04.05.03 eetrack 80001dca
PMD: i40e_pf_parameter_init(): Max supported VSIs:34
PMD: i40e_pf_parameter_init(): PF queue pairs:64
PMD: i40e_pf_parameter_init(): Max VMDQ VSI num:34
PMD: i40e_pf_parameter_init(): VMDQ queue pairs:4
EAL: PCI device :83:00.1 on NUMA socket 1
EAL:   probe driver: 8086:1583 rte_i40e_pmd
EAL:   PCI memory mapped at 0x7f2f80808000
EAL:   PCI memory mapped at 0x7f2f81008000
PMD: eth_i40e_dev_init(): Failed to init adminq: -54
EAL: Error - exiting with code: 1
 Cause: Requested device :83:00.1 cannot be used

Regards,
-Karthick