[dpdk-dev] [PATCH] [VIRTIO] Support multiple queues feature in DPDK based virtio-net frontend.

2014-05-16 Thread Ouyang Changchun
This patch supports the multiple queues feature in the DPDK based virtio-net frontend.
It first gets the maximum queue number of virtio-net from the virtio PCI configuration
and then sends a command to negotiate the queue number with the backend;
when receiving and transmitting packets, the negotiated multiple virtio-net queues
can serve that traffic.
To utilize this feature, the backend also needs to support the multiple queues feature
and have it enabled.
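
For illustration, below is a minimal sketch of how the negotiation could be issued
through the control virtqueue. It reuses the virtio_send_command() helper and
struct virtio_pmd_ctrl added by this patch; the constants follow the virtio spec
(VIRTIO_NET_CTRL_MQ, VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET), but field names and error
handling are assumptions, not the final driver code:

static int
virtio_negotiate_queue_num(struct virtqueue *cvq, uint16_t nb_queue_pairs)
{
	struct virtio_pmd_ctrl ctrl;
	int dlen[1];

	/* Build the MQ control command carrying the requested queue pair count. */
	ctrl.hdr.class = VIRTIO_NET_CTRL_MQ;
	ctrl.hdr.cmd = VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET;
	memcpy(ctrl.data, &nb_queue_pairs, sizeof(nb_queue_pairs));
	dlen[0] = sizeof(nb_queue_pairs);

	/* One descriptor for the header, one per argument, one for the ACK. */
	if (virtio_send_command(cvq, &ctrl, dlen, 1) != 0)
		return -1;	/* backend refused or does not support MQ */

	return 0;
}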

Signed-off-by: Ouyang Changchun 
---
 lib/librte_pmd_virtio/virtio_ethdev.c | 326 --
 lib/librte_pmd_virtio/virtio_ethdev.h |  10 +-
 lib/librte_pmd_virtio/virtio_pci.h|   4 +-
 lib/librte_pmd_virtio/virtio_rxtx.c   |  79 +---
 lib/librte_pmd_virtio/virtqueue.h |  61 +--
 5 files changed, 388 insertions(+), 92 deletions(-)

diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index c6a1df5..a3616ea 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -80,6 +80,9 @@ static void virtio_dev_stats_get(struct rte_eth_dev *dev, 
struct rte_eth_stats *
 static void virtio_dev_stats_reset(struct rte_eth_dev *dev);
 static void virtio_dev_free_mbufs(struct rte_eth_dev *dev);

+static int virtio_dev_queue_stats_mapping_set(__rte_unused struct rte_eth_dev 
*eth_dev,
+__rte_unused uint16_t queue_id, __rte_unused uint8_t stat_idx, __rte_unused 
uint8_t is_rx);
+
 /*
  * The set of PCI devices this driver supports
  */
@@ -91,6 +94,130 @@ static struct rte_pci_id pci_id_virtio_map[] = {
 { .vendor_id = 0, /* sentinel */ },
 };

+static int
+virtio_send_command(struct virtqueue* vq, struct virtio_pmd_ctrl* ctrl,
+   int* dlen, int pkt_num)
+{
+   uint32_t head = vq->vq_desc_head_idx, i;
+   int k, sum = 0;
+   virtio_net_ctrl_ack status = ~0;
+   struct virtio_pmd_ctrl result;
+
+   ctrl->status = status;
+
+   if (!vq->hw->cvq) {
+   PMD_INIT_LOG(ERR, "%s(): Control queue is "
+"not supported by this device.\n", __func__);
+   return -1;
+   }
+
+   PMD_INIT_LOG(DEBUG, "vq->vq_desc_head_idx = %d, status = %d, 
vq->hw->cvq = %p \n"
+   "vq = %p \n", vq->vq_desc_head_idx, status, vq->hw->cvq, vq);
+
+   if ((vq->vq_free_cnt < ((uint32_t)pkt_num + 2)) || (pkt_num < 1)) {
+   return -1;
+   }
+
+   memcpy(vq->virtio_net_hdr_mz->addr, ctrl, sizeof(struct 
virtio_pmd_ctrl));
+
+   /*
+* Format is enforced in qemu code:
+* One TX packet for header;
+* At least one TX packet per argument;
+* One RX packet for ACK.
+*/
+   vq->vq_ring.desc[head].flags = VRING_DESC_F_NEXT;
+   vq->vq_ring.desc[head].addr = vq->virtio_net_hdr_mz->phys_addr;
+   vq->vq_ring.desc[head].len = sizeof(struct virtio_net_ctrl_hdr);
+   vq->vq_free_cnt--;
+   i = vq->vq_ring.desc[head].next;
+
+   for (k = 0; k < pkt_num; k++) {
+   vq->vq_ring.desc[i].flags = VRING_DESC_F_NEXT;
+   vq->vq_ring.desc[i].addr = vq->virtio_net_hdr_mz->phys_addr +
+   sizeof(struct virtio_net_ctrl_hdr) + 
sizeof(ctrl->status) + sizeof(uint8_t)*sum; 
+   vq->vq_ring.desc[i].len = dlen[k];
+   sum += dlen[k];
+   vq->vq_free_cnt--;
+   i = vq->vq_ring.desc[i].next;
+   }
+
+   vq->vq_ring.desc[i].flags = VRING_DESC_F_WRITE;
+   vq->vq_ring.desc[i].addr = vq->virtio_net_hdr_mz->phys_addr + 
sizeof(struct virtio_net_ctrl_hdr);
+   vq->vq_ring.desc[i].len = sizeof(ctrl->status);
+   vq->vq_free_cnt--;
+
+   vq->vq_desc_head_idx = vq->vq_ring.desc[i].next;
+
+   vq_update_avail_ring(vq, head);
+   vq_update_avail_idx(vq);
+
+   PMD_INIT_LOG(DEBUG, "vq->vq_queue_index = %d \n", vq->vq_queue_index);
+
+   virtqueue_notify(vq);
+
+   while (vq->vq_used_cons_idx == vq->vq_ring.used->idx) {
+   usleep(100);
+   }
+
+   while (vq->vq_used_cons_idx != vq->vq_ring.used->idx) {
+   uint32_t idx, desc_idx, used_idx;
+   struct vring_used_elem *uep;
+
+   rmb();
+
+   used_idx = (uint32_t)(vq->vq_used_cons_idx & (vq->vq_nentries - 
1));
+   uep = &vq->vq_ring.used->ring[used_idx];
+   idx = (uint32_t) uep->id;
+   desc_idx = idx;
+
+   while (vq->vq_ring.desc[desc_idx].flags & VRING_DESC_F_NEXT) {
+   desc_idx = vq->vq_ring.desc[desc_idx].next;
+   vq->vq_free_cnt++;
+   }
+
+   vq->vq_ring.desc[desc_idx].next = vq->vq_desc_head_idx;
+   vq->vq_desc_head_idx = idx;
+
+   vq->vq_used_cons_

[dpdk-dev] [PATCH 0/3] [PMD] [VHOST] *** Support zero copy RX/TX in user space vhost ***

2014-05-19 Thread Ouyang Changchun
Short summary:
* Add API to support queue start and stop functionality for RX/TX, and 
implement them in IXGBE PMD; 
* Enable hardware loopback functionality in VMDQ mode;
* Implement mbuf metadata macros to facilitate referring to space in the mbuf 
headroom;
* Support user space vhost zero copy RX/TX, which removes packet copying between 
host and guest in RX/TX;

Ouyang Changchun (3):
  1. It contains the following 2 parts:
 a) Add API to support queue start and stop functionality for RX/TX, and 
implement them in IXGBE PMD;  
 b) Enable hardware loopback functionality in VMDQ mode;
  2. Implement mbuf metadata macros to facilitate referring to space in mbuf 
headroom;
  3. Support user space vhost zero copy, which removes packet copying between 
host and guest in RX/TX.

 examples/vhost/main.c| 1405 ++
 examples/vhost/virtio-net.c  |  120 ++-
 examples/vhost/virtio-net.h  |   15 +-
 lib/librte_eal/linuxapp/eal/eal_memory.c |2 +-
 lib/librte_ether/rte_ethdev.c|  104 +++
 lib/librte_ether/rte_ethdev.h|   80 ++
 lib/librte_mbuf/rte_mbuf.h   |   17 +
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c  |4 +
 lib/librte_pmd_ixgbe/ixgbe_ethdev.h  |8 +
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c|  233 -
 lib/librte_pmd_ixgbe/ixgbe_rxtx.h|6 +
 11 files changed, 1800 insertions(+), 194 deletions(-)

-- 
1.9.0



[dpdk-dev] [PATCH 3/3] [PMD] [VHOST] Support zero copy RX/TX in user space vhost

2014-05-19 Thread Ouyang Changchun
Support user space vhost zero copy. It removes packet copying between host and 
guest in RX/TX.
It introduces an extra ring to store the detached mbufs. At the initialization 
stage all mbufs are put into
this ring; when one guest starts, vhost gets the available buffer addresses 
allocated by the guest for RX and 
translates them into host space addresses, then attaches them to mbufs and puts 
the attached mbufs into the
mempool.
Queue starting and DMA refilling will get mbufs from the mempool and use them to 
set the DMA addresses.

For TX, it gets the buffer addresses of the available packets to be transmitted 
from the guest and translates
them to host space addresses, then attaches them to mbufs and puts them into the 
TX queues. 
After TX finishes, it pulls the mbufs out of the mempool, detaches them and puts 
them back into the extra ring.
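
As a hedged illustration of the RX attach step described above (the helper names
gpa_to_vva()/gpa_to_hpa() and the mbuf fields are taken from this series and the
vhost example, but the function below is only a sketch, not the patch code):

static void
attach_one_rx_buffer(struct virtio_net *dev, struct vring_desc *desc,
		     struct rte_mbuf *mbuf, uint32_t desc_idx)
{
	/* Translate the guest physical buffer address into host addresses. */
	uint64_t buff_addr = gpa_to_vva(dev, desc->addr);	/* host virtual  */
	uint64_t phys_addr = gpa_to_hpa(dev, desc->addr);	/* host physical */

	/* Point the detached mbuf at the guest buffer so the NIC DMAs into it. */
	mbuf->pkt.data = (void *)(uintptr_t)buff_addr;
	mbuf->buf_physaddr = phys_addr - RTE_PKTMBUF_HEADROOM;
	mbuf->pkt.data_len = desc->len;

	/* Remember which vring descriptor this mbuf represents (mbuf headroom). */
	RTE_MBUF_METADATA_UINT32(mbuf, 0) = desc_idx;
}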

Signed-off-by: Ouyang Changchun 
---
 examples/vhost/main.c   | 1405 ++-
 examples/vhost/virtio-net.c |  120 +++-
 examples/vhost/virtio-net.h |   15 +-
 3 files changed, 1383 insertions(+), 157 deletions(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 816a71a..21704f1 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -48,6 +48,7 @@
 #include 
 #include 
 #include 
+#include 

 #include "main.h"
 #include "virtio-net.h"
@@ -70,6 +71,14 @@
 #define MBUF_SIZE (2048 + sizeof(struct rte_mbuf) + RTE_PKTMBUF_HEADROOM)

 /*
+ * No frame data buffers allocated from the host are required for the zero copy 
implementation;
+ * the guest will allocate the frame data buffer, and vhost uses it directly.
+ */
+#define VIRTIO_DESCRIPTOR_LEN_ZCP 1518
+#define MBUF_SIZE_ZCP (VIRTIO_DESCRIPTOR_LEN_ZCP + sizeof(struct rte_mbuf) + 
RTE_PKTMBUF_HEADROOM)
+#define MBUF_CACHE_SIZE_ZCP 0
+
+/*
  * RX and TX Prefetch, Host, and Write-back threshold values should be
  * carefully set for optimal performance. Consult the network
  * controller's datasheet and supporting DPDK documentation for guidance
@@ -108,6 +117,21 @@
 #define RTE_TEST_RX_DESC_DEFAULT 1024 
 #define RTE_TEST_TX_DESC_DEFAULT 512

+/*
+ * Need refine these 2 macros for legacy and DPDK based front end:
+ * Max vring avail descriptor/entries from guest - MAX_PKT_BURST
+ * And then adjust power 2.
+ */
+/*
+ * For legacy front end, 128 descriptors,
+ * half for virtio header, another half for mbuf.
+ */
+#define RTE_TEST_RX_DESC_DEFAULT_ZCP 32   /* legacy: 32, DPDK virt FE: 128. */
+#define RTE_TEST_TX_DESC_DEFAULT_ZCP 64   /* legacy: 64, DPDK virt FE: 64.  */
+
+/* true if x is a power of 2 */
#define POWEROF2(x) ((((x)-1) & (x)) == 0)
+
 #define INVALID_PORT_ID 0xFF

 /* Max number of devices. Limited by vmdq. */
@@ -138,8 +162,39 @@ static uint32_t num_switching_cores = 0;
 static uint32_t num_queues = 0;
 uint32_t num_devices = 0;

+/* Enable zero copy, pkts buffer will directly dma to hw descriptor, disabled 
on default*/
+static uint32_t zero_copy = 0;
+
+/* number of descriptors to apply*/
+static uint32_t num_rx_descriptor = RTE_TEST_RX_DESC_DEFAULT_ZCP;
+static uint32_t num_tx_descriptor = RTE_TEST_TX_DESC_DEFAULT_ZCP;
+
+/* max ring descriptor, ixgbe, i40e, e1000 all are 4096. */
+#define MAX_RING_DESC 4096
+
+struct vpool {
+   struct rte_mempool * pool;
+   struct rte_ring * ring;
+   uint32_t buf_size;
+} vpool_array[MAX_QUEUES+MAX_QUEUES];
+
 /* Enable VM2VM communications. If this is disabled then the MAC address 
compare is skipped. */
-static uint32_t enable_vm2vm = 1;
+typedef enum {
+   VM2VM_DISABLED = 0,
+   VM2VM_SOFTWARE = 1,
+   VM2VM_HARDWARE = 2,
+   VM2VM_LAST
+} vm2vm_type;
+static vm2vm_type vm2vm_mode = VM2VM_SOFTWARE;
+
+/* The type of host physical address translated from guest physical address. */
+typedef enum {
+   PHYS_ADDR_CONTINUOUS = 0,
+   PHYS_ADDR_CROSS_SUBREG = 1,
+   PHYS_ADDR_INVALID = 2,
+   PHYS_ADDR_LAST
+} hpa_type;
+
 /* Enable stats. */
 static uint32_t enable_stats = 0;
 /* Enable retries on RX. */
@@ -159,7 +214,7 @@ static uint32_t dev_index = 0;
 extern uint64_t VHOST_FEATURES;

 /* Default configuration for rx and tx thresholds etc. */
-static const struct rte_eth_rxconf rx_conf_default = {
+static struct rte_eth_rxconf rx_conf_default = {
.rx_thresh = {
.pthresh = RX_PTHRESH,
.hthresh = RX_HTHRESH,
@@ -173,7 +228,7 @@ static const struct rte_eth_rxconf rx_conf_default = {
  * Controller and the DPDK ixgbe/igb PMD. Consider using other values for other
  * network controllers and/or network drivers.
  */
-static const struct rte_eth_txconf tx_conf_default = {
+static struct rte_eth_txconf tx_conf_default = {
.tx_thresh = {
.pthresh = TX_PTHRESH,
.hthresh = TX_HTHRESH,
@@ -184,7 +239,7 @@ static const struct rte_eth_txconf tx_conf_default = {
 };

 /* empty vmdq configuration structure. Filled in programatically */
-static const struct rte_eth_conf vmdq_conf_default = 

[dpdk-dev] [PATCH 2/3] [PMD] [VHOST] Support zero copy RX/TX in user space vhost

2014-05-19 Thread Ouyang Changchun
Implement mbuf metadata macros to facilitate referring to space in the mbuf headroom;
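
As a quick, hypothetical usage example of the accessors added below (offset 0 is
chosen only for illustration): the zero copy vhost code stores the vring descriptor
index that a detached mbuf stands for in the headroom and reads it back when the
descriptor is reclaimed.

	uint32_t desc_idx = 5;                         /* hypothetical value */

	RTE_MBUF_METADATA_UINT32(mbuf, 0) = desc_idx;  /* store in the headroom */
	/* ... later, e.g. when cleaning the TX ring ... */
	desc_idx = RTE_MBUF_METADATA_UINT32(mbuf, 0);  /* read it back */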

Signed-off-by: Ouyang Changchun 
---
 lib/librte_mbuf/rte_mbuf.h | 17 +
 1 file changed, 17 insertions(+)

diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index edffc2c..baf3ca4 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -201,8 +201,25 @@ struct rte_mbuf {
struct rte_ctrlmbuf ctrl;
struct rte_pktmbuf pkt;
};
+
+   union {
+   uint8_t metadata[0];
+   uint16_t metadata16[0];
+   uint32_t metadata32[0];
+   uint64_t metadata64[0];
+   };
 } __rte_cache_aligned;

+#define RTE_MBUF_METADATA_UINT8(mbuf, offset)   (mbuf->metadata[offset])
+#define RTE_MBUF_METADATA_UINT16(mbuf, offset)  
(mbuf->metadata16[offset/sizeof(uint16_t)])
+#define RTE_MBUF_METADATA_UINT32(mbuf, offset)  
(mbuf->metadata32[offset/sizeof(uint32_t)])
+#define RTE_MBUF_METADATA_UINT64(mbuf, offset)  
(mbuf->metadata64[offset/sizeof(uint64_t)])
+
+#define RTE_MBUF_METADATA_UINT8_PTR(mbuf, offset)   (&mbuf->metadata[offset])
+#define RTE_MBUF_METADATA_UINT16_PTR(mbuf, offset)  
(&mbuf->metadata16[offset/sizeof(uint16_t)])
+#define RTE_MBUF_METADATA_UINT32_PTR(mbuf, offset)  
(&mbuf->metadata32[offset/sizeof(uint32_t)])
+#define RTE_MBUF_METADATA_UINT64_PTR(mbuf, offset)  
(&mbuf->metadata64[offset/sizeof(uint64_t)])
+
 /**
  * Given the buf_addr returns the pointer to corresponding mbuf.
  */
-- 
1.9.0



[dpdk-dev] [PATCH 1/3] [PMD] [VHOST] Support zero copy RX/TX in user space vhost

2014-05-19 Thread Ouyang Changchun
1. Add API to support queue start and stop functionality for RX/TX, and 
implement them in IXGBE PMD;
2. Enable hardware loopback functionality in VMDQ mode;

Signed-off-by: Ouyang Changchun 
---
 lib/librte_eal/linuxapp/eal/eal_memory.c |   2 +-
 lib/librte_ether/rte_ethdev.c| 104 ++
 lib/librte_ether/rte_ethdev.h|  80 +++
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c  |   4 +
 lib/librte_pmd_ixgbe/ixgbe_ethdev.h  |   8 ++
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c| 233 ++-
 lib/librte_pmd_ixgbe/ixgbe_rxtx.h|   6 +
 7 files changed, 400 insertions(+), 37 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c 
b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 69ad63e..dd10e15 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -134,6 +134,7 @@ rte_mem_virt2phy(const void *virtaddr)
uint64_t page, physaddr;
unsigned long virt_pfn;
int page_size;
+   off_t offset;

/* standard page size */
page_size = getpagesize();
@@ -145,7 +146,6 @@ rte_mem_virt2phy(const void *virtaddr)
return RTE_BAD_PHYS_ADDR;
}

-   off_t offset;
virt_pfn = (unsigned long)virtaddr / page_size;
offset = sizeof(uint64_t) * virt_pfn;
if (lseek(fd, offset, SEEK_SET) == (off_t) -1) {
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index ec411db..7faeeff 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -293,6 +293,110 @@ rte_eth_dev_rx_queue_config(struct rte_eth_dev *dev, 
uint16_t nb_queues)
return (0);
 }

+int
+rte_eth_dev_rx_queue_start(uint8_t port_id, uint16_t rx_queue_id)
+{
+   struct rte_eth_dev *dev;
+
+   /* This function is only safe when called from the primary process
+* in a multi-process setup*/
+   PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);
+
+   if (port_id >= nb_ports) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return (-EINVAL);
+   }
+
+   dev = &rte_eth_devices[port_id];
+   if (rx_queue_id >= dev->data->nb_rx_queues) {
+   PMD_DEBUG_TRACE("Invalid RX queue_id=%d\n", rx_queue_id);
+   return (-EINVAL);
+   }
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_start, -ENOTSUP);
+
+   return dev->dev_ops->rx_queue_start(dev, rx_queue_id);
+
+}
+
+int
+rte_eth_dev_rx_queue_stop(uint8_t port_id, uint16_t rx_queue_id)
+{
+   struct rte_eth_dev *dev;
+
+   /* This function is only safe when called from the primary process
+* in a multi-process setup*/
+   PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);
+
+   if (port_id >= nb_ports) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return (-EINVAL);
+   }
+
+   dev = &rte_eth_devices[port_id];
+   if (rx_queue_id >= dev->data->nb_rx_queues) {
+   PMD_DEBUG_TRACE("Invalid RX queue_id=%d\n", rx_queue_id);
+   return (-EINVAL);
+   }
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_stop, -ENOTSUP);
+
+   return dev->dev_ops->rx_queue_stop(dev, rx_queue_id);
+
+}
+
+int
+rte_eth_dev_tx_queue_start(uint8_t port_id, uint16_t tx_queue_id)
+{
+   struct rte_eth_dev *dev;
+
+   /* This function is only safe when called from the primary process
+* in a multi-process setup*/
+   PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);
+
+   if (port_id >= nb_ports) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return (-EINVAL);
+   }
+
+   dev = &rte_eth_devices[port_id];
+   if (tx_queue_id >= dev->data->nb_rx_queues) {
+   PMD_DEBUG_TRACE("Invalid RX queue_id=%d\n", tx_queue_id);
+   return (-EINVAL);
+   }
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_start, -ENOTSUP);
+
+   return dev->dev_ops->tx_queue_start(dev, tx_queue_id);
+
+}
+
+int
+rte_eth_dev_tx_queue_stop(uint8_t port_id, uint16_t tx_queue_id)
+{
+   struct rte_eth_dev *dev;
+
+   /* This function is only safe when called from the primary process
+* in a multi-process setup*/
+   PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);
+
+   if (port_id >= nb_ports) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return (-EINVAL);
+   }
+
+   dev = &rte_eth_devices[port_id];
+   if (tx_queue_id >= dev->data->nb_rx_queues) {
+   PMD_DEBUG_TRACE("Invalid RX queue_id=%d\n", tx_queue_id);
+   return (-EINVAL);
+   }
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_stop, -ENOTSUP);
+
+   return dev->dev_ops->tx_queue_stop(dev, tx_que

[dpdk-dev] [PATCH] [PMD] [VHOST] Revert unnecessary definition and fix wrong referring in user space vhost zero copy patches

2014-05-19 Thread Ouyang Changchun
 1. Revert the change of the metadata macro definition for referring to headroom 
space in the mbuf;
 2. Fix wrongly referring to the RX queue number in the TX queue start/stop functions.

Signed-off-by: Ouyang Changchun 
---
 examples/vhost/main.c | 15 +--
 lib/librte_ether/rte_ethdev.c |  8 
 lib/librte_mbuf/rte_mbuf.h| 17 -
 3 files changed, 13 insertions(+), 27 deletions(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 21704f1..674608c 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -129,6 +129,9 @@
 #define RTE_TEST_RX_DESC_DEFAULT_ZCP 32   /* legacy: 32, DPDK virt FE: 128. */
 #define RTE_TEST_TX_DESC_DEFAULT_ZCP 64   /* legacy: 64, DPDK virt FE: 64.  */

+/* Get first 4 bytes in mbuf headroom. */
+#define MBUF_HEADROOM_UINT32(mbuf) (*(uint32_t*)((uint8_t*)(mbuf) + 
sizeof(struct rte_mbuf)))
+
 /* true if x is a power of 2 */
#define POWEROF2(x) ((((x)-1) & (x)) == 0)

@@ -1638,7 +1641,7 @@ attach_rxmbuf_zcp(struct virtio_net *dev)
mbuf->pkt.data = (void*)(uintptr_t)(buff_addr);
mbuf->buf_physaddr = phys_addr - RTE_PKTMBUF_HEADROOM;
mbuf->pkt.data_len = desc->len;
-   RTE_MBUF_METADATA_UINT32(mbuf, 0) = (uint32_t)desc_idx;
+   MBUF_HEADROOM_UINT32(mbuf) = (uint32_t)desc_idx;

LOG_DEBUG(DATA, "(%"PRIu64") in attach_rxmbuf_zcp: res base idx:%d, 
descriptor idx:%d\n",
dev->device_fh, res_base_idx, desc_idx);
@@ -1700,7 +1703,7 @@ txmbuf_clean_zcp(struct virtio_net* dev, struct vpool* 
vpool)
rte_ring_sp_enqueue(vpool->ring, mbuf);

/* Update used index buffer information. */
-   vq->used->ring[used_idx].id = RTE_MBUF_METADATA_UINT32(mbuf, 0);
+   vq->used->ring[used_idx].id = MBUF_HEADROOM_UINT32(mbuf);
vq->used->ring[used_idx].len = 0;

used_idx = (used_idx + 1) & (vq->size - 1);
@@ -1788,7 +1791,7 @@ virtio_dev_rx_zcp(struct virtio_net *dev, struct rte_mbuf 
**pkts, uint32_t count

/* Retrieve all of the head indexes first to avoid caching issues. */
for (head_idx = 0; head_idx < count; head_idx++)
-   head[head_idx] = RTE_MBUF_METADATA_UINT32((pkts[head_idx]), 0);
+   head[head_idx] = MBUF_HEADROOM_UINT32(pkts[head_idx]);

/*Prefetch descriptor index. */
rte_prefetch0(&vq->desc[head[packet_success]]);
@@ -1799,7 +1802,7 @@ virtio_dev_rx_zcp(struct virtio_net *dev, struct rte_mbuf 
**pkts, uint32_t count

buff = pkts[packet_success];
LOG_DEBUG(DATA, "(%"PRIu64") in dev_rx_zcp: update the used idx 
for pkt[%d] descriptor idx: %d\n", 
-   dev->device_fh, packet_success, 
RTE_MBUF_METADATA_UINT32(buff, 0));
+   dev->device_fh, packet_success, 
MBUF_HEADROOM_UINT32(buff));

PRINT_PACKET(dev, 
(uintptr_t)(((uint64_t)(uintptr_t)buff->buf_addr) + RTE_PKTMBUF_HEADROOM), 
rte_pktmbuf_data_len(buff), 0);
@@ -1901,7 +1904,7 @@ virtio_tx_route_zcp(struct virtio_net* dev, struct 
rte_mbuf *m, uint32_t desc_id
if (unlikely(dev_ll->dev->device_fh == 
dev->device_fh)) {
LOG_DEBUG(DATA, "(%"PRIu64") TX: Source 
and destination MAC addresses are the same. Dropping packet.\n",
dev_ll->dev->device_fh);
-   RTE_MBUF_METADATA_UINT32(mbuf, 0) = 
(uint32_t)desc_idx;
+   MBUF_HEADROOM_UINT32(mbuf) = 
(uint32_t)desc_idx;
__rte_mbuf_raw_free(mbuf);
return ;
}
@@ -1936,7 +1939,7 @@ virtio_tx_route_zcp(struct virtio_net* dev, struct 
rte_mbuf *m, uint32_t desc_id
mbuf->pkt.vlan_macip.f.vlan_tci = vlan_tag;
mbuf->pkt.vlan_macip.f.l2_len = sizeof(struct ether_hdr);
mbuf->pkt.vlan_macip.f.l3_len = sizeof(struct ipv4_hdr);
-   RTE_MBUF_METADATA_UINT32(mbuf, 0) = (uint32_t)desc_idx;
+   MBUF_HEADROOM_UINT32(mbuf) = (uint32_t)desc_idx;

tx_q->m_table[len] = mbuf;
len++;
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 7faeeff..0008755 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -360,8 +360,8 @@ rte_eth_dev_tx_queue_start(uint8_t port_id, uint16_t 
tx_queue_id)
}

dev = &rte_eth_devices[port_id];
-   if (tx_queue_id >= dev->data->nb_rx_queues) {
-   PMD_DEBUG_TRACE("Invalid RX queue_id=%d\n", tx_queue_id);
+   if (tx_queue_id >= dev->data->nb_tx_queues) {
+   PMD_DEBUG_TRACE("Inva

[dpdk-dev] [PATCH] [PMD] [VHOST] Revert unnecessary definition and fix wrong referring in user space vhost zero copy patches

2014-05-20 Thread Ouyang, Changchun
Hi Thomas,

Fine, I will do it.

One more question:

You have comments as follow:
The title was "[PATCH 0/3] [PMD] [VHOST] *** Support zero copy RX/TX in user 
space vhost ***"
It should be "[PATCH v2 0/3] Support zero copy RX/TX in user space vhost"

So "[PMD] [VHOST]" in the title should be removed in the cover letter, right?
And in each separate patch letter, it could use "ixgbe:"  or "examples/vhost:", 
instead of "[PMD] [VHOST]" 
Is it right?


Thanks 
Changchun


-Original Message-
From: Thomas Monjalon [mailto:thomas.monja...@6wind.com] 
Sent: Tuesday, May 20, 2014 12:00 AM
To: Ouyang, Changchun
Cc: dev at dpdk.org
Subject: Re: [dpdk-dev] [PATCH] [PMD] [VHOST] Revert unnecessary definition and 
fix wrong referring in user space vhost zero copy patches

Hi Changchun,

2014-05-19 23:09, Ouyang Changchun:
> 1. Revert the change of metadata macro definition for referring to 
> headroom space in mbuf; 2. Fix wrongly referring to RX queues number 
> in TX queues start/stop function.
> 
> Signed-off-by: Ouyang Changchun 

You are fixing commits which are not yet applied.
Please merge and re-send the whole series by suffixing with "v2".

The title was "[PATCH 0/3] [PMD] [VHOST] *** Support zero copy RX/TX in user 
space vhost ***"
It should be "[PATCH v2 0/3] Support zero copy RX/TX in user space vhost"

Other notes:
- please split API and ixgbe changes
- set a significant title to each patch
- use prefixes like "ethdev:", "ixgbe:" or "examples/vhost:"

In general, this page is a good help:
http://dpdk.org/dev#send

Thanks
--
Thomas


[dpdk-dev] [PATCH v2 0/3] Support zero copy RX/TX in user space vhost

2014-05-20 Thread Ouyang Changchun
This patch series supports user space vhost zero copy. It removes packet 
copying between host and guest
in RX/TX. And it introduces an extra ring to store the detached mbufs. At the 
initialization stage all mbufs are
put into this ring; when one guest starts, vhost gets the available buffer 
addresses allocated by the guest
for RX and translates them into host space addresses, then attaches them to 
mbufs and puts the attached
mbufs into the mempool.

Queue starting and DMA refilling will get mbufs from the mempool and use them to 
set the DMA addresses.

For TX, it gets the buffer addresses of the available packets to be transmitted 
from the guest and translates
them to host space addresses, then attaches them to mbufs and puts them into the 
TX queues.
After TX finishes, it pulls the mbufs out of the mempool, detaches them and puts 
them back into the extra ring.

This patch series also implements queue start and stop functionality in the IXGBE 
PMD, and enables hardware
loopback for VMDQ mode in the IXGBE PMD.

Ouyang Changchun (3):
  Add API to support queue start and stop functionality for RX/TX.
  Implement queue start and stop functionality in IXGBE PMD; Enable
hardware loopback for VMDQ mode in IXGBE PMD.
  Support user space vhost zero copy, it removes packets copying between
host and guest in RX/TX.

 examples/vhost/main.c| 1410 ++
 examples/vhost/virtio-net.c  |  120 ++-
 examples/vhost/virtio-net.h  |   15 +-
 lib/librte_eal/linuxapp/eal/eal_memory.c |2 +-
 lib/librte_ether/rte_ethdev.c|  104 +++
 lib/librte_ether/rte_ethdev.h|   80 ++
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c  |4 +
 lib/librte_pmd_ixgbe/ixgbe_ethdev.h  |8 +
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c|  233 -
 lib/librte_pmd_ixgbe/ixgbe_rxtx.h|6 +
 10 files changed, 1787 insertions(+), 195 deletions(-)

-- 
1.9.0



[dpdk-dev] [PATCH v2 1/3] ethdev: Add API to support queue start and stop functionality for RX/TX.

2014-05-20 Thread Ouyang Changchun
This patch adds an API to support queue start and stop functionality for RX/TX.
It allows an RX or TX queue to be started or stopped individually, instead of starting
and stopping all of them at the same time.
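
A hedged usage sketch of the new calls (it assumes the port and its queues have
already been configured and the device started; error handling is shortened):

	uint8_t port_id = 0;
	uint16_t queue_id = 1;

	if (rte_eth_dev_rx_queue_start(port_id, queue_id) != 0)
		rte_exit(EXIT_FAILURE, "Cannot start RX queue %u\n", queue_id);
	if (rte_eth_dev_tx_queue_start(port_id, queue_id) != 0)
		rte_exit(EXIT_FAILURE, "Cannot start TX queue %u\n", queue_id);

	/* ... forward traffic on this queue ... */

	rte_eth_dev_tx_queue_stop(port_id, queue_id);
	rte_eth_dev_rx_queue_stop(port_id, queue_id);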

Signed-off-by: Ouyang Changchun 
---
 lib/librte_eal/linuxapp/eal/eal_memory.c |   2 +-
 lib/librte_ether/rte_ethdev.c| 104 +++
 lib/librte_ether/rte_ethdev.h|  80 
 3 files changed, 185 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c 
b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 69ad63e..dd10e15 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -134,6 +134,7 @@ rte_mem_virt2phy(const void *virtaddr)
uint64_t page, physaddr;
unsigned long virt_pfn;
int page_size;
+   off_t offset;

/* standard page size */
page_size = getpagesize();
@@ -145,7 +146,6 @@ rte_mem_virt2phy(const void *virtaddr)
return RTE_BAD_PHYS_ADDR;
}

-   off_t offset;
virt_pfn = (unsigned long)virtaddr / page_size;
offset = sizeof(uint64_t) * virt_pfn;
if (lseek(fd, offset, SEEK_SET) == (off_t) -1) {
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index ec411db..0008755 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -293,6 +293,110 @@ rte_eth_dev_rx_queue_config(struct rte_eth_dev *dev, 
uint16_t nb_queues)
return (0);
 }

+int
+rte_eth_dev_rx_queue_start(uint8_t port_id, uint16_t rx_queue_id)
+{
+   struct rte_eth_dev *dev;
+
+   /* This function is only safe when called from the primary process
+* in a multi-process setup*/
+   PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);
+
+   if (port_id >= nb_ports) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return (-EINVAL);
+   }
+
+   dev = &rte_eth_devices[port_id];
+   if (rx_queue_id >= dev->data->nb_rx_queues) {
+   PMD_DEBUG_TRACE("Invalid RX queue_id=%d\n", rx_queue_id);
+   return (-EINVAL);
+   }
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_start, -ENOTSUP);
+
+   return dev->dev_ops->rx_queue_start(dev, rx_queue_id);
+
+}
+
+int
+rte_eth_dev_rx_queue_stop(uint8_t port_id, uint16_t rx_queue_id)
+{
+   struct rte_eth_dev *dev;
+
+   /* This function is only safe when called from the primary process
+* in a multi-process setup*/
+   PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);
+
+   if (port_id >= nb_ports) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return (-EINVAL);
+   }
+
+   dev = &rte_eth_devices[port_id];
+   if (rx_queue_id >= dev->data->nb_rx_queues) {
+   PMD_DEBUG_TRACE("Invalid RX queue_id=%d\n", rx_queue_id);
+   return (-EINVAL);
+   }
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_stop, -ENOTSUP);
+
+   return dev->dev_ops->rx_queue_stop(dev, rx_queue_id);
+
+}
+
+int
+rte_eth_dev_tx_queue_start(uint8_t port_id, uint16_t tx_queue_id)
+{
+   struct rte_eth_dev *dev;
+
+   /* This function is only safe when called from the primary process
+* in a multi-process setup*/
+   PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);
+
+   if (port_id >= nb_ports) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return (-EINVAL);
+   }
+
+   dev = &rte_eth_devices[port_id];
+   if (tx_queue_id >= dev->data->nb_tx_queues) {
+   PMD_DEBUG_TRACE("Invalid TX queue_id=%d\n", tx_queue_id);
+   return (-EINVAL);
+   }
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_start, -ENOTSUP);
+
+   return dev->dev_ops->tx_queue_start(dev, tx_queue_id);
+
+}
+
+int
+rte_eth_dev_tx_queue_stop(uint8_t port_id, uint16_t tx_queue_id)
+{
+   struct rte_eth_dev *dev;
+
+   /* This function is only safe when called from the primary process
+* in a multi-process setup*/
+   PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);
+
+   if (port_id >= nb_ports) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return (-EINVAL);
+   }
+
+   dev = &rte_eth_devices[port_id];
+   if (tx_queue_id >= dev->data->nb_tx_queues) {
+   PMD_DEBUG_TRACE("Invalid TX queue_id=%d\n", tx_queue_id);
+   return (-EINVAL);
+   }
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_stop, -ENOTSUP);
+
+   return dev->dev_ops->tx_queue_stop(dev, tx_queue_id);
+
+}
+
 static int
 rte_eth_dev_tx_queue_config(struct rte_eth_dev *dev, uint16_t nb_queues)
 {
diff --git a/lib/librte_ether/rte_ethdev.h b/li

[dpdk-dev] [PATCH v2 3/3] examples/vhost: Support user space vhost zero copy

2014-05-20 Thread Ouyang Changchun
This patch supports user space vhost zero copy. It removes packet copying 
between host and guest in RX/TX.
It introduces an extra ring to store the detached mbufs. At the initialization 
stage all mbufs are put into
this ring; when one guest starts, vhost gets the available buffer addresses 
allocated by the guest for RX and
translates them into host space addresses, then attaches them to mbufs and puts 
the attached mbufs into the
mempool.
Queue starting and DMA refilling will get mbufs from the mempool and use them to 
set the DMA addresses.

For TX, it gets the buffer addresses of the available packets to be transmitted 
from the guest and translates
them to host space addresses, then attaches them to mbufs and puts them into the 
TX queues.
After TX finishes, it pulls the mbufs out of the mempool, detaches them and puts 
them back into the extra ring.

Signed-off-by: Ouyang Changchun 
---
 examples/vhost/main.c   | 1410 ++-
 examples/vhost/virtio-net.c |  120 +++-
 examples/vhost/virtio-net.h |   15 +-
 3 files changed, 1387 insertions(+), 158 deletions(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 816a71a..674608c 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -48,6 +48,7 @@
 #include 
 #include 
 #include 
+#include 

 #include "main.h"
 #include "virtio-net.h"
@@ -70,6 +71,14 @@
 #define MBUF_SIZE (2048 + sizeof(struct rte_mbuf) + RTE_PKTMBUF_HEADROOM)

 /*
+ * No frame data buffers allocated from the host are required for the zero copy 
implementation;
+ * the guest will allocate the frame data buffer, and vhost uses it directly.
+ */
+#define VIRTIO_DESCRIPTOR_LEN_ZCP 1518
+#define MBUF_SIZE_ZCP (VIRTIO_DESCRIPTOR_LEN_ZCP + sizeof(struct rte_mbuf) + 
RTE_PKTMBUF_HEADROOM)
+#define MBUF_CACHE_SIZE_ZCP 0
+
+/*
  * RX and TX Prefetch, Host, and Write-back threshold values should be
  * carefully set for optimal performance. Consult the network
  * controller's datasheet and supporting DPDK documentation for guidance
@@ -108,6 +117,24 @@
 #define RTE_TEST_RX_DESC_DEFAULT 1024 
 #define RTE_TEST_TX_DESC_DEFAULT 512

+/*
+ * Need refine these 2 macros for legacy and DPDK based front end:
+ * Max vring avail descriptor/entries from guest - MAX_PKT_BURST
+ * And then adjust power 2.
+ */
+/*
+ * For legacy front end, 128 descriptors,
+ * half for virtio header, another half for mbuf.
+ */
+#define RTE_TEST_RX_DESC_DEFAULT_ZCP 32   /* legacy: 32, DPDK virt FE: 128. */
+#define RTE_TEST_TX_DESC_DEFAULT_ZCP 64   /* legacy: 64, DPDK virt FE: 64.  */
+
+/* Get first 4 bytes in mbuf headroom. */
+#define MBUF_HEADROOM_UINT32(mbuf) (*(uint32_t*)((uint8_t*)(mbuf) + 
sizeof(struct rte_mbuf)))
+
+/* true if x is a power of 2 */
#define POWEROF2(x) ((((x)-1) & (x)) == 0)
+
 #define INVALID_PORT_ID 0xFF

 /* Max number of devices. Limited by vmdq. */
@@ -138,8 +165,39 @@ static uint32_t num_switching_cores = 0;
 static uint32_t num_queues = 0;
 uint32_t num_devices = 0;

+/* Enable zero copy, pkts buffer will directly dma to hw descriptor, disabled 
on default*/
+static uint32_t zero_copy = 0;
+
+/* number of descriptors to apply*/
+static uint32_t num_rx_descriptor = RTE_TEST_RX_DESC_DEFAULT_ZCP;
+static uint32_t num_tx_descriptor = RTE_TEST_TX_DESC_DEFAULT_ZCP;
+
+/* max ring descriptor, ixgbe, i40e, e1000 all are 4096. */
+#define MAX_RING_DESC 4096
+
+struct vpool {
+   struct rte_mempool * pool;
+   struct rte_ring * ring;
+   uint32_t buf_size;
+} vpool_array[MAX_QUEUES+MAX_QUEUES];
+
 /* Enable VM2VM communications. If this is disabled then the MAC address 
compare is skipped. */
-static uint32_t enable_vm2vm = 1;
+typedef enum {
+   VM2VM_DISABLED = 0,
+   VM2VM_SOFTWARE = 1,
+   VM2VM_HARDWARE = 2,
+   VM2VM_LAST
+} vm2vm_type;
+static vm2vm_type vm2vm_mode = VM2VM_SOFTWARE;
+
+/* The type of host physical address translated from guest physical address. */
+typedef enum {
+   PHYS_ADDR_CONTINUOUS = 0,
+   PHYS_ADDR_CROSS_SUBREG = 1,
+   PHYS_ADDR_INVALID = 2,
+   PHYS_ADDR_LAST
+} hpa_type;
+
 /* Enable stats. */
 static uint32_t enable_stats = 0;
 /* Enable retries on RX. */
@@ -159,7 +217,7 @@ static uint32_t dev_index = 0;
 extern uint64_t VHOST_FEATURES;

 /* Default configuration for rx and tx thresholds etc. */
-static const struct rte_eth_rxconf rx_conf_default = {
+static struct rte_eth_rxconf rx_conf_default = {
.rx_thresh = {
.pthresh = RX_PTHRESH,
.hthresh = RX_HTHRESH,
@@ -173,7 +231,7 @@ static const struct rte_eth_rxconf rx_conf_default = {
  * Controller and the DPDK ixgbe/igb PMD. Consider using other values for other
  * network controllers and/or network drivers.
  */
-static const struct rte_eth_txconf tx_conf_default = {
+static struct rte_eth_txconf tx_conf_default = {
.tx_thresh = {
.pthresh = TX_PTHRESH,
.hthresh = TX_HTHRESH,
@@ -184,7 +242,7 @@ static const struct rte_eth_txconf tx_conf_de

[dpdk-dev] [PATCH v2 2/3] ixgbe: Implement queue start and stop functionality in IXGBE PMD

2014-05-20 Thread Ouyang Changchun
This patch implements queue start and stop functionality in IXGBE PMD;
it also enables hardware loopback for VMDQ mode in IXGBE PMD.
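
For reference, a hedged sketch of how an application could request the hardware
loopback handled by this patch; it assumes an enable_loop_back field in the VMDQ
RX configuration, which is what ixgbe_vmdq_rx_hw_configure() checks below:

	struct rte_eth_conf port_conf;

	memset(&port_conf, 0, sizeof(port_conf));
	port_conf.rxmode.mq_mode = ETH_MQ_RX_VMDQ_ONLY;
	port_conf.rx_adv_conf.vmdq_rx_conf.nb_queue_pools = ETH_64_POOLS;
	/* Ask the PMD to enable VMDQ hardware loopback (PFDTXGSWC/VMTXSW). */
	port_conf.rx_adv_conf.vmdq_rx_conf.enable_loop_back = 1;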

Signed-off-by: Ouyang Changchun 
---
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c |   4 +
 lib/librte_pmd_ixgbe/ixgbe_ethdev.h |   8 ++
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c   | 233 ++--
 lib/librte_pmd_ixgbe/ixgbe_rxtx.h   |   6 +
 4 files changed, 215 insertions(+), 36 deletions(-)

diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c 
b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
index 49ff0d1..62a6d77 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
@@ -275,6 +275,10 @@ static struct eth_dev_ops ixgbe_eth_dev_ops = {
.vlan_tpid_set= ixgbe_vlan_tpid_set,
.vlan_offload_set = ixgbe_vlan_offload_set,
.vlan_strip_queue_set = ixgbe_vlan_strip_queue_set,
+   .rx_queue_start   = ixgbe_dev_rx_queue_start,
+   .rx_queue_stop= ixgbe_dev_rx_queue_stop,
+   .tx_queue_start   = ixgbe_dev_tx_queue_start,
+   .tx_queue_stop= ixgbe_dev_tx_queue_stop,
.rx_queue_setup   = ixgbe_dev_rx_queue_setup,
.rx_queue_release = ixgbe_dev_rx_queue_release,
.rx_queue_count   = ixgbe_dev_rx_queue_count,
diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.h 
b/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
index 7c6139b..ae52c8e 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
@@ -245,6 +245,14 @@ void ixgbe_dev_tx_init(struct rte_eth_dev *dev);

 void ixgbe_dev_rxtx_start(struct rte_eth_dev *dev);

+int ixgbe_dev_rx_queue_start(struct rte_eth_dev *dev, uint16_t rx_queue_id);
+
+int ixgbe_dev_rx_queue_stop(struct rte_eth_dev *dev, uint16_t rx_queue_id);
+
+int ixgbe_dev_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id);
+
+int ixgbe_dev_tx_queue_stop(struct rte_eth_dev *dev, uint16_t tx_queue_id);
+
 int ixgbevf_dev_rx_init(struct rte_eth_dev *dev);

 void ixgbevf_dev_tx_init(struct rte_eth_dev *dev);
diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c 
b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index 55414b9..2a98051 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -1588,7 +1588,7 @@ ixgbe_recv_scattered_pkts(void *rx_queue, struct rte_mbuf 
**rx_pkts,
  * descriptors should meet the following condition:
  *  (num_ring_desc * sizeof(rx/tx descriptor)) % 128 == 0
  */
-#define IXGBE_MIN_RING_DESC 64
+#define IXGBE_MIN_RING_DESC 32
 #define IXGBE_MAX_RING_DESC 4096

 /*
@@ -1836,6 +1836,7 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev *dev,
txq->port_id = dev->data->port_id;
txq->txq_flags = tx_conf->txq_flags;
txq->ops = &def_txq_ops;
+   txq->start_tx_per_q= tx_conf->start_tx_per_q;

/*
 * Modification to set VFTDT for virtual function if vf is detected
@@ -2078,6 +2079,7 @@ ixgbe_dev_rx_queue_setup(struct rte_eth_dev *dev,
rxq->crc_len = (uint8_t) ((dev->data->dev_conf.rxmode.hw_strip_crc) ?
0 : ETHER_CRC_LEN);
rxq->drop_en = rx_conf->rx_drop_en;
+   rxq->start_rx_per_q= rx_conf->start_rx_per_q;

/*
 * Allocate RX ring hardware descriptors. A memzone large enough to
@@ -3025,6 +3027,14 @@ ixgbe_vmdq_rx_hw_configure(struct rte_eth_dev *dev)

}

+   /* PFDMA Tx General Switch Control Enables VMDQ loopback */
+   if (cfg->enable_loop_back){
+   IXGBE_WRITE_REG(hw, IXGBE_PFDTXGSWC, IXGBE_PFDTXGSWC_VT_LBEN);
+   for(i = 0; i < RTE_IXGBE_VMTXSW_REGISTER_COUNT; i++) {
+   IXGBE_WRITE_REG(hw, IXGBE_VMTXSW(i), UINT32_MAX);
+   }
+   }
+
IXGBE_WRITE_FLUSH(hw);
 }

@@ -3234,7 +3244,6 @@ ixgbe_dev_rx_init(struct rte_eth_dev *dev)
uint32_t rxcsum;
uint16_t buf_size;
uint16_t i;
-   int ret;

PMD_INIT_FUNC_TRACE();
hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
@@ -3289,11 +3298,6 @@ ixgbe_dev_rx_init(struct rte_eth_dev *dev)
for (i = 0; i < dev->data->nb_rx_queues; i++) {
rxq = dev->data->rx_queues[i];

-   /* Allocate buffers for descriptor rings */
-   ret = ixgbe_alloc_rx_queue_mbufs(rxq);
-   if (ret)
-   return ret;
-
/*
 * Reset crc_len in case it was changed after queue setup by a
 * call to configure.
@@ -3500,10 +3504,8 @@ ixgbe_dev_rxtx_start(struct rte_eth_dev *dev)
struct igb_rx_queue *rxq;
uint32_t txdctl;
uint32_t dmatxctl;
-   uint32_t rxdctl;
uint32_t rxctrl;
uint16_t i;
-   int poll_ms;

PMD_INIT_FUNC_TRACE();
hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
@@ -3526,55 +3528,214 @@ ixgbe_dev_rxtx_start(struct rte_eth_dev *dev)

[dpdk-dev] [PATCH 2/3] ixgbe: Implement the functionality of administrative link up and down in IXGBE PMD

2014-05-22 Thread Ouyang Changchun
This patch implements the functionality of administrative link up and down in 
IXGBE PMD.

Signed-off-by: Ouyang Changchun 
---
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 58 +
 1 file changed, 58 insertions(+)

diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c 
b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
index 76f09af..b6ffad0 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
@@ -98,6 +98,8 @@ static int eth_ixgbe_dev_init(struct eth_driver *eth_drv,
 static int  ixgbe_dev_configure(struct rte_eth_dev *dev);
 static int  ixgbe_dev_start(struct rte_eth_dev *dev);
 static void ixgbe_dev_stop(struct rte_eth_dev *dev);
+static int  ixgbe_dev_admin_link_up(struct rte_eth_dev *dev);
+static int  ixgbe_dev_admin_link_down(struct rte_eth_dev *dev);
 static void ixgbe_dev_close(struct rte_eth_dev *dev);
 static void ixgbe_dev_promiscuous_enable(struct rte_eth_dev *dev);
 static void ixgbe_dev_promiscuous_disable(struct rte_eth_dev *dev);
@@ -263,6 +265,8 @@ static struct eth_dev_ops ixgbe_eth_dev_ops = {
.dev_configure= ixgbe_dev_configure,
.dev_start= ixgbe_dev_start,
.dev_stop = ixgbe_dev_stop,
+   .dev_admin_link_up= ixgbe_dev_admin_link_up,
+   .dev_admin_link_down  = ixgbe_dev_admin_link_down,
.dev_close= ixgbe_dev_close,
.promiscuous_enable   = ixgbe_dev_promiscuous_enable,
.promiscuous_disable  = ixgbe_dev_promiscuous_disable,
@@ -1487,6 +1491,60 @@ ixgbe_dev_stop(struct rte_eth_dev *dev)
 }

 /*
+ * Link up device administratively: enable tx laser.
+ */
+static int
+ixgbe_dev_admin_link_up(struct rte_eth_dev *dev)
+{
+   struct ixgbe_hw *hw =
+   IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   if (hw->mac.type == ixgbe_mac_82599EB) {
+#ifdef RTE_NIC_BYPASS
+   if (hw->device_id == IXGBE_DEV_ID_82599_BYPASS) {
+   /* Not supported in bypass mode */
+   PMD_INIT_LOG(ERR, "\nAdmin link up is not supported by 
device id 0x%x\n",
+hw->device_id);
+   return -ENOTSUP;
+   }
+#endif
+   /* Turn on the laser */
+   ixgbe_enable_tx_laser(hw);
+   return 0;
+   }
+
+   PMD_INIT_LOG(ERR, "\nAdmin link up is not supported by device id 
0x%x\n",
+hw->device_id);
+   return -ENOTSUP;
+}
+
+/*
+ * Link down device administratively: disable tx laser.
+ */
+static int
+ixgbe_dev_admin_link_down(struct rte_eth_dev *dev)
+{
+   struct ixgbe_hw *hw =
+   IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   if (hw->mac.type == ixgbe_mac_82599EB) {
+#ifdef RTE_NIC_BYPASS
+   if (hw->device_id == IXGBE_DEV_ID_82599_BYPASS) {
+   /* Not supported in bypass mode */
+   PMD_INIT_LOG(ERR, "\nAdmin link down is not supported 
by device id 0x%x\n",
+hw->device_id);
+   return -ENOTSUP;
+   }
+#endif
+   /* Turn off the laser */
+   ixgbe_disable_tx_laser(hw);
+   return 0;
+   }
+
+   PMD_INIT_LOG(ERR, "\nAdmin link down is not supported by device id 
0x%x\n",
+hw->device_id);
+   return -ENOTSUP;
+}
+
+/*
  * Reest and stop device.
  */
 static void
-- 
1.9.0



[dpdk-dev] [PATCH 0/3] Support administrative link up and link down

2014-05-22 Thread Ouyang Changchun
This patch series contains the following 3 items:
1. Add API to support administrative link up and down.
2. Implement the functionality of administrative link up and down in IXGBE PMD.
3. Add command in testpmd to test the functionality of administrative link up 
and down of PMD.

Ouyang Changchun (3):
  Add API for supporting administrative link up and down.
  Implement the functionality of administrative link up and down in
IXGBE PMD.
  Add command line to test the functionality of administrative link up
and down of PMD in testpmd.

 app/test-pmd/cmdline.c  | 78 +
 app/test-pmd/testpmd.c  | 14 +++
 app/test-pmd/testpmd.h  |  2 +
 lib/librte_ether/rte_ethdev.c   | 38 ++
 lib/librte_ether/rte_ethdev.h   | 34 
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 58 +++
 6 files changed, 224 insertions(+)

-- 
1.9.0



[dpdk-dev] [PATCH 1/3] ether: Add API to support administrative link up and down

2014-05-22 Thread Ouyang Changchun
This patch adds an API to support administrative link up and down.
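
A short, hedged usage sketch of the two new calls (port 0 assumed; return codes
follow the prototypes added below):

	uint8_t port_id = 0;

	/* Disable RX/TX without tearing the port down ... */
	if (rte_eth_dev_admin_link_down(port_id) < 0)
		printf("admin link down failed on port %u\n", port_id);

	/* ... and re-enable it later, keeping the existing configuration. */
	if (rte_eth_dev_admin_link_up(port_id) < 0)
		printf("admin link up failed on port %u\n", port_id);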

Signed-off-by: Ouyang Changchun 
---
 lib/librte_ether/rte_ethdev.c | 38 ++
 lib/librte_ether/rte_ethdev.h | 34 ++
 2 files changed, 72 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 0ddedfb..06a0896 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -796,6 +796,44 @@ rte_eth_dev_stop(uint8_t port_id)
(*dev->dev_ops->dev_stop)(dev);
 }

+int
+rte_eth_dev_admin_link_up(uint8_t port_id)
+{
+   struct rte_eth_dev *dev;
+
+   /* This function is only safe when called from the primary process
+* in a multi-process setup*/
+   PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);
+
+   if (port_id >= nb_ports) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return (-EINVAL);
+   }
+   dev = &rte_eth_devices[port_id];
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_admin_link_up, -ENOTSUP);
+   return (*dev->dev_ops->dev_admin_link_up)(dev);
+}
+
+int
+rte_eth_dev_admin_link_down(uint8_t port_id)
+{
+   struct rte_eth_dev *dev;
+
+   /* This function is only safe when called from the primary process
+* in a multi-process setup*/
+   PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);
+
+   if (port_id >= nb_ports) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return (-EINVAL);
+   }
+   dev = &rte_eth_devices[port_id];
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_admin_link_down, -ENOTSUP);
+   return (*dev->dev_ops->dev_admin_link_down)(dev);
+}
+
 void
 rte_eth_dev_close(uint8_t port_id)
 {
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 2be6e4f..d33ff93 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -891,6 +891,12 @@ typedef int  (*eth_dev_start_t)(struct rte_eth_dev *dev);
 typedef void (*eth_dev_stop_t)(struct rte_eth_dev *dev);
 /**< @internal Function used to stop a configured Ethernet device. */

+typedef int  (*eth_dev_admin_link_up_t)(struct rte_eth_dev *dev);
+/**< @internal Function used to link up a configured Ethernet device 
administratively. */
+
+typedef int  (*eth_dev_admin_link_down_t)(struct rte_eth_dev *dev);
+/**< @internal Function used to link down a configured Ethernet device 
administratively. */
+
 typedef void (*eth_dev_close_t)(struct rte_eth_dev *dev);
 /**< @internal Function used to close a configured Ethernet device. */

@@ -1223,6 +1229,8 @@ struct eth_dev_ops {
	eth_dev_configure_t        dev_configure; /**< Configure device. */
	eth_dev_start_t            dev_start; /**< Start device. */
	eth_dev_stop_t             dev_stop;  /**< Stop device. */
+   eth_dev_admin_link_up_t    dev_admin_link_up;   /**< Device link up administratively. */
+   eth_dev_admin_link_down_t  dev_admin_link_down; /**< Device link down administratively. */
	eth_dev_close_t            dev_close; /**< Close device. */
eth_promiscuous_enable_t   promiscuous_enable; /**< Promiscuous ON. */
eth_promiscuous_disable_t  promiscuous_disable;/**< Promiscuous OFF. */
@@ -1696,6 +1704,32 @@ extern int rte_eth_dev_start(uint8_t port_id);
  */
 extern void rte_eth_dev_stop(uint8_t port_id);

+
+/**
+ * Link up an Ethernet device administratively.
+ *
+ * The administrative device link up will re-enable the device rx/tx 
functionality
+ * after it was previously administratively linked down.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @return
+ *   - 0: Success, Ethernet device linked up administratively.
+ *   - <0: Error code of the driver device link up function.
+ */
+extern int rte_eth_dev_admin_link_up(uint8_t port_id);
+
+/**
+ * Link down an Ethernet device administratively.
+ * The device rx/tx functionality will be disabled if success,
+ * and it can be re-enabled with a call to
+ * rte_eth_dev_admin_link_up()
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ */
+extern int rte_eth_dev_admin_link_down(uint8_t port_id);
+
 /**
  * Close an Ethernet device. The device cannot be restarted!
  *
-- 
1.9.0



[dpdk-dev] [PATCH 3/3] testpmd: Add commands to test administrative link up and down of PMD

2014-05-22 Thread Ouyang Changchun
This patch adds commands to test the functionality of administrative link up 
and down of PMD in testpmd.
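
For example, the new commands can be exercised from the testpmd prompt as follows
(illustrative session, port 0 assumed; the syntax follows the help strings added
below):

	testpmd> admin link-down port 0
	testpmd> admin link-up port 0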

Signed-off-by: Ouyang Changchun 
---
 app/test-pmd/cmdline.c | 78 ++
 app/test-pmd/testpmd.c | 14 +
 app/test-pmd/testpmd.h |  2 ++
 3 files changed, 94 insertions(+)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 6030192..9dcf475 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -3844,6 +3844,82 @@ cmdline_parse_inst_t cmd_start_tx_first = {
},
 };

+/* *** LINK UP ADMINISTRATIVELY *** */
+struct cmd_admin_link_up_result {
+   cmdline_fixed_string_t admin;
+   cmdline_fixed_string_t link_up;
+   cmdline_fixed_string_t port;
+   uint8_t port_id;
+};
+
+cmdline_parse_token_string_t cmd_admin_link_up_admin =
+   TOKEN_STRING_INITIALIZER(struct cmd_admin_link_up_result, admin, 
"admin");
+cmdline_parse_token_string_t cmd_admin_link_up_link_up =
+   TOKEN_STRING_INITIALIZER(struct cmd_admin_link_up_result, link_up, 
"link-up");
+cmdline_parse_token_string_t cmd_admin_link_up_port =
+   TOKEN_STRING_INITIALIZER(struct cmd_admin_link_up_result, port, "port");
+cmdline_parse_token_num_t cmd_admin_link_up_port_id =
+   TOKEN_NUM_INITIALIZER(struct cmd_admin_link_up_result, port_id, UINT8);
+
+static void cmd_admin_link_up_parsed(__attribute__((unused)) void 
*parsed_result,
+__attribute__((unused)) struct cmdline *cl,
+__attribute__((unused)) void *data)
+{
+   struct cmd_admin_link_up_result *res = parsed_result;
+   dev_admin_link_up(res->port_id);
+}
+
+cmdline_parse_inst_t cmd_admin_link_up = {
+   .f = cmd_admin_link_up_parsed,
+   .data = NULL,
+   .help_str = "admin link-up port (port id)",
+   .tokens = {
+   (void *)&cmd_admin_link_up_admin,
+   (void *)&cmd_admin_link_up_link_up,
+   (void *)&cmd_admin_link_up_port,
+   (void *)&cmd_admin_link_up_port_id,
+   NULL,
+   },
+};
+
+/* *** LINK DOWN ADMINISTRATIVELY *** */
+struct cmd_admin_link_down_result {
+   cmdline_fixed_string_t admin;
+   cmdline_fixed_string_t link_down;
+   cmdline_fixed_string_t port;
+   uint8_t port_id;
+};
+
+cmdline_parse_token_string_t cmd_admin_link_down_admin =
+   TOKEN_STRING_INITIALIZER(struct cmd_admin_link_down_result, admin, 
"admin");
+cmdline_parse_token_string_t cmd_admin_link_down_link_down =
+   TOKEN_STRING_INITIALIZER(struct cmd_admin_link_down_result, link_down, 
"link-down");
+cmdline_parse_token_string_t cmd_admin_link_down_port =
+   TOKEN_STRING_INITIALIZER(struct cmd_admin_link_down_result, port, 
"port");
+cmdline_parse_token_num_t cmd_admin_link_down_port_id =
+   TOKEN_NUM_INITIALIZER(struct cmd_admin_link_down_result, port_id, 
UINT8);
+
+static void cmd_admin_link_down_parsed(__attribute__((unused)) void 
*parsed_result,
+__attribute__((unused)) struct cmdline *cl,
+__attribute__((unused)) void *data)
+{
+   struct cmd_admin_link_down_result *res = parsed_result;
+   dev_admin_link_down(res->port_id);
+}
+
+cmdline_parse_inst_t cmd_admin_link_down = {
+   .f = cmd_admin_link_down_parsed,
+   .data = NULL,
+   .help_str = "admin link-down port (port id)",
+   .tokens = {
+   (void *)&cmd_admin_link_down_admin,
+   (void *)&cmd_admin_link_down_link_down,
+   (void *)&cmd_admin_link_down_port,
+   (void *)&cmd_admin_link_down_port_id,
+   NULL,
+   },
+};
+
 /* *** SHOW CFG *** */
 struct cmd_showcfg_result {
cmdline_fixed_string_t show;
@@ -6055,6 +6131,8 @@ cmdline_parse_ctx_t main_ctx[] = {
(cmdline_parse_inst_t *)&cmd_showcfg,
(cmdline_parse_inst_t *)&cmd_start,
(cmdline_parse_inst_t *)&cmd_start_tx_first,
+   (cmdline_parse_inst_t *)&cmd_admin_link_up,
+   (cmdline_parse_inst_t *)&cmd_admin_link_down,
(cmdline_parse_inst_t *)&cmd_reset,
(cmdline_parse_inst_t *)&cmd_set_numbers,
(cmdline_parse_inst_t *)&cmd_set_txpkts,
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index bc38305..9e9997f 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -1208,6 +1208,20 @@ stop_packet_forwarding(void)
test_done = 1;
 }

+void
+dev_admin_link_up(portid_t pid)
+{
+   if (rte_eth_dev_admin_link_up((uint8_t)pid) < 0)
+   printf("\nAdmin link up fail.\n");
+}
+
+void
+dev_admin_link_down(portid_t pid)
+{
+   if (rte_eth_dev_admin_link_down((uint8_t)pid) < 0)
+   printf("\nAdmin link down fail.\n");
+}
+
 static int
 all_ports_started(void)

[dpdk-dev] [PATCH 0/3] Support administrative link up and link down

2014-05-22 Thread Ouyang, Changchun
Hi Ivan,
For this one, it is a long story...
In short, 
some customers have this kind of requirement: 
they want to repeatedly start (rte_dev_start) and stop (rte_dev_stop) the port 
for RX and TX, but they find that
after several starts and stops, RX and TX no longer work well even though the port 
starts, and the packet error counters increase.

To resolve this increasing error count, and to let the port work fine even after 
repeated start and stop,
we need a new API for it; after discussion, we have these 2 APIs, admin link 
up and admin link down.

Any difference between using "dev_link_start/stop" and "dev_link_up/down"? To me, 
admin_link_up/down is better than dev_link_start/stop.

If most people think we need to change the name, it is OK to rename it.

I don't think we need it in non-physical PMDs, so there is no implementation in the virtio 
PMD.

Thanks
Changchun


-Original Message-
From: Ivan Boule [mailto:ivan.bo...@6wind.com] 
Sent: Thursday, May 22, 2014 9:17 PM
To: Ouyang, Changchun; dev at dpdk.org
Subject: Re: [dpdk-dev] [PATCH 0/3] Support administrative link up and link down

On 05/22/2014 08:11 AM, Ouyang Changchun wrote:
> This patch series contain the following 3 items:
> 1. Add API to support administrative link up and down.
> 2. Implement the functionality of administrative link up and down in IXGBE 
> PMD.
> 3. Add command in testpmd to test the functionality of administrative link up 
> and down of PMD.
>
> Ouyang Changchun (3):
>Add API for supporting administrative link up and down.
>Implement the functionality of administrative link up and down in
>  IXGBE PMD.
>Add command line to test the functionality of administrative link up
>  and down of PMD in testpmd.
>
>   app/test-pmd/cmdline.c  | 78 
> +
>   app/test-pmd/testpmd.c  | 14 +++
>   app/test-pmd/testpmd.h  |  2 +
>   lib/librte_ether/rte_ethdev.c   | 38 ++
>   lib/librte_ether/rte_ethdev.h   | 34 
>   lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 58 +++
>   6 files changed, 224 insertions(+)
>

Hi Changchun,

The 2 functions "rte_eth_dev_admin_link_up" and "rte_eth_dev_admin_link_down"
don't have an equivalent in the Linux kernel, thus I am wondering what is their 
effective usage from a network application perspective.
Could you briefly explain in which use case these functions can be used for?

By the way, it's not completely evident to infer the exact semantics of these 2 
functions from their name.
In particular, I do not see what the term "admin" brings to the understanding 
of their role. If it is to suggest that these functions are intended to force 
the link to a different state of its initial [self-detected] state, then the 
term "force" would be more appropriate.

Otherwise, if eventually these functions appear to be mandatory, I suggest to 
rename them "rte_eth_dev_link_start" and "rte_eth_dev_link_stop" respectively, 
and to apply the same naming conventions in the 2 other patches.

It might also be worth documenting in the comment section of the prototype of 
these 2 functions whether it makes sense or not to support a notion of link 
that can be dynamically started or stopped in non-physical PMDs (vmxnet3, 
virtio, etc).

Regards,
Ivan



--
Ivan Boule
6WIND Development Engineer


[dpdk-dev] [PATCH] [PMD] [VHOST] Revert unnecessary definition and fix wrong referring in user space vhost zero copy patches

2014-05-22 Thread Ouyang, Changchun
Hi, Thomas
Thanks very much for your guidance!
Best regards,
Changchun

-Original Message-
From: Thomas Monjalon [mailto:thomas.monja...@6wind.com] 
Sent: Thursday, May 22, 2014 11:29 PM
To: Ouyang, Changchun
Cc: dev at dpdk.org
Subject: Re: [dpdk-dev] [PATCH] [PMD] [VHOST] Revert unnecessary definition and 
fix wrong referring in user space vhost zero copy patches

Hi Changchun,

Please, it is preferred to answer below the question.

2014-05-20 01:14, Ouyang, Changchun:
> So "[PMD] [VHOST]" in the title should be removed in the cover letter, 
> right? And in each separate patch letter, it could use "ixgbe:"  or 
> "examples/vhost:", instead of "[PMD] [VHOST]" Is it right?

Yes, you did right in the v2.

Thanks
--
Thomas


> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Tuesday, May 20, 2014 12:00 AM
> To: Ouyang, Changchun
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] [PMD] [VHOST] Revert unnecessary definition
> and fix wrong referring in user space vhost zero copy patches
> 
> Hi Changchun,
> 
> 2014-05-19 23:09, Ouyang Changchun:
> > 1. Revert the change of metadata macro definition for referring to
> > headroom space in mbuf; 2. Fix wrongly referring to RX queues number
> > in TX queues start/stop function.
> > 
> > Signed-off-by: Ouyang Changchun 
> 
> You are fixing commits which are not yet applied.
> Please merge and re-send the whole series by suffixing with "v2".
> 
> The title was "[PATCH 0/3] [PMD] [VHOST] *** Support zero copy RX/TX in user
> space vhost ***" It should be "[PATCH v2 0/3] Support zero copy RX/TX in
> user space vhost"
> 
> Other notes:
> - please split API and ixgbe changes
> - set a significant title to each patch
> - use prefixes like "ethdev:", "ixgbe:" or "examples/vhost:"
> 
> In general, this page is a good help:
>   http://dpdk.org/dev#send
> 
> Thanks
> --
> Thomas



[dpdk-dev] [PATCH 0/3] Support setting TX rate for queue and VF

2014-05-22 Thread Ouyang Changchun
This patch series contains the 3 items:
1. Add API to support setting TX rate for a queue or a VF.
2. Implement the functionality of setting TX rate for queue or VF in IXGBE PMD.
3. Add commands in testpmd to test the functionality of setting TX rate for 
queue or VF.

Ouyang Changchun (3):
  Add API to support setting TX rate for a queue and VF.
  Implement the functionality of setting TX rate for queue or VF in
IXGBE PMD.
  Add commands in testpmd to test the functionality of setting TX rate
for queue or VF.

 app/test-pmd/cmdline.c  | 153 
 app/test-pmd/config.c   |  44 +++
 app/test-pmd/testpmd.h  |   2 +
 lib/librte_ether/rte_ethdev.c   |  63 +++
 lib/librte_ether/rte_ethdev.h   |  51 
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 110 ++
 lib/librte_pmd_ixgbe/ixgbe_ethdev.h |  10 ++-
 7 files changed, 432 insertions(+), 1 deletion(-)

-- 
1.9.0



[dpdk-dev] [PATCH 2/3] ixgbe: Implement the functionality of setting TX rate for queue or VF in IXGBE PMD

2014-05-22 Thread Ouyang Changchun
This patch implements the functionality of setting TX rate for queue or VF in 
IXGBE PMD.
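
For readers following the RTTBCNRC programming below, this is a standalone
sketch of the rate-factor arithmetic, assuming the 14-bit RF_INT shift of that
register layout; the 10000/3000 Mbps figures in the comment are only a worked
example, not taken from the patch.

#include <stdint.h>

/* Combine the integer and fractional rate factors the same way the patch
 * does before masking them into RTTBCNRC.
 * Example: link_speed = 10000 Mbps, tx_rate = 3000 Mbps gives
 * rf_int = 3, rf_dec = (1000 << 14) / 3000 = 5461, i.e. a divisor of ~3.33. */
static uint32_t rate_factor(uint16_t link_speed, uint16_t tx_rate)
{
        uint32_t rf_int = (uint32_t)link_speed / tx_rate;
        uint32_t rf_dec = (uint32_t)link_speed % tx_rate;

        rf_dec = (rf_dec << 14) / tx_rate;
        return (rf_int << 14) | rf_dec;
}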

Signed-off-by: Ouyang Changchun 
---
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 110 
 lib/librte_pmd_ixgbe/ixgbe_ethdev.h |  10 +++-
 2 files changed, 119 insertions(+), 1 deletion(-)

diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c 
b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
index c9b5fe4..7a61ab0 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
@@ -87,6 +87,8 @@
 #define IXGBE_LINK_UP_CHECK_TIMEOUT   1000 /* ms */
 #define IXGBE_VMDQ_NUM_UC_MAC 4096 /* Maximum nb. of UC MAC addr. */

+#define IXGBE_MMW_SIZE_DEFAULT0x4
+#define IXGBE_MMW_SIZE_JUMBO_FRAME0x14

 #define IXGBEVF_PMD_NAME "rte_ixgbevf_pmd" /* PMD name */

@@ -182,6 +184,9 @@ static int ixgbe_mirror_rule_set(struct rte_eth_dev *dev,
 static int ixgbe_mirror_rule_reset(struct rte_eth_dev *dev,
uint8_t rule_id);

+static int ixgbe_set_queue_rate_limit(struct rte_eth_dev *dev, uint16_t 
queue_idx, uint16_t tx_rate);
+static int ixgbe_set_vf_rate_limit(struct rte_eth_dev *dev, uint16_t vf, 
uint16_t tx_rate, uint64_t q_msk);
+
 /*
  * Define VF Stats MACRO for Non "cleared on read" register
  */
@@ -280,6 +285,8 @@ static struct eth_dev_ops ixgbe_eth_dev_ops = {
.set_vf_rx= ixgbe_set_pool_rx,
.set_vf_tx= ixgbe_set_pool_tx,
.set_vf_vlan_filter   = ixgbe_set_pool_vlan_filter,
+   .set_queue_rate_limit = ixgbe_set_queue_rate_limit,
+   .set_vf_rate_limit= ixgbe_set_vf_rate_limit,
.fdir_add_signature_filter= ixgbe_fdir_add_signature_filter,
.fdir_update_signature_filter = ixgbe_fdir_update_signature_filter,
.fdir_remove_signature_filter = ixgbe_fdir_remove_signature_filter,
@@ -1288,10 +1295,13 @@ ixgbe_dev_start(struct rte_eth_dev *dev)
 {
struct ixgbe_hw *hw =
IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct ixgbe_vf_info *vfinfo =
+   *IXGBE_DEV_PRIVATE_TO_P_VFDATA(dev->data->dev_private);
int err, link_up = 0, negotiate = 0;
uint32_t speed = 0;
int mask = 0;
int status;
+   uint16_t vf, idx;

PMD_INIT_FUNC_TRACE();

@@ -1408,6 +1418,15 @@ skip_link_setup:
goto error;
}

+   /* Restore vf rate limit */
+   if (vfinfo != NULL) {
+   for (vf = 0; vf < dev->pci_dev->max_vfs; vf++)
+   for (idx = 0; idx < IXGBE_MAX_QUEUE_NUM_PER_VF; idx++)
+   if (vfinfo[vf].tx_rate[idx] != 0)
+   ixgbe_set_vf_rate_limit(dev, vf,
+   vfinfo[vf].tx_rate[idx], 1 << 
idx);
+   }
+
ixgbe_restore_statistics_mapping(dev);

return (0);
@@ -3062,6 +3081,97 @@ ixgbe_mirror_rule_reset(struct rte_eth_dev *dev, uint8_t 
rule_id)
return 0;
 }

+static int ixgbe_set_queue_rate_limit(struct rte_eth_dev *dev, uint16_t 
queue_idx, uint16_t tx_rate)
+{
+   struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   uint32_t rf_dec, rf_int;
+   uint32_t bcnrc_val;
+   uint16_t link_speed = dev->data->dev_link.link_speed;
+
+   if (queue_idx >= hw->mac.max_tx_queues)
+   return -EINVAL;
+
+   if (tx_rate != 0) {
+   /* Calculate the rate factor values to set */
+   rf_int = (uint32_t)link_speed / (uint32_t)tx_rate;
+   rf_dec = (uint32_t)link_speed % (uint32_t)tx_rate;
+   rf_dec = (rf_dec << IXGBE_RTTBCNRC_RF_INT_SHIFT) / tx_rate;
+
+   bcnrc_val = IXGBE_RTTBCNRC_RS_ENA;
+   bcnrc_val |= ((rf_int << IXGBE_RTTBCNRC_RF_INT_SHIFT) &
+   IXGBE_RTTBCNRC_RF_INT_MASK_M);
+   bcnrc_val |= (rf_dec & IXGBE_RTTBCNRC_RF_DEC_MASK);
+   } else {
+   bcnrc_val = 0;
+   }
+
+   /*
+* Set global transmit compensation time to the MMW_SIZE in RTTBCNRM
+* register. MMW_SIZE=0x014 if 9728-byte jumbo is supported, otherwise 
set as 0x4.
+*/
+   if ((dev->data->dev_conf.rxmode.jumbo_frame == 1) &&
+   (dev->data->dev_conf.rxmode.max_rx_pkt_len >= 
IXGBE_MAX_JUMBO_FRAME_SIZE))
+   IXGBE_WRITE_REG(hw, IXGBE_RTTBCNRM, IXGBE_MMW_SIZE_JUMBO_FRAME);
+   else
+   IXGBE_WRITE_REG(hw, IXGBE_RTTBCNRM, IXGBE_MMW_SIZE_DEFAULT);
+
+   /* Set RTTBCNRC of queue X */
+   IXGBE_WRITE_REG(hw, IXGBE_RTTDQSEL, queue_idx);
+   IXGBE_WRITE_REG(hw, IXGBE_RTTBCNRC, bcnrc_val);
+   IXGBE_WRITE_FLUSH(hw);
+
+   return 0;
+}
+
+static int ixgbe_set_vf_rate_limit(struct rte_eth_dev *dev, uint16_t vf, 
uint16_t tx_rate, uint64_t q_msk)
+{
+   struct ixgbe_hw *hw = IXGBE_DEV_PRI

[dpdk-dev] [PATCH 1/3] ether: Add API to support setting TX rate for queue and VF

2014-05-22 Thread Ouyang Changchun
This patch adds API to support setting TX rate for a queue and a VF.

Signed-off-by: Ouyang Changchun 
---
 lib/librte_ether/rte_ethdev.c | 63 +++
 lib/librte_ether/rte_ethdev.h | 51 +++
 2 files changed, 114 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index a5727dd..ff3a9b6 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1913,6 +1913,69 @@ rte_eth_dev_set_vf_vlan_filter(uint8_t port_id, uint16_t 
vlan_id,
vf_mask,vlan_on);
 }

+int rte_eth_set_queue_rate_limit(uint8_t port_id, uint16_t queue_idx, uint16_t 
tx_rate)
+{
+   struct rte_eth_dev *dev;
+   struct rte_eth_dev_info dev_info;
+   struct rte_eth_link link;
+
+   if (port_id >= nb_ports) {
+   PMD_DEBUG_TRACE("set queue rate limit:invalid port id=%d\n", 
port_id);
+   return (-ENODEV);
+   }
+
+   dev = &rte_eth_devices[port_id];
+   rte_eth_dev_info_get(port_id, &dev_info);
+   link = dev->data->dev_link;
+
+   if (queue_idx > dev_info.max_tx_queues) {
+   PMD_DEBUG_TRACE("set queue rate limit:port %d: invalid queue 
id=%d\n", port_id, queue_idx);
+   return (-EINVAL);
+   }
+
+   if (tx_rate > link.link_speed) {
+   PMD_DEBUG_TRACE("set queue rate limit:invalid tx_rate=%d, 
bigger than link speed= %d\n",
+   tx_rate, link.link_speed);
+   return (-EINVAL);
+   }
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->set_queue_rate_limit, -ENOTSUP);
+   return (*dev->dev_ops->set_queue_rate_limit)(dev, queue_idx, tx_rate);
+}
+
+int rte_eth_set_vf_rate_limit(uint8_t port_id, uint16_t vf, uint16_t tx_rate, 
uint64_t q_msk)
+{
+   struct rte_eth_dev *dev;
+   struct rte_eth_dev_info dev_info;
+   struct rte_eth_link link;
+
+   if(q_msk == 0)
+   return 0;
+
+   if (port_id >= nb_ports) {
+   PMD_DEBUG_TRACE("set VF rate limit:invalid port id=%d\n", 
port_id);
+   return (-ENODEV);
+   }
+
+   dev = &rte_eth_devices[port_id];
+   rte_eth_dev_info_get(port_id, &dev_info);
+   link = dev->data->dev_link;
+
+   if (vf > dev_info.max_vfs) {
+   PMD_DEBUG_TRACE("set VF rate limit:port %d: invalid vf 
id=%d\n", port_id, vf);
+   return (-EINVAL);
+   }
+
+   if (tx_rate > link.link_speed) {
+   PMD_DEBUG_TRACE("set VF rate limit:invalid tx_rate=%d, bigger 
than link speed= %d\n",
+   tx_rate, link.link_speed);
+   return (-EINVAL);
+   }
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->set_vf_rate_limit, -ENOTSUP);
+   return (*dev->dev_ops->set_vf_rate_limit)(dev, vf, tx_rate, q_msk);
+}
+
 int
 rte_eth_mirror_rule_set(uint8_t port_id, 
struct rte_eth_vmdq_mirror_conf *mirror_conf,
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index d5ea46b..445d40a 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1012,6 +1012,17 @@ typedef int (*eth_set_vf_vlan_filter_t)(struct 
rte_eth_dev *dev,
  uint8_t vlan_on);
 /**< @internal Set VF VLAN pool filter */

+typedef int (*eth_set_queue_rate_limit_t)(struct rte_eth_dev *dev,
+   uint16_t queue_idx,
+   uint16_t tx_rate);
+/**< @internal Set queue TX rate */
+
+typedef int (*eth_set_vf_rate_limit_t)(struct rte_eth_dev *dev,
+   uint16_t vf,
+   uint16_t tx_rate,
+   uint64_t q_msk);
+/**< @internal Set VF TX rate */
+
 typedef int (*eth_mirror_rule_set_t)(struct rte_eth_dev *dev,
  struct rte_eth_vmdq_mirror_conf *mirror_conf,
  uint8_t rule_id, 
@@ -1119,6 +1130,8 @@ struct eth_dev_ops {
eth_set_vf_rx_tset_vf_rx;  /**< enable/disable a VF receive 
*/
eth_set_vf_tx_tset_vf_tx;  /**< enable/disable a VF 
transmit */
eth_set_vf_vlan_filter_t   set_vf_vlan_filter;  /**< Set VF VLAN filter 
*/
+   eth_set_queue_rate_limit_t set_queue_rate_limit;   /**< Set queue rate 
limit */
+   eth_set_vf_rate_limit_tset_vf_rate_limit;   /**< Set VF rate limit 
*/

/** Add a signature filter. */
fdir_add_signature_filter_t fdir_add_signature_filter;
@@ -2561,6 +2574,44 @@ int rte_eth_mirror_rule_reset(uint8_t port_id,
 uint8_t rule_id);

 /**
+ * Set the rate limitation for a queue on an Ethernet device.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ 

[dpdk-dev] [PATCH 3/3] testpmd: Add commands to test the functionality of setting TX rate for queue or VF

2014-05-22 Thread Ouyang Changchun
This patch adds commands in testpmd to test the functionality of setting TX 
rate for queue or VF.
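
Example invocations of the two new commands at the testpmd prompt; the port,
queue, VF, rate and queue mask values are illustrative only:

testpmd> set port 0 queue 1 rate 500
testpmd> set port 0 vf 0 rate 1000 queue_mask 3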

Signed-off-by: Ouyang Changchun 
---
 app/test-pmd/cmdline.c | 153 +
 app/test-pmd/config.c  |  44 ++
 app/test-pmd/testpmd.h |   2 +
 3 files changed, 199 insertions(+)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index b3824f9..f85a275 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -343,6 +343,12 @@ static void cmd_help_long_parsed(void *parsed_result,
"MPE:accepts all multicast packets\n\n"
"Enable/Disable a VF receive mode of a port\n\n"

+   "set port (port_id) queue (queue_id) rate (rate_num) \n"
+   "Set rate limit for a queue of a port\n\n"
+
+   "set port (port_id) vf (vf_id) rate (rate_num) 
queue_mask (queue_mask_value)\n"
+   "Set rate limit for queues in VF of a port\n\n"
+
"set port (port_id) mirror-rule (rule_id)" 
"(pool-mirror|vlan-mirror)\n"
" (poolmask|vlanid[,vlanid]*) dst-pool (pool_id) 
(on|off)\n"
@@ -4790,6 +4796,151 @@ cmdline_parse_inst_t cmd_vf_rxvlan_filter = {
},
 };

+/* *** SET RATE LIMIT FOR A QUEUE OF A PORT *** */
+struct cmd_queue_rate_limit_result {
+   cmdline_fixed_string_t set;
+   cmdline_fixed_string_t port;
+   uint8_t port_num;
+   cmdline_fixed_string_t queue;
+   uint8_t queue_num;
+   cmdline_fixed_string_t rate;
+   uint16_t rate_num;
+};
+
+static void cmd_queue_rate_limit_parsed(void *parsed_result,
+   __attribute__((unused)) struct cmdline *cl,
+   __attribute__((unused)) void *data)
+{
+   struct cmd_queue_rate_limit_result *res = parsed_result;
+   int ret = 0;
+
+   if ((strcmp(res->set, "set") == 0) && (strcmp(res->port, "port") == 0)
+   && (strcmp(res->queue, "queue") == 0) && (strcmp(res->rate, 
"rate") == 0))
+   ret = set_queue_rate_limit(res->port_num, res->queue_num,
+   res->rate_num);
+   if(ret < 0)
+   printf("queue_rate_limit_cmd error: (%s)\n", strerror(-ret));
+
+}
+
+cmdline_parse_token_string_t cmd_queue_rate_limit_set =
+   TOKEN_STRING_INITIALIZER(struct cmd_queue_rate_limit_result,
+   set,"set");
+cmdline_parse_token_string_t cmd_queue_rate_limit_port =
+   TOKEN_STRING_INITIALIZER(struct cmd_queue_rate_limit_result,
+   port,"port");
+cmdline_parse_token_num_t cmd_queue_rate_limit_portnum =
+   TOKEN_NUM_INITIALIZER(struct cmd_queue_rate_limit_result,
+   port_num, UINT8);
+cmdline_parse_token_string_t cmd_queue_rate_limit_queue =
+   TOKEN_STRING_INITIALIZER(struct cmd_queue_rate_limit_result,
+   queue,"queue");
+cmdline_parse_token_num_t cmd_queue_rate_limit_queuenum =
+   TOKEN_NUM_INITIALIZER(struct cmd_queue_rate_limit_result,
+   queue_num, UINT8);
+cmdline_parse_token_string_t cmd_queue_rate_limit_rate =
+   TOKEN_STRING_INITIALIZER(struct cmd_queue_rate_limit_result,
+   rate,"rate");
+cmdline_parse_token_num_t cmd_queue_rate_limit_ratenum =
+   TOKEN_NUM_INITIALIZER(struct cmd_queue_rate_limit_result,
+   rate_num, UINT16);
+
+cmdline_parse_inst_t cmd_queue_rate_limit = {
+   .f = cmd_queue_rate_limit_parsed,
+   .data = (void *)0,
+   .help_str = "set port X queue Y rate Z:(X = port number,"
+   "Y = queue number,Z = rate number)set rate limit for a queue on port X",
+   .tokens = {
+   (void *)&cmd_queue_rate_limit_set,
+   (void *)&cmd_queue_rate_limit_port,
+   (void *)&cmd_queue_rate_limit_portnum,
+   (void *)&cmd_queue_rate_limit_queue,
+   (void *)&cmd_queue_rate_limit_queuenum,
+   (void *)&cmd_queue_rate_limit_rate,
+   (void *)&cmd_queue_rate_limit_ratenum,
+   NULL,
+   },
+};
+
+
+/* *** SET RATE LIMIT FOR A VF OF A PORT *** */
+struct cmd_vf_rate_limit_result {
+   cmdline_fixed_string_t set;
+   cmdline_fixed_string_t port;
+   uint8_t port_num;
+   cmdline_fixed_string_t vf;
+   uint8_t vf_num;
+   cmdline_fixed_string_t rate;
+   uint16_t rate_num;
+   cmdline_fixed_string_t q_msk;
+   uint64_t q_msk_val;
+};
+
+static void cmd_vf_rate_limit_parsed(void *parsed_result,
+   __attribute__((unused)) st

[dpdk-dev] [PATCH 0/3] Support administrative link up and link down

2014-05-23 Thread Ouyang, Changchun
Hi Ivan

To some extent, I also agree with you.
But customers hope DPDK can provide an interface like "ifconfig up" and
"ifconfig down" in Linux.
They want to invoke such an interface from the user application to repeatedly
stop and start the device, and be sure RX and TX still work fine after each
start. I think it is not necessary to do a real device start and stop each
time; we just need to stop and restart the RX and TX functions, so the
straightforward method is to enable and disable the TX laser in ixgbe.
But at the ether level we need a more generic API name, here
rte_eth_dev_admin_link_up/down, since enable_tx_laser is not suitable:
enabling and disabling the TX laser is just the way ixgbe fulfills the
administrative link up and link down.
Maybe Fortville and future generations of NICs will use other ways to fulfill
admin_link_up/down.

Thanks and regards,
Changchun


-Original Message-
From: Ivan Boule [mailto:ivan.bo...@6wind.com] 
Sent: Thursday, May 22, 2014 11:31 PM
To: Ouyang, Changchun; dev at dpdk.org
Subject: Re: [dpdk-dev] [PATCH 0/3] Support administrative link up and link down

Hi Changchun,

On 05/22/2014 04:44 PM, Ouyang, Changchun wrote:
> Hi Ivan,
> For this one, it is a long story...
> In short,
> some customers have this kind of requirement: they want to repeatedly
> start (rte_dev_start) and stop (rte_dev_stop) the port for RX and TX,
> but they find that after several starts and stops, RX and TX no longer
> work well even though the port starts, and the packet error counters increase.
>
> To resolve this error-counter issue and let the port work fine even after
> repeated starts and stops, we need a new API. After discussing it, we have
> these 2 APIs, admin link up and admin link down.

If I understand well, this "feature" is not needed by itself, but only as a 
work-around to address issues when repeatedly invoking the functions 
ixgbe_dev_stop and ixgbe_dev_start.
Do such issues appear when performing the same operations with the Linux kernel 
driver?

Anyway, I suppose that such functions have to be automatically invoked by the 
same code of the network application that invokes the functions ixgbe_dev_stop 
and ixgbe_dev_start (said differently, there is no need for a manual assistance 
!)

In that case, would not it be possible - and highly preferable - to directly 
invoke the functions ixgbe_disable_tx_laser and, then, ixgbe_enable_tx_laser 
from the appropriate step during the execution of the function 
ixgbe_dev_start(), waiting for some appropriate delays between the two 
operations, if so needed?

Regards,
Ivan


>
> Is there any difference between using "dev_link_start/stop" and
> "dev_link_up/down"? To me, admin_link_up/down is better than
> dev_link_start/stop.
>
> If most people think we need to change the name, it is OK to rename it.
>
> I don't think we need it in non-physical PMDs. So no implementation in virtio 
> PMD.
>
> Thanks
> Changchun
>
>
> -Original Message-
> From: Ivan Boule [mailto:ivan.boule at 6wind.com]
> Sent: Thursday, May 22, 2014 9:17 PM
> To: Ouyang, Changchun; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 0/3] Support administrative link up and 
> link down
>
> On 05/22/2014 08:11 AM, Ouyang Changchun wrote:
>> This patch series contain the following 3 items:
>> 1. Add API to support administrative link up and down.
>> 2. Implement the functionality of administrative link up and down in IXGBE 
>> PMD.
>> 3. Add command in testpmd to test the functionality of administrative link 
>> up and down of PMD.
>>
...

> Hi Changchun,
>
> The 2 functions "rte_eth_dev_admin_link_up" and "rte_eth_dev_admin_link_down"
> don't have an equivalent in the Linux kernel, thus I am wondering what is 
> their effective usage from a network application perspective.
> Could you briefly explain in which use case these functions can be used for?
>
> By the way, it's not completely evident to infer the exact semantics of these 
> 2 functions from their name.
> In particular, I do not see what the term "admin" brings to the understanding 
> of their role. If it is to suggest that these functions are intended to force 
> the link to a different state of its initial [self-detected] state, then the 
> term "force" would be more appropriate.
>
> Otherwise, if eventually these functions appear to be mandatory, I suggest to 
> rename them "rte_eth_dev_link_start" and "rte_eth_dev_link_stop" 
> respectively, and to apply the same naming conventions in the 2 other patches.
>
> It might also be worth documenting in the comment section of the prototype of 
> these 2 functions whether it makes sense or not to support a notion of link 
> that can be dynamically started or stopped in non-physical PMDs (vmxnet3, 
> virtio, etc).


--
Ivan Boule
6WIND Development Engineer


[dpdk-dev] [PATCH v2] virtio: Support multiple queues feature in DPDK based virtio-net frontend

2014-05-23 Thread Ouyang Changchun
This patch supports the multiple queues feature in the DPDK based virtio-net
frontend. It first gets the max queue number of virtio-net from the virtio PCI
configuration and then sends a command to negotiate the queue number with the
backend; the negotiated virtio-net queues then serve RX and TX when receiving
and transmitting packets. To utilize this feature, the backend also needs to
support the multiple queues feature and have it enabled. A sketch of the
negotiation step follows below.

It also fixes some patch style issues.
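
As a rough illustration of the negotiation step described above, the sketch
below shows how the frontend could ask the backend for a number of queue pairs
through the control virtqueue. Only virtio_send_command() is taken from the
patch; the virtio_pmd_ctrl field names and the VIRTIO_NET_CTRL_MQ /
VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET constants are assumed from the virtio spec and
may differ from the actual implementation.

#include <stdint.h>
#include <string.h>

/* Sketch only: field names and the MQ constants are assumptions, and the
 * PMD's internal headers are assumed to be included. */
static int virtio_negotiate_queue_pairs(struct virtio_hw *hw, uint16_t nb_pairs)
{
        struct virtio_pmd_ctrl ctrl;
        int dlen[1];

        ctrl.hdr.class = VIRTIO_NET_CTRL_MQ;
        ctrl.hdr.cmd = VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET;
        memcpy(ctrl.data, &nb_pairs, sizeof(nb_pairs));
        dlen[0] = sizeof(nb_pairs);

        return virtio_send_command(hw->cvq, &ctrl, dlen, 1);
}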

Signed-off-by: Ouyang Changchun 
---
 lib/librte_pmd_virtio/virtio_ethdev.c | 326 --
 lib/librte_pmd_virtio/virtio_ethdev.h |  10 +-
 lib/librte_pmd_virtio/virtio_pci.h|   4 +-
 lib/librte_pmd_virtio/virtio_rxtx.c   |  72 ++--
 lib/librte_pmd_virtio/virtqueue.h |  60 +--
 5 files changed, 384 insertions(+), 88 deletions(-)

diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index 49e236b..79693f4 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -81,6 +81,9 @@ static void virtio_dev_stats_get(struct rte_eth_dev *dev, 
struct rte_eth_stats *
 static void virtio_dev_stats_reset(struct rte_eth_dev *dev);
 static void virtio_dev_free_mbufs(struct rte_eth_dev *dev);

+static int virtio_dev_queue_stats_mapping_set(__rte_unused struct rte_eth_dev 
*eth_dev,
+__rte_unused uint16_t queue_id, __rte_unused uint8_t stat_idx, __rte_unused 
uint8_t is_rx);
+
 /*
  * The set of PCI devices this driver supports
  */
@@ -92,6 +95,130 @@ static struct rte_pci_id pci_id_virtio_map[] = {
 { .vendor_id = 0, /* sentinel */ },
 };

+static int
+virtio_send_command(struct virtqueue* vq, struct virtio_pmd_ctrl* ctrl,
+   int* dlen, int pkt_num)
+{
+   uint32_t head = vq->vq_desc_head_idx, i;
+   int k, sum = 0;
+   virtio_net_ctrl_ack status = ~0;
+   struct virtio_pmd_ctrl result;
+
+   ctrl->status = status;
+
+   if (!vq->hw->cvq) {
+   PMD_INIT_LOG(ERR, "%s(): Control queue is "
+"not supported by this device.\n", __func__);
+   return -1;
+   }
+
+   PMD_INIT_LOG(DEBUG, "vq->vq_desc_head_idx = %d, status = %d, 
vq->hw->cvq = %p \n"
+   "vq = %p \n", vq->vq_desc_head_idx, status, vq->hw->cvq, vq);
+
+   if ((vq->vq_free_cnt < ((uint32_t)pkt_num + 2)) || (pkt_num < 1)) {
+   return -1;
+   }
+
+   memcpy(vq->virtio_net_hdr_mz->addr, ctrl, sizeof(struct 
virtio_pmd_ctrl));
+
+   /*
+* Format is enforced in qemu code:
+* One TX packet for header;
+* At least one TX packet per argument;
+* One RX packet for ACK.
+*/
+   vq->vq_ring.desc[head].flags = VRING_DESC_F_NEXT;
+   vq->vq_ring.desc[head].addr = vq->virtio_net_hdr_mz->phys_addr;
+   vq->vq_ring.desc[head].len = sizeof(struct virtio_net_ctrl_hdr);
+   vq->vq_free_cnt--;
+   i = vq->vq_ring.desc[head].next;
+
+   for (k = 0; k < pkt_num; k++) {
+   vq->vq_ring.desc[i].flags = VRING_DESC_F_NEXT;
+   vq->vq_ring.desc[i].addr = vq->virtio_net_hdr_mz->phys_addr +
+   sizeof(struct virtio_net_ctrl_hdr) + 
sizeof(ctrl->status) + sizeof(uint8_t)*sum;
+   vq->vq_ring.desc[i].len = dlen[k];
+   sum += dlen[k];
+   vq->vq_free_cnt--;
+   i = vq->vq_ring.desc[i].next;
+   }
+
+   vq->vq_ring.desc[i].flags = VRING_DESC_F_WRITE;
+   vq->vq_ring.desc[i].addr = vq->virtio_net_hdr_mz->phys_addr + 
sizeof(struct virtio_net_ctrl_hdr);
+   vq->vq_ring.desc[i].len = sizeof(ctrl->status);
+   vq->vq_free_cnt--;
+
+   vq->vq_desc_head_idx = vq->vq_ring.desc[i].next;
+
+   vq_update_avail_ring(vq, head);
+   vq_update_avail_idx(vq);
+
+   PMD_INIT_LOG(DEBUG, "vq->vq_queue_index = %d \n", vq->vq_queue_index);
+
+   virtqueue_notify(vq);
+
+   while (vq->vq_used_cons_idx == vq->vq_ring.used->idx) {
+   usleep(100);
+   }
+
+   while (vq->vq_used_cons_idx != vq->vq_ring.used->idx) {
+   uint32_t idx, desc_idx, used_idx;
+   struct vring_used_elem *uep;
+
+   rmb();
+
+   used_idx = (uint32_t)(vq->vq_used_cons_idx & (vq->vq_nentries - 
1));
+   uep = &vq->vq_ring.used->ring[used_idx];
+   idx = (uint32_t) uep->id;
+   desc_idx = idx;
+
+   while (vq->vq_ring.desc[desc_idx].flags & VRING_DESC_F_NEXT) {
+   desc_idx = vq->vq_ring.desc[desc_idx].next;
+   vq->vq_free_cnt++;
+   }
+
+   vq->vq_ring.desc[desc_idx].next = vq->vq_desc_head_idx;
+   vq->vq_desc_head

[dpdk-dev] [PATCH v2 1/3] ether: Add API to support setting TX rate for queue and VF

2014-05-26 Thread Ouyang Changchun
This patch adds API to support setting TX rate for a queue and a VF.

Signed-off-by: Ouyang Changchun 
---
 lib/librte_ether/rte_ethdev.c | 71 +++
 lib/librte_ether/rte_ethdev.h | 51 +++
 2 files changed, 122 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index a5727dd..1ea61e1 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1913,6 +1913,77 @@ rte_eth_dev_set_vf_vlan_filter(uint8_t port_id, uint16_t 
vlan_id,
vf_mask,vlan_on);
 }

+int rte_eth_set_queue_rate_limit(uint8_t port_id, uint16_t queue_idx,
+   uint16_t tx_rate)
+{
+   struct rte_eth_dev *dev;
+   struct rte_eth_dev_info dev_info;
+   struct rte_eth_link link;
+
+   if (port_id >= nb_ports) {
+   PMD_DEBUG_TRACE("set queue rate limit:invalid port id=%d\n",
+   port_id);
+   return -ENODEV;
+   }
+
+   dev = &rte_eth_devices[port_id];
+   rte_eth_dev_info_get(port_id, &dev_info);
+   link = dev->data->dev_link;
+
+   if (queue_idx > dev_info.max_tx_queues) {
+   PMD_DEBUG_TRACE("set queue rate limit:port %d: "
+   "invalid queue id=%d\n", port_id, queue_idx);
+   return -EINVAL;
+   }
+
+   if (tx_rate > link.link_speed) {
+   PMD_DEBUG_TRACE("set queue rate limit:invalid tx_rate=%d, "
+   "bigger than link speed= %d\n",
+   tx_rate, link.link_speed);
+   return -EINVAL;
+   }
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->set_queue_rate_limit, -ENOTSUP);
+   return (*dev->dev_ops->set_queue_rate_limit)(dev, queue_idx, tx_rate);
+}
+
+int rte_eth_set_vf_rate_limit(uint8_t port_id, uint16_t vf, uint16_t tx_rate,
+   uint64_t q_msk)
+{
+   struct rte_eth_dev *dev;
+   struct rte_eth_dev_info dev_info;
+   struct rte_eth_link link;
+
+   if (q_msk == 0)
+   return 0;
+
+   if (port_id >= nb_ports) {
+   PMD_DEBUG_TRACE("set VF rate limit:invalid port id=%d\n",
+   port_id);
+   return -ENODEV;
+   }
+
+   dev = &rte_eth_devices[port_id];
+   rte_eth_dev_info_get(port_id, &dev_info);
+   link = dev->data->dev_link;
+
+   if (vf > dev_info.max_vfs) {
+   PMD_DEBUG_TRACE("set VF rate limit:port %d: "
+   "invalid vf id=%d\n", port_id, vf);
+   return -EINVAL;
+   }
+
+   if (tx_rate > link.link_speed) {
+   PMD_DEBUG_TRACE("set VF rate limit:invalid tx_rate=%d, "
+   "bigger than link speed= %d\n",
+   tx_rate, link.link_speed);
+   return -EINVAL;
+   }
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->set_vf_rate_limit, -ENOTSUP);
+   return (*dev->dev_ops->set_vf_rate_limit)(dev, vf, tx_rate, q_msk);
+}
+
 int
 rte_eth_mirror_rule_set(uint8_t port_id, 
struct rte_eth_vmdq_mirror_conf *mirror_conf,
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index d5ea46b..445d40a 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1012,6 +1012,17 @@ typedef int (*eth_set_vf_vlan_filter_t)(struct 
rte_eth_dev *dev,
  uint8_t vlan_on);
 /**< @internal Set VF VLAN pool filter */

+typedef int (*eth_set_queue_rate_limit_t)(struct rte_eth_dev *dev,
+   uint16_t queue_idx,
+   uint16_t tx_rate);
+/**< @internal Set queue TX rate */
+
+typedef int (*eth_set_vf_rate_limit_t)(struct rte_eth_dev *dev,
+   uint16_t vf,
+   uint16_t tx_rate,
+   uint64_t q_msk);
+/**< @internal Set VF TX rate */
+
 typedef int (*eth_mirror_rule_set_t)(struct rte_eth_dev *dev,
  struct rte_eth_vmdq_mirror_conf *mirror_conf,
  uint8_t rule_id, 
@@ -1119,6 +1130,8 @@ struct eth_dev_ops {
eth_set_vf_rx_tset_vf_rx;  /**< enable/disable a VF receive 
*/
eth_set_vf_tx_tset_vf_tx;  /**< enable/disable a VF 
transmit */
eth_set_vf_vlan_filter_t   set_vf_vlan_filter;  /**< Set VF VLAN filter 
*/
+   eth_set_queue_rate_limit_t set_queue_rate_limit;   /**< Set queue rate 
limit */
+   eth_set_vf_rate_limit_tset_vf_rate_limit;   /**< Set VF rate limit 
*/

/** Add a signature filter. */
fdir_add_signature_filter_t f

[dpdk-dev] [PATCH v2 2/3] ixgbe: Implement the functionality of setting TX rate for queue or VF in IXGBE PMD

2014-05-26 Thread Ouyang Changchun
This patch implements the functionality of setting TX rate for queue or VF in 
IXGBE PMD.

Signed-off-by: Ouyang Changchun 
---
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 122 
 lib/librte_pmd_ixgbe/ixgbe_ethdev.h |  13 +++-
 2 files changed, 132 insertions(+), 3 deletions(-)

diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c 
b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
index c9b5fe4..643477a 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
@@ -87,6 +87,8 @@
 #define IXGBE_LINK_UP_CHECK_TIMEOUT   1000 /* ms */
 #define IXGBE_VMDQ_NUM_UC_MAC 4096 /* Maximum nb. of UC MAC addr. */

+#define IXGBE_MMW_SIZE_DEFAULT0x4
+#define IXGBE_MMW_SIZE_JUMBO_FRAME0x14

 #define IXGBEVF_PMD_NAME "rte_ixgbevf_pmd" /* PMD name */

@@ -182,6 +184,10 @@ static int ixgbe_mirror_rule_set(struct rte_eth_dev *dev,
 static int ixgbe_mirror_rule_reset(struct rte_eth_dev *dev,
uint8_t rule_id);

+static int ixgbe_set_queue_rate_limit(struct rte_eth_dev *dev,
+   uint16_t queue_idx, uint16_t tx_rate);
+static int ixgbe_set_vf_rate_limit(struct rte_eth_dev *dev, uint16_t vf,
+   uint16_t tx_rate, uint64_t q_msk);
 /*
  * Define VF Stats MACRO for Non "cleared on read" register
  */
@@ -280,6 +286,8 @@ static struct eth_dev_ops ixgbe_eth_dev_ops = {
.set_vf_rx= ixgbe_set_pool_rx,
.set_vf_tx= ixgbe_set_pool_tx,
.set_vf_vlan_filter   = ixgbe_set_pool_vlan_filter,
+   .set_queue_rate_limit = ixgbe_set_queue_rate_limit,
+   .set_vf_rate_limit= ixgbe_set_vf_rate_limit,
.fdir_add_signature_filter= ixgbe_fdir_add_signature_filter,
.fdir_update_signature_filter = ixgbe_fdir_update_signature_filter,
.fdir_remove_signature_filter = ixgbe_fdir_remove_signature_filter,
@@ -1288,10 +1296,13 @@ ixgbe_dev_start(struct rte_eth_dev *dev)
 {
struct ixgbe_hw *hw =
IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct ixgbe_vf_info *vfinfo =
+   *IXGBE_DEV_PRIVATE_TO_P_VFDATA(dev->data->dev_private);
int err, link_up = 0, negotiate = 0;
uint32_t speed = 0;
int mask = 0;
int status;
+   uint16_t vf, idx;

PMD_INIT_FUNC_TRACE();

@@ -1408,6 +1419,16 @@ skip_link_setup:
goto error;
}

+   /* Restore vf rate limit */
+   if (vfinfo != NULL) {
+   for (vf = 0; vf < dev->pci_dev->max_vfs; vf++)
+   for (idx = 0; idx < IXGBE_MAX_QUEUE_NUM_PER_VF; idx++)
+   if (vfinfo[vf].tx_rate[idx] != 0)
+   ixgbe_set_vf_rate_limit(dev, vf,
+   vfinfo[vf].tx_rate[idx],
+   1 << idx);
+   }
+
ixgbe_restore_statistics_mapping(dev);

return (0);
@@ -3062,6 +3083,107 @@ ixgbe_mirror_rule_reset(struct rte_eth_dev *dev, 
uint8_t rule_id)
return 0;
 }

+static int ixgbe_set_queue_rate_limit(struct rte_eth_dev *dev,
+   uint16_t queue_idx, uint16_t tx_rate)
+{
+   struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   uint32_t rf_dec, rf_int;
+   uint32_t bcnrc_val;
+   uint16_t link_speed = dev->data->dev_link.link_speed;
+
+   if (queue_idx >= hw->mac.max_tx_queues)
+   return -EINVAL;
+
+   if (tx_rate != 0) {
+   /* Calculate the rate factor values to set */
+   rf_int = (uint32_t)link_speed / (uint32_t)tx_rate;
+   rf_dec = (uint32_t)link_speed % (uint32_t)tx_rate;
+   rf_dec = (rf_dec << IXGBE_RTTBCNRC_RF_INT_SHIFT) / tx_rate;
+
+   bcnrc_val = IXGBE_RTTBCNRC_RS_ENA;
+   bcnrc_val |= ((rf_int << IXGBE_RTTBCNRC_RF_INT_SHIFT) &
+   IXGBE_RTTBCNRC_RF_INT_MASK_M);
+   bcnrc_val |= (rf_dec & IXGBE_RTTBCNRC_RF_DEC_MASK);
+   } else {
+   bcnrc_val = 0;
+   }
+
+   /*
+* Set global transmit compensation time to the MMW_SIZE in RTTBCNRM
+* register. MMW_SIZE=0x014 if 9728-byte jumbo is supported, otherwise
+* set as 0x4.
+*/
+   if ((dev->data->dev_conf.rxmode.jumbo_frame == 1) &&
+   (dev->data->dev_conf.rxmode.max_rx_pkt_len >=
+   IXGBE_MAX_JUMBO_FRAME_SIZE))
+   IXGBE_WRITE_REG(hw, IXGBE_RTTBCNRM,
+   IXGBE_MMW_SIZE_JUMBO_FRAME);
+   else
+   IXGBE_WRITE_REG(hw, IXGBE_RTTBCNRM,
+   IXGBE_MMW_SIZE_DEFAULT);
+
+   /* Set RTTBCNRC of queue X */
+   IXGBE_WRITE_REG(hw, IXGBE_RTTDQSEL, queue_idx);
+   IXGBE_WRITE_REG(hw, IXGBE_RTTBCNRC, bcnrc_val);
+   IXGBE_WRITE_FLUSH(hw);
+
+ 

[dpdk-dev] [PATCH v2 3/3] testpmd: Add commands to test the functionality of setting TX rate for queue or VF

2014-05-26 Thread Ouyang Changchun
This patch adds commands in testpmd to test the functionality of setting TX 
rate for queue or VF.

Signed-off-by: Ouyang Changchun 
---
 app/test-pmd/cmdline.c | 159 -
 app/test-pmd/config.c  |  47 +++
 app/test-pmd/testpmd.h |   3 +
 3 files changed, 208 insertions(+), 1 deletion(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index b3824f9..83b2665 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -342,7 +342,14 @@ static void cmd_help_long_parsed(void *parsed_result,
"BAM:accepts broadcast packets;"
"MPE:accepts all multicast packets\n\n"
"Enable/Disable a VF receive mode of a port\n\n"
-   
+
+   "set port (port_id) queue (queue_id) rate (rate_num)\n"
+   "Set rate limit for a queue of a port\n\n"
+
+   "set port (port_id) vf (vf_id) rate (rate_num) "
+   "queue_mask (queue_mask_value)\n"
+   "Set rate limit for queues in VF of a port\n\n"
+
"set port (port_id) mirror-rule (rule_id)" 
"(pool-mirror|vlan-mirror)\n"
" (poolmask|vlanid[,vlanid]*) dst-pool (pool_id) 
(on|off)\n"
@@ -4790,6 +4797,154 @@ cmdline_parse_inst_t cmd_vf_rxvlan_filter = {
},
 };

+/* *** SET RATE LIMIT FOR A QUEUE OF A PORT *** */
+struct cmd_queue_rate_limit_result {
+   cmdline_fixed_string_t set;
+   cmdline_fixed_string_t port;
+   uint8_t port_num;
+   cmdline_fixed_string_t queue;
+   uint8_t queue_num;
+   cmdline_fixed_string_t rate;
+   uint16_t rate_num;
+};
+
+static void cmd_queue_rate_limit_parsed(void *parsed_result,
+   __attribute__((unused)) struct cmdline *cl,
+   __attribute__((unused)) void *data)
+{
+   struct cmd_queue_rate_limit_result *res = parsed_result;
+   int ret = 0;
+
+   if ((strcmp(res->set, "set") == 0) && (strcmp(res->port, "port") == 0)
+   && (strcmp(res->queue, "queue") == 0)
+   && (strcmp(res->rate, "rate") == 0))
+   ret = set_queue_rate_limit(res->port_num, res->queue_num,
+   res->rate_num);
+   if (ret < 0)
+   printf("queue_rate_limit_cmd error: (%s)\n", strerror(-ret));
+
+}
+
+cmdline_parse_token_string_t cmd_queue_rate_limit_set =
+   TOKEN_STRING_INITIALIZER(struct cmd_queue_rate_limit_result,
+   set, "set");
+cmdline_parse_token_string_t cmd_queue_rate_limit_port =
+   TOKEN_STRING_INITIALIZER(struct cmd_queue_rate_limit_result,
+   port, "port");
+cmdline_parse_token_num_t cmd_queue_rate_limit_portnum =
+   TOKEN_NUM_INITIALIZER(struct cmd_queue_rate_limit_result,
+   port_num, UINT8);
+cmdline_parse_token_string_t cmd_queue_rate_limit_queue =
+   TOKEN_STRING_INITIALIZER(struct cmd_queue_rate_limit_result,
+   queue, "queue");
+cmdline_parse_token_num_t cmd_queue_rate_limit_queuenum =
+   TOKEN_NUM_INITIALIZER(struct cmd_queue_rate_limit_result,
+   queue_num, UINT8);
+cmdline_parse_token_string_t cmd_queue_rate_limit_rate =
+   TOKEN_STRING_INITIALIZER(struct cmd_queue_rate_limit_result,
+   rate, "rate");
+cmdline_parse_token_num_t cmd_queue_rate_limit_ratenum =
+   TOKEN_NUM_INITIALIZER(struct cmd_queue_rate_limit_result,
+   rate_num, UINT16);
+
+cmdline_parse_inst_t cmd_queue_rate_limit = {
+   .f = cmd_queue_rate_limit_parsed,
+   .data = (void *)0,
+   .help_str = "set port X queue Y rate Z:(X = port number,"
+   "Y = queue number,Z = rate number)set rate limit for a queue on port X",
+   .tokens = {
+   (void *)&cmd_queue_rate_limit_set,
+   (void *)&cmd_queue_rate_limit_port,
+   (void *)&cmd_queue_rate_limit_portnum,
+   (void *)&cmd_queue_rate_limit_queue,
+   (void *)&cmd_queue_rate_limit_queuenum,
+   (void *)&cmd_queue_rate_limit_rate,
+   (void *)&cmd_queue_rate_limit_ratenum,
+   NULL,
+   },
+};
+
+
+/* *** SET RATE LIMIT FOR A VF OF A PORT *** */
+struct cmd_vf_rate_limit_result {
+   cmdline_fixed_string_t set;
+   cmdline_fixed_string_t port;
+   uint8_t port_num;
+   cmdline_fixed_string_t vf;
+   uint8_t vf_num;
+   cmdline_fixed_string_t rate;
+   uint16_t rate

[dpdk-dev] [PATCH v2 0/3] Support setting TX rate for queue and VF

2014-05-26 Thread Ouyang Changchun
This patch v2 fixes some errors and warnings reported by checkpatch.pl.

This patch series also contains the following 3 items:
1. Add API to support setting TX rate for a queue or a VF.
2. Implement the functionality of setting TX rate for queue or VF in IXGBE PMD.
3. Add commands in testpmd to test the functionality of setting TX rate for 
queue or VF.

Ouyang Changchun (3):
  Add API to support set TX rate for a queue and VF.
  Implement the functionality of setting TX rate for queue or VF in
IXGBE PMD.
  Add commands in testpmd to test the functionality of setting TX rate
for queue or VF.

 app/test-pmd/cmdline.c  | 159 +++-
 app/test-pmd/config.c   |  47 +++
 app/test-pmd/testpmd.h  |   3 +
 lib/librte_ether/rte_ethdev.c   |  71 
 lib/librte_ether/rte_ethdev.h   |  51 
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 122 +++
 lib/librte_pmd_ixgbe/ixgbe_ethdev.h |  13 ++-
 7 files changed, 462 insertions(+), 4 deletions(-)

-- 
1.9.0



[dpdk-dev] [PATCH v3] virtio: Support multiple queues feature in DPDK based virtio-net frontend.

2014-05-26 Thread Ouyang Changchun
This v3 patch continues fixing some errors and warnings reported by 
checkpatch.pl.

This patch supports the multiple queues feature in the DPDK based virtio-net
frontend. It first gets the max queue number of virtio-net from the virtio PCI
configuration and then sends a command to negotiate the queue number with the
backend; the negotiated virtio-net queues then serve RX and TX when receiving
and transmitting packets. To utilize this feature, the backend also needs to
support the multiple queues feature and have it enabled.

Signed-off-by: Ouyang Changchun 
---
 lib/librte_pmd_virtio/virtio_ethdev.c | 377 --
 lib/librte_pmd_virtio/virtio_ethdev.h |  40 ++--
 lib/librte_pmd_virtio/virtio_pci.h|   4 +-
 lib/librte_pmd_virtio/virtio_rxtx.c   |  92 +++--
 lib/librte_pmd_virtio/virtqueue.h |  61 --
 5 files changed, 458 insertions(+), 116 deletions(-)

diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index 49e236b..c2b4dfb 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -81,6 +81,12 @@ static void virtio_dev_stats_get(struct rte_eth_dev *dev, 
struct rte_eth_stats *
 static void virtio_dev_stats_reset(struct rte_eth_dev *dev);
 static void virtio_dev_free_mbufs(struct rte_eth_dev *dev);

+static int virtio_dev_queue_stats_mapping_set(
+   __rte_unused struct rte_eth_dev *eth_dev,
+   __rte_unused uint16_t queue_id,
+   __rte_unused uint8_t stat_idx,
+   __rte_unused uint8_t is_rx);
+
 /*
  * The set of PCI devices this driver supports
  */
@@ -92,6 +98,135 @@ static struct rte_pci_id pci_id_virtio_map[] = {
 { .vendor_id = 0, /* sentinel */ },
 };

+static int
+virtio_send_command(struct virtqueue *vq, struct virtio_pmd_ctrl *ctrl,
+   int *dlen, int pkt_num)
+{
+   uint32_t head = vq->vq_desc_head_idx, i;
+   int k, sum = 0;
+   virtio_net_ctrl_ack status = ~0;
+   struct virtio_pmd_ctrl result;
+
+   ctrl->status = status;
+
+   if (!vq->hw->cvq) {
+   PMD_INIT_LOG(ERR, "%s(): Control queue is "
+   "not supported by this device.\n", __func__);
+   return -1;
+   }
+
+   PMD_INIT_LOG(DEBUG, "vq->vq_desc_head_idx = %d, status = %d, "
+   "vq->hw->cvq = %p vq = %p\n",
+   vq->vq_desc_head_idx, status, vq->hw->cvq, vq);
+
+   if ((vq->vq_free_cnt < ((uint32_t)pkt_num + 2)) || (pkt_num < 1))
+   return -1;
+
+   memcpy(vq->virtio_net_hdr_mz->addr, ctrl,
+   sizeof(struct virtio_pmd_ctrl));
+
+   /*
+* Format is enforced in qemu code:
+* One TX packet for header;
+* At least one TX packet per argument;
+* One RX packet for ACK.
+*/
+   vq->vq_ring.desc[head].flags = VRING_DESC_F_NEXT;
+   vq->vq_ring.desc[head].addr = vq->virtio_net_hdr_mz->phys_addr;
+   vq->vq_ring.desc[head].len = sizeof(struct virtio_net_ctrl_hdr);
+   vq->vq_free_cnt--;
+   i = vq->vq_ring.desc[head].next;
+
+   for (k = 0; k < pkt_num; k++) {
+   vq->vq_ring.desc[i].flags = VRING_DESC_F_NEXT;
+   vq->vq_ring.desc[i].addr = vq->virtio_net_hdr_mz->phys_addr
+   + sizeof(struct virtio_net_ctrl_hdr)
+   + sizeof(ctrl->status) + sizeof(uint8_t)*sum;
+   vq->vq_ring.desc[i].len = dlen[k];
+   sum += dlen[k];
+   vq->vq_free_cnt--;
+   i = vq->vq_ring.desc[i].next;
+   }
+
+   vq->vq_ring.desc[i].flags = VRING_DESC_F_WRITE;
+   vq->vq_ring.desc[i].addr = vq->virtio_net_hdr_mz->phys_addr
+   + sizeof(struct virtio_net_ctrl_hdr);
+   vq->vq_ring.desc[i].len = sizeof(ctrl->status);
+   vq->vq_free_cnt--;
+
+   vq->vq_desc_head_idx = vq->vq_ring.desc[i].next;
+
+   vq_update_avail_ring(vq, head);
+   vq_update_avail_idx(vq);
+
+   PMD_INIT_LOG(DEBUG, "vq->vq_queue_index = %d\n", vq->vq_queue_index);
+
+   virtqueue_notify(vq);
+
+   while (vq->vq_used_cons_idx == vq->vq_ring.used->idx)
+   usleep(100);
+
+   while (vq->vq_used_cons_idx != vq->vq_ring.used->idx) {
+   uint32_t idx, desc_idx, used_idx;
+   struct vring_used_elem *uep;
+
+   rmb();
+
+   used_idx = (uint32_t)(vq->vq_used_cons_idx
+   & (vq->vq_nentries - 1));
+   uep = &vq->vq_ring.used->ring[used_idx];
+   idx = (uint32_t) uep->id;
+   desc_idx = idx;
+
+   while (vq->vq_ring.desc[desc_idx].flags & VRING_DESC_F_NEXT) {
+   desc_idx = vq->vq_ring.desc[desc_idx].next;
+   v

[dpdk-dev] [PATCH v2 0/3] Support zero copy RX/TX in user space vhost

2014-05-28 Thread Ouyang, Changchun
Yes I will send out a patch v3 to replace the patch v2.
Thanks
Changchun

-Original Message-
From: Thomas Monjalon [mailto:thomas.monja...@6wind.com] 
Sent: Wednesday, May 28, 2014 7:02 AM
To: Ouyang, Changchun
Cc: dev at dpdk.org
Subject: Re: [PATCH v2 0/3] Support zero copy RX/TX in user space vhost

Hi,

checkpatch.pl is reporting some errors and I think some of them should be avoided.
Please check it.

Thanks
-- 
Thomas


[dpdk-dev] [PATCH 0/3] Support administrative link up and link down

2014-05-28 Thread Ouyang, Changchun
Hi Ivan,
Thanks very much for your detailed response to this issue.
I think your recommendation makes sense, and I will update the naming and
re-send the patch for link-up and link-down.

Best regards,
Changchun

-Original Message-
From: Ivan Boule [mailto:ivan.bo...@6wind.com] 
Sent: Friday, May 23, 2014 5:25 PM
To: Ouyang, Changchun; dev at dpdk.org
Subject: Re: [dpdk-dev] [PATCH 0/3] Support administrative link up and link down

On 05/23/2014 04:08 AM, Ouyang, Changchun wrote:
> Hi Ivan
>
> To some extent, I also agree with you.
> But customers hope DPDK can provide an interface like "ifconfig up" and
> "ifconfig down" in Linux. They want to invoke such an interface from the
> user application to repeatedly stop and start the device, and be sure RX
> and TX still work fine after each start. I think it is not necessary to do
> a real device start and stop each time; we just need to stop and restart
> the RX and TX functions, so the straightforward method is to enable and
> disable the TX laser in ixgbe.
> But at the ether level we need a more generic API name, here
> rte_eth_dev_admin_link_up/down, since enable_tx_laser is not suitable:
> enabling and disabling the TX laser is just the way ixgbe fulfills the
> administrative link up and link down.
> Maybe Fortville and future generations of NICs will use other ways to
> fulfill admin_link_up/down.
>

Hi Changchun,

I do not understand what your customer effectively needs.
First of all, if I understand well, your customer's application does not really 
need to invoke the DPDK functions "eth_dev_stop" and "eth_dev_start" for 
addressing its problem, for instance to reconfigure RX/TX queues of the port.
When considering the implementation in the ixgbe PMD of the function 
"rte_eth_dev_admin_link_down", its only visible effect from the DPDK 
application perspective is that no input packet can be received anymore, and 
output packets cannot be transmitted (once the TX queues have been filled).

Conversely, the only visible effect of the "rte_eth_dev_admin_link_up"
function is that input packets are received again, and that output packets can 
be successfully transmitted.

In fact, by disabling the TX laser on a ixgbe port, the only interesting effect 
of the function "rte_eth_dev_admin_link_down" is that it notifies the peer 
system of a hardware link DOWN event (with no physical link unplug on the peer 
side).
Conversely, by enabling the TX laser on a ixgbe port, the only interesting 
effect of the function "rte_eth_dev_admin_link_up" is that it notifies the peer 
system of a hardware link UP event.

Are those the actions that your customer's application actually needs to perform? 
If so, then this certainly deserves a real operational use case that is 
worth describing in the patch log.
This would help DPDK PMD implementors to understand what such functions can be 
used for, and to decide whether they actually need to be supported by the PMD.

Assuming that these 2 functions need to be provided to address the issue 
described above, I do not think that the word "admin" brings anything for 
understanding their role. In fact, the word "admin" rather suggests a pure 
"software" down/up setting, instead of a physical one.
Naming these 2 functions "rte_eth_dev_set_link_down"
and "rte_eth_dev_set_link_up" better describes their expected effect.

Regards,
Ivan

>
> On 05/22/2014 04:44 PM, Ouyang, Changchun wrote:
>> Hi Ivan,
>> For this one, it is a long story...
>> In short,
>> some customers have this kind of requirement: they want to repeatedly
>> start (rte_dev_start) and stop (rte_dev_stop) the port for RX and TX, 
>> but they find that after several starts and stops, RX and TX no longer 
>> work well even though the port starts, and the packet error counters increase.
>>
>> To resolve this error-counter issue and let the port work fine even after 
>> repeated starts and stops, we need a new API. After discussing it, we have 
>> these 2 APIs, admin link up and admin link down.
>
> If I understand well, this "feature" is not needed by itself, but only as a 
> work-around to address issues when repeatedly invoking the functions 
> ixgbe_dev_stop and ixgbe_dev_start.
> Do such issues appear when performing the same operations with the Linux 
> kernel driver?
>
> Anyway, I suppose that such functions have to be automatically invoked 
> by the same code of the network application that invokes the functions 
> ixgbe_dev_stop and ixgbe_dev_start (said differently, there is no need 
> for a manual assistance !)
>
> In that case, would not it be possible - and highly preferable - to directly 
> invoke the functions ixgbe_disable_tx_laser and, then, ixgbe_enable_

[dpdk-dev] [PATCH v2 2/3] ixgbe: Implement the functionality of setting link up and down in IXGBE PMD

2014-05-28 Thread Ouyang Changchun
Please ignore the previous v1 patch, just apply this v2 patch.

This patch implements the functionality of setting link up and down in IXGBE 
PMD.
It is implemented by enabling or disabling TX laser.

Signed-off-by: Ouyang Changchun 
---
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 63 +
 1 file changed, 63 insertions(+)

diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c 
b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
index c9b5fe4..8f9c97a 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
@@ -97,6 +97,8 @@ static int eth_ixgbe_dev_init(struct eth_driver *eth_drv,
 static int  ixgbe_dev_configure(struct rte_eth_dev *dev);
 static int  ixgbe_dev_start(struct rte_eth_dev *dev);
 static void ixgbe_dev_stop(struct rte_eth_dev *dev);
+static int  ixgbe_dev_set_link_up(struct rte_eth_dev *dev);
+static int  ixgbe_dev_set_link_down(struct rte_eth_dev *dev);
 static void ixgbe_dev_close(struct rte_eth_dev *dev);
 static void ixgbe_dev_promiscuous_enable(struct rte_eth_dev *dev);
 static void ixgbe_dev_promiscuous_disable(struct rte_eth_dev *dev);
@@ -246,6 +248,8 @@ static struct eth_dev_ops ixgbe_eth_dev_ops = {
.dev_configure= ixgbe_dev_configure,
.dev_start= ixgbe_dev_start,
.dev_stop = ixgbe_dev_stop,
+   .dev_set_link_up= ixgbe_dev_set_link_up,
+   .dev_set_link_down  = ixgbe_dev_set_link_down,
.dev_close= ixgbe_dev_close,
.promiscuous_enable   = ixgbe_dev_promiscuous_enable,
.promiscuous_disable  = ixgbe_dev_promiscuous_disable,
@@ -1458,6 +1462,65 @@ ixgbe_dev_stop(struct rte_eth_dev *dev)
 }

 /*
+ * Set device link up: enable tx laser.
+ */
+static int
+ixgbe_dev_set_link_up(struct rte_eth_dev *dev)
+{
+   struct ixgbe_hw *hw =
+   IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   if (hw->mac.type == ixgbe_mac_82599EB) {
+#ifdef RTE_NIC_BYPASS
+   if (hw->device_id == IXGBE_DEV_ID_82599_BYPASS) {
+   /* Not supported in bypass mode */
+   PMD_INIT_LOG(ERR,
+   "\nSet link up is not supported "
+   "by device id 0x%x\n",
+   hw->device_id);
+   return -ENOTSUP;
+   }
+#endif
+   /* Turn on the laser */
+   ixgbe_enable_tx_laser(hw);
+   return 0;
+   }
+
+   PMD_INIT_LOG(ERR, "\nSet link up is not supported by device id 0x%x\n",
+   hw->device_id);
+   return -ENOTSUP;
+}
+
+/*
+ * Set device link down: disable tx laser.
+ */
+static int
+ixgbe_dev_set_link_down(struct rte_eth_dev *dev)
+{
+   struct ixgbe_hw *hw =
+   IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   if (hw->mac.type == ixgbe_mac_82599EB) {
+#ifdef RTE_NIC_BYPASS
+   if (hw->device_id == IXGBE_DEV_ID_82599_BYPASS) {
+   /* Not supported in bypass mode */
+   PMD_INIT_LOG(ERR,
+   "\nSet link down is not supported "
+   "by device id 0x%x\n",
+hw->device_id);
+   return -ENOTSUP;
+   }
+#endif
+   /* Turn off the laser */
+   ixgbe_disable_tx_laser(hw);
+   return 0;
+   }
+
+   PMD_INIT_LOG(ERR,
+   "\nSet link down is not supported by device id 0x%x\n",
+hw->device_id);
+   return -ENOTSUP;
+}
+
+/*
  * Reset and stop device.
  */
 static void
-- 
1.9.0



[dpdk-dev] [PATCH v2 3/3] testpmd: Add commands to test link up and down of PMD

2014-05-28 Thread Ouyang Changchun
Please ignore previous patch v1, and just apply this patch v2.

This patch adds commands in testpmd to test the functionality of setting link 
up and down in the PMD (example invocations below).
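
Example invocations at the testpmd prompt (port 0 is illustrative):

testpmd> set link-down port 0
testpmd> set link-up port 0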

Signed-off-by: Ouyang Changchun 
---
 app/test-pmd/cmdline.c | 81 ++
 app/test-pmd/testpmd.c | 14 +
 app/test-pmd/testpmd.h |  2 ++
 3 files changed, 97 insertions(+)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index b3824f9..29bf5b5 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -3780,6 +3780,85 @@ cmdline_parse_inst_t cmd_start_tx_first = {
},
 };

+/* *** SET LINK UP *** */
+struct cmd_set_link_up_result {
+   cmdline_fixed_string_t set;
+   cmdline_fixed_string_t link_up;
+   cmdline_fixed_string_t port;
+   uint8_t port_id;
+};
+
+cmdline_parse_token_string_t cmd_set_link_up_set =
+   TOKEN_STRING_INITIALIZER(struct cmd_set_link_up_result, set, "set");
+cmdline_parse_token_string_t cmd_set_link_up_link_up =
+   TOKEN_STRING_INITIALIZER(struct cmd_set_link_up_result, link_up,
+   "link-up");
+cmdline_parse_token_string_t cmd_set_link_up_port =
+   TOKEN_STRING_INITIALIZER(struct cmd_set_link_up_result, port, "port");
+cmdline_parse_token_num_t cmd_set_link_up_port_id =
+   TOKEN_NUM_INITIALIZER(struct cmd_set_link_up_result, port_id, UINT8);
+
+static void cmd_set_link_up_parsed(__attribute__((unused)) void *parsed_result,
+__attribute__((unused)) struct cmdline *cl,
+__attribute__((unused)) void *data)
+{
+   struct cmd_set_link_up_result *res = parsed_result;
+   dev_set_link_up(res->port_id);
+}
+
+cmdline_parse_inst_t cmd_set_link_up = {
+   .f = cmd_set_link_up_parsed,
+   .data = NULL,
+   .help_str = "set link-up port (port id)",
+   .tokens = {
+   (void *)&cmd_set_link_up_set,
+   (void *)&cmd_set_link_up_link_up,
+   (void *)&cmd_set_link_up_port,
+   (void *)&cmd_set_link_up_port_id,
+   NULL,
+   },
+};
+
+/* *** SET LINK DOWN *** */
+struct cmd_set_link_down_result {
+   cmdline_fixed_string_t set;
+   cmdline_fixed_string_t link_down;
+   cmdline_fixed_string_t port;
+   uint8_t port_id;
+};
+
+cmdline_parse_token_string_t cmd_set_link_down_set =
+   TOKEN_STRING_INITIALIZER(struct cmd_set_link_down_result, set, "set");
+cmdline_parse_token_string_t cmd_set_link_down_link_down =
+   TOKEN_STRING_INITIALIZER(struct cmd_set_link_down_result, link_down,
+   "link-down");
+cmdline_parse_token_string_t cmd_set_link_down_port =
+   TOKEN_STRING_INITIALIZER(struct cmd_set_link_down_result, port, "port");
+cmdline_parse_token_num_t cmd_set_link_down_port_id =
+   TOKEN_NUM_INITIALIZER(struct cmd_set_link_down_result, port_id, UINT8);
+
+static void cmd_set_link_down_parsed(
+   __attribute__((unused)) void *parsed_result,
+   __attribute__((unused)) struct cmdline *cl,
+   __attribute__((unused)) void *data)
+{
+   struct cmd_set_link_down_result *res = parsed_result;
+   dev_set_link_down(res->port_id);
+}
+
+cmdline_parse_inst_t cmd_set_link_down = {
+   .f = cmd_set_link_down_parsed,
+   .data = NULL,
+   .help_str = "set link-down port (port id)",
+   .tokens = {
+   (void *)&cmd_set_link_down_set,
+   (void *)&cmd_set_link_down_link_down,
+   (void *)&cmd_set_link_down_port,
+   (void *)&cmd_set_link_down_port_id,
+   NULL,
+   },
+};
+
 /* *** SHOW CFG *** */
 struct cmd_showcfg_result {
cmdline_fixed_string_t show;
@@ -5164,6 +5243,8 @@ cmdline_parse_ctx_t main_ctx[] = {
(cmdline_parse_inst_t *)&cmd_showcfg,
(cmdline_parse_inst_t *)&cmd_start,
(cmdline_parse_inst_t *)&cmd_start_tx_first,
+   (cmdline_parse_inst_t *)&cmd_set_link_up,
+   (cmdline_parse_inst_t *)&cmd_set_link_down,
(cmdline_parse_inst_t *)&cmd_reset,
(cmdline_parse_inst_t *)&cmd_set_numbers,
(cmdline_parse_inst_t *)&cmd_set_txpkts,
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index bc38305..8f20fda 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -1208,6 +1208,20 @@ stop_packet_forwarding(void)
test_done = 1;
 }

+void
+dev_set_link_up(portid_t pid)
+{
+   if (rte_eth_dev_set_link_up((uint8_t)pid) < 0)
+   printf("\nSet link up fail.\n");
+}
+
+void
+dev_set_link_down(portid_t pid)
+{
+   if (rte_eth_dev_set_link_down((uint8_t)pid) < 0)
+   printf("\nSet link down fail.\n");
+}
+
 static int
 all_ports_started(void)

[dpdk-dev] [PATCH v2 0/3] Support setting link up and link down

2014-05-28 Thread Ouyang Changchun
Please ignore the previous patch series with subject: "Support administrative 
link up and link down"
This v2 patch series will replace the previous patch series.  

This patch series contains the following 3 items (a usage sketch of the new
API follows the list):
1. Add API to support setting link up and down; it can be used to repeatedly
stop and restart RX/TX of a port without re-allocating resources for the port
and re-configuring the port.
2. Implement the functionality of setting link up and down in IXGBE PMD.
3. Add command in testpmd to test the functionality of setting link up and down 
of PMD.
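
A minimal usage sketch, assuming the port has already been configured and
started once; the port number and the printf handling are illustrative only:

#include <stdio.h>
#include <rte_ethdev.h>

/* Pause and resume RX/TX on a running port without releasing its queues. */
static void pause_and_resume(uint8_t port_id)
{
        if (rte_eth_dev_set_link_down(port_id) < 0)
                printf("set link down failed on port %u\n", port_id);

        /* ... quiescent period, e.g. while the peer reconfigures ... */

        if (rte_eth_dev_set_link_up(port_id) < 0)
                printf("set link up failed on port %u\n", port_id);
}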

Ouyang Changchun (3):
  Add API to support set link up and link down.
  Implement the functionality of setting link up and link down in IXGBE
PMD.
  Add command line to test the functionality of setting link up and link
down in testpmd.

 app/test-pmd/cmdline.c  | 81 +
 app/test-pmd/testpmd.c  | 14 +++
 app/test-pmd/testpmd.h  |  2 +
 lib/librte_ether/rte_ethdev.c   | 38 +
 lib/librte_ether/rte_ethdev.h   | 34 
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 63 +
 6 files changed, 232 insertions(+)

-- 
1.9.0



[dpdk-dev] [PATCH v2 1/3] ether: Add API to support set link up and link down

2014-05-28 Thread Ouyang Changchun
Please ignore previous v1 patch, just use this v2 patch.

This patch adds API to support the functionality of setting link up and down.
It can be used to repeatedly stop and restart RX/TX of a port without 
re-allocating
resources for the port and re-configuring the port.

Signed-off-by: Ouyang Changchun 
---
 lib/librte_ether/rte_ethdev.c | 38 ++
 lib/librte_ether/rte_ethdev.h | 34 ++
 2 files changed, 72 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index a5727dd..97e3f9d 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -691,6 +691,44 @@ rte_eth_dev_stop(uint8_t port_id)
(*dev->dev_ops->dev_stop)(dev);
 }

+int
+rte_eth_dev_set_link_up(uint8_t port_id)
+{
+   struct rte_eth_dev *dev;
+
+   /* This function is only safe when called from the primary process
+* in a multi-process setup*/
+   PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);
+
+   if (port_id >= nb_ports) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return -EINVAL;
+   }
+   dev = &rte_eth_devices[port_id];
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_set_link_up, -ENOTSUP);
+   return (*dev->dev_ops->dev_set_link_up)(dev);
+}
+
+int
+rte_eth_dev_set_link_down(uint8_t port_id)
+{
+   struct rte_eth_dev *dev;
+
+   /* This function is only safe when called from the primary process
+* in a multi-process setup*/
+   PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);
+
+   if (port_id >= nb_ports) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return -EINVAL;
+   }
+   dev = &rte_eth_devices[port_id];
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_set_link_down, -ENOTSUP);
+   return (*dev->dev_ops->dev_set_link_down)(dev);
+}
+
 void
 rte_eth_dev_close(uint8_t port_id)
 {
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index d5ea46b..84f2e9f 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -823,6 +823,12 @@ typedef int  (*eth_dev_start_t)(struct rte_eth_dev *dev);
 typedef void (*eth_dev_stop_t)(struct rte_eth_dev *dev);
 /**< @internal Function used to stop a configured Ethernet device. */

+typedef int  (*eth_dev_set_link_up_t)(struct rte_eth_dev *dev);
+/**< @internal Function used to link up a configured Ethernet device. */
+
+typedef int  (*eth_dev_set_link_down_t)(struct rte_eth_dev *dev);
+/**< @internal Function used to link down a configured Ethernet device. */
+
 typedef void (*eth_dev_close_t)(struct rte_eth_dev *dev);
 /**< @internal Function used to close a configured Ethernet device. */

@@ -1084,6 +1090,8 @@ struct eth_dev_ops {
eth_dev_configure_tdev_configure; /**< Configure device. */
eth_dev_start_tdev_start; /**< Start device. */
eth_dev_stop_t dev_stop;  /**< Stop device. */
+   eth_dev_set_link_up_t  dev_set_link_up;   /**< Device link up. */
+   eth_dev_set_link_down_tdev_set_link_down; /**< Device link down. */
eth_dev_close_tdev_close; /**< Close device. */
eth_promiscuous_enable_t   promiscuous_enable; /**< Promiscuous ON. */
eth_promiscuous_disable_t  promiscuous_disable;/**< Promiscuous OFF. */
@@ -1475,6 +1483,32 @@ extern int rte_eth_dev_start(uint8_t port_id);
  */
 extern void rte_eth_dev_stop(uint8_t port_id);

+
+/**
+ * Link up an Ethernet device.
+ *
+ * Setting the device link up re-enables the device RX/TX
+ * functionality after the link has previously been set down.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @return
+ *   - 0: Success, Ethernet device linked up.
+ *   - <0: Error code of the driver device link up function.
+ */
+extern int rte_eth_dev_set_link_up(uint8_t port_id);
+
+/**
+ * Link down an Ethernet device.
+ * The device RX/TX functionality will be disabled on success,
+ * and it can be re-enabled with a call to
+ * rte_eth_dev_set_link_up()
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ */
+extern int rte_eth_dev_set_link_down(uint8_t port_id);
+
 /**
  * Close an Ethernet device. The device cannot be restarted!
  *
-- 
1.9.0
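
As a quick illustration of how an application would use the new API (the port
id, error handling and helper name below are only illustrative, not part of
the patch):

#include <stdio.h>
#include <rte_ethdev.h>

/*
 * Toggle the link of an already configured and started port without
 * tearing down its queues; RX/TX is disabled between the two calls.
 */
static void
toggle_link(uint8_t port_id)
{
	int ret;

	ret = rte_eth_dev_set_link_down(port_id);
	if (ret < 0)
		printf("set_link_down failed: %d (PMD may not implement it)\n", ret);

	/* ... the port stays configured while its RX/TX is disabled ... */

	ret = rte_eth_dev_set_link_up(port_id);
	if (ret < 0)
		printf("set_link_up failed: %d\n", ret);
}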



[dpdk-dev] [PATCH v3 2/3] ixgbe: Implement queue start and stop functionality in IXGBE PMD

2014-05-28 Thread Ouyang Changchun
Please ignore the previous v1 and v2 patches; only this v3 patch is needed for
the queue start and stop functionality.

This patch implements queue start and stop functionality in the IXGBE PMD;
it also enables hardware loopback for VMDQ mode in the IXGBE PMD.

Signed-off-by: Ouyang Changchun 
Tested-by: Waterman Cao 
 This patch passed L2 Forward and L3 Forward testing based on commit
57f0ba5f8b8588dfa6ffcd001447ef6337afa6cd.
 See test environment information as follows:
 Fedora 19, Linux kernel 3.9.0, GCC 4.8.2 x86_64, Intel Xeon processor E5-2600
and E5-2600 v2 family
---
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c |   4 +
 lib/librte_pmd_ixgbe/ixgbe_ethdev.h |   8 ++
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c   | 239 ++--
 lib/librte_pmd_ixgbe/ixgbe_rxtx.h   |   6 +
 4 files changed, 220 insertions(+), 37 deletions(-)

diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c 
b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
index c9b5fe4..3dcff78 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
@@ -260,6 +260,10 @@ static struct eth_dev_ops ixgbe_eth_dev_ops = {
.vlan_tpid_set= ixgbe_vlan_tpid_set,
.vlan_offload_set = ixgbe_vlan_offload_set,
.vlan_strip_queue_set = ixgbe_vlan_strip_queue_set,
+   .rx_queue_start   = ixgbe_dev_rx_queue_start,
+   .rx_queue_stop= ixgbe_dev_rx_queue_stop,
+   .tx_queue_start   = ixgbe_dev_tx_queue_start,
+   .tx_queue_stop= ixgbe_dev_tx_queue_stop,
.rx_queue_setup   = ixgbe_dev_rx_queue_setup,
.rx_queue_release = ixgbe_dev_rx_queue_release,
.rx_queue_count   = ixgbe_dev_rx_queue_count,
diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.h 
b/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
index 9d7e93f..1471942 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
@@ -212,6 +212,14 @@ void ixgbe_dev_tx_init(struct rte_eth_dev *dev);

 void ixgbe_dev_rxtx_start(struct rte_eth_dev *dev);

+int ixgbe_dev_rx_queue_start(struct rte_eth_dev *dev, uint16_t rx_queue_id);
+
+int ixgbe_dev_rx_queue_stop(struct rte_eth_dev *dev, uint16_t rx_queue_id);
+
+int ixgbe_dev_tx_queue_start(struct rte_eth_dev *dev, uint16_t tx_queue_id);
+
+int ixgbe_dev_tx_queue_stop(struct rte_eth_dev *dev, uint16_t tx_queue_id);
+
 int ixgbevf_dev_rx_init(struct rte_eth_dev *dev);

 void ixgbevf_dev_tx_init(struct rte_eth_dev *dev);
diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c 
b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index 37d02aa..54ca010 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -1588,7 +1588,7 @@ ixgbe_recv_scattered_pkts(void *rx_queue, struct rte_mbuf 
**rx_pkts,
  * descriptors should meet the following condition:
  *  (num_ring_desc * sizeof(rx/tx descriptor)) % 128 == 0
  */
-#define IXGBE_MIN_RING_DESC 64
+#define IXGBE_MIN_RING_DESC 32
 #define IXGBE_MAX_RING_DESC 4096

 /*
@@ -1836,6 +1836,7 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev *dev,
txq->port_id = dev->data->port_id;
txq->txq_flags = tx_conf->txq_flags;
txq->ops = &def_txq_ops;
+   txq->start_tx_per_q = tx_conf->start_tx_per_q;

/*
 * Modification to set VFTDT for virtual function if vf is detected
@@ -2078,6 +2079,7 @@ ixgbe_dev_rx_queue_setup(struct rte_eth_dev *dev,
rxq->crc_len = (uint8_t) ((dev->data->dev_conf.rxmode.hw_strip_crc) ?
0 : ETHER_CRC_LEN);
rxq->drop_en = rx_conf->rx_drop_en;
+   rxq->start_rx_per_q = rx_conf->start_rx_per_q;

/*
 * Allocate RX ring hardware descriptors. A memzone large enough to
@@ -3025,6 +3027,13 @@ ixgbe_vmdq_rx_hw_configure(struct rte_eth_dev *dev)

}

+   /* PFDMA Tx General Switch Control Enables VMDQ loopback */
+   if (cfg->enable_loop_back) {
+   IXGBE_WRITE_REG(hw, IXGBE_PFDTXGSWC, IXGBE_PFDTXGSWC_VT_LBEN);
+   for (i = 0; i < RTE_IXGBE_VMTXSW_REGISTER_COUNT; i++)
+   IXGBE_WRITE_REG(hw, IXGBE_VMTXSW(i), UINT32_MAX);
+   }
+
IXGBE_WRITE_FLUSH(hw);
 }

@@ -3234,7 +3243,6 @@ ixgbe_dev_rx_init(struct rte_eth_dev *dev)
uint32_t rxcsum;
uint16_t buf_size;
uint16_t i;
-   int ret;

PMD_INIT_FUNC_TRACE();
hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
@@ -3289,11 +3297,6 @@ ixgbe_dev_rx_init(struct rte_eth_dev *dev)
for (i = 0; i < dev->data->nb_rx_queues; i++) {
rxq = dev->data->rx_queues[i];

-   /* Allocate buffers for descriptor rings */
-   ret = ixgbe_alloc_rx_queue_mbufs(rxq);
-   if (ret)
-   return ret;
-
/*
 * Reset crc_len in case it was changed after queue setup by a
 * call to configure.
@@ -3500,10 +3503,8 @@ ix

[dpdk-dev] [PATCH v3 3/3] examples/vhost: Support user space vhost zero copy

2014-05-28 Thread Ouyang Changchun
Please ignore the previous v1 and v2 patches; only this v3 patch is needed for
user space vhost zero copy.

This patch supports user space vhost zero copy. It removes packet copying
between host and guest in RX/TX.
It introduces an extra ring to store the detached mbufs. At initialization
stage all mbufs are put into this ring; when one guest starts, vhost gets the
available buffer addresses allocated by the guest for RX, translates them into
host space addresses, attaches them to mbufs and puts the attached mbufs into
the mempool.
Queue starting and DMA refilling will get mbufs from the mempool and use them
to set the DMA addresses.

For TX, it gets the buffer addresses of available packets to be transmitted
from the guest, translates them to host space addresses, attaches them to
mbufs and puts them into the TX queues.
After TX finishes, it pulls mbufs out from the mempool, detaches them and puts
them back into the extra ring.

Signed-off-by: Ouyang Changchun 
Tested-by: Waterman Cao 
 This patch passed L2 Forward and L3 Forward testing based on commit
57f0ba5f8b8588dfa6ffcd001447ef6337afa6cd.
 See test environment information as follows:
 Fedora 19, Linux kernel 3.9.0, GCC 4.8.2 x86_64, Intel Xeon processor E5-2600
and E5-2600 v2 family
---
 examples/vhost/main.c   | 1476 +--
 examples/vhost/virtio-net.c |  186 +-
 examples/vhost/virtio-net.h |   23 +-
 3 files changed, 1623 insertions(+), 62 deletions(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index b86d57d..e91 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -48,6 +48,7 @@
 #include 
 #include 
 #include 
+#include 

 #include "main.h"
 #include "virtio-net.h"
@@ -70,6 +71,16 @@
 #define MBUF_SIZE (2048 + sizeof(struct rte_mbuf) + RTE_PKTMBUF_HEADROOM)

 /*
+ * No frame data buffer allocated from the host is required for the zero copy
+ * implementation; the guest allocates the frame data buffer, and vhost
+ * uses it directly.
+ */
+#define VIRTIO_DESCRIPTOR_LEN_ZCP 1518
+#define MBUF_SIZE_ZCP (VIRTIO_DESCRIPTOR_LEN_ZCP + sizeof(struct rte_mbuf) \
+   + RTE_PKTMBUF_HEADROOM)
+#define MBUF_CACHE_SIZE_ZCP 0
+
+/*
  * RX and TX Prefetch, Host, and Write-back threshold values should be
  * carefully set for optimal performance. Consult the network
  * controller's datasheet and supporting DPDK documentation for guidance
@@ -108,6 +119,25 @@
 #define RTE_TEST_RX_DESC_DEFAULT 1024 
 #define RTE_TEST_TX_DESC_DEFAULT 512

+/*
+ * These two macros need refining for the legacy and DPDK based front ends:
+ * max vring avail descriptors/entries from guest - MAX_PKT_BURST,
+ * then adjusted to a power of 2.
+ */
+/*
+ * For legacy front end, 128 descriptors,
+ * half for virtio header, another half for mbuf.
+ */
+#define RTE_TEST_RX_DESC_DEFAULT_ZCP 32   /* legacy: 32, DPDK virt FE: 128. */
+#define RTE_TEST_TX_DESC_DEFAULT_ZCP 64   /* legacy: 64, DPDK virt FE: 64.  */
+
+/* Get first 4 bytes in mbuf headroom. */
+#define MBUF_HEADROOM_UINT32(mbuf) (*(uint32_t *)((uint8_t *)(mbuf) \
+   + sizeof(struct rte_mbuf)))
+
+/* true if x is a power of 2 */
+#define POWEROF2(x) ((((x)-1) & (x)) == 0)
+
 #define INVALID_PORT_ID 0xFF

 /* Max number of devices. Limited by vmdq. */
@@ -138,8 +168,42 @@ static uint32_t num_switching_cores = 0;
 static uint32_t num_queues = 0;
 uint32_t num_devices = 0;

+/*
+ * Enable zero copy; packet buffers will be DMA'd directly to/from the HW
+ * descriptors. Disabled by default.
+ */
+static uint32_t zero_copy;
+
+/* number of descriptors to apply*/
+static uint32_t num_rx_descriptor = RTE_TEST_RX_DESC_DEFAULT_ZCP;
+static uint32_t num_tx_descriptor = RTE_TEST_TX_DESC_DEFAULT_ZCP;
+
+/* max ring descriptor, ixgbe, i40e, e1000 all are 4096. */
+#define MAX_RING_DESC 4096
+
+struct vpool {
+   struct rte_mempool *pool;
+   struct rte_ring *ring;
+   uint32_t buf_size;
+} vpool_array[MAX_QUEUES+MAX_QUEUES];
+
 /* Enable VM2VM communications. If this is disabled then the MAC address 
compare is skipped. */
-static uint32_t enable_vm2vm = 1;
+typedef enum {
+   VM2VM_DISABLED = 0,
+   VM2VM_SOFTWARE = 1,
+   VM2VM_HARDWARE = 2,
+   VM2VM_LAST
+} vm2vm_type;
+static vm2vm_type vm2vm_mode = VM2VM_SOFTWARE;
+
+/* The type of host physical address translated from guest physical address. */
+typedef enum {
+   PHYS_ADDR_CONTINUOUS = 0,
+   PHYS_ADDR_CROSS_SUBREG = 1,
+   PHYS_ADDR_INVALID = 2,
+   PHYS_ADDR_LAST
+} hpa_type;
+
 /* Enable stats. */
 static uint32_t enable_stats = 0;
 /* Enable retries on RX. */
@@ -159,7 +223,7 @@ static uint32_t dev_index = 0;
 extern uint64_t VHOST_FEATURES;

 /* Default configuration for rx and tx thresholds etc. */
-static const struct rte_eth_rxconf rx_conf_default = {
+static struct rte_eth_rxconf rx_conf_default = {
.rx_thresh = {
.pthresh = RX_PTHRESH,
.hthresh = RX_HTHRESH,
@@ -173,7 +237,7 @@ static const struct rte_eth_rxconf rx_conf_defau
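
To illustrate the extra-ring idea described above, a rough sketch of the
per-queue vpool initialization follows. The helper name setup_vpool() and the
mbuf count are illustrative only; MBUF_SIZE_ZCP, MBUF_CACHE_SIZE_ZCP,
VIRTIO_DESCRIPTOR_LEN_ZCP and struct vpool come from this patch, and the real
initialization lives in examples/vhost/main.c. nb_mbuf is assumed to be a
power of two, as rte_ring requires.

#include <stdio.h>
#include <rte_mempool.h>
#include <rte_ring.h>
#include <rte_mbuf.h>

/*
 * Sketch: a mempool of mbufs plus an extra ring that parks the mbufs while
 * they are detached from guest buffers. At init time every mbuf sits in the
 * ring; it is attached to a guest buffer only once a VM comes up.
 */
static void
setup_vpool(struct vpool *vp, unsigned int q_idx, unsigned int nb_mbuf,
	    int socket)
{
	char name[RTE_RING_NAMESIZE];
	struct rte_mbuf *m;

	snprintf(name, sizeof(name), "zcp_pool_%u", q_idx);
	vp->pool = rte_mempool_create(name, nb_mbuf, MBUF_SIZE_ZCP,
			MBUF_CACHE_SIZE_ZCP,
			sizeof(struct rte_pktmbuf_pool_private),
			rte_pktmbuf_pool_init, NULL,
			rte_pktmbuf_init, NULL, socket, 0);

	snprintf(name, sizeof(name), "zcp_ring_%u", q_idx);
	vp->ring = rte_ring_create(name, nb_mbuf, socket,
			RING_F_SP_ENQ | RING_F_SC_DEQ);

	/* Park every mbuf in the extra ring until a guest buffer is attached. */
	while (rte_mempool_get(vp->pool, (void **)&m) == 0)
		rte_ring_sp_enqueue(vp->ring, m);

	vp->buf_size = VIRTIO_DESCRIPTOR_LEN_ZCP;
}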

[dpdk-dev] [PATCH v3 1/3] ethdev: Add API to support queue start and stop functionality for RX/TX.

2014-05-28 Thread Ouyang Changchun
Please ignore the previous v1 and v2 patches; just apply this v3 patch for the
new API code changes.

This patch adds an API to support queue start and stop functionality for RX/TX.
It allows RX and TX queues to be started or stopped one by one, instead of
starting and stopping all of them at the same time.

Signed-off-by: Ouyang Changchun 
Tested-by: Waterman Cao 
 This patch passed L2 Forward and L3 Forward testing based on commit
57f0ba5f8b8588dfa6ffcd001447ef6337afa6cd.
 See test environment information as follows:
 Fedora 19, Linux kernel 3.9.0, GCC 4.8.2 x86_64, Intel Xeon processor E5-2600
and E5-2600 v2 family
---
 lib/librte_eal/linuxapp/eal/eal_memory.c |   2 +-
 lib/librte_ether/rte_ethdev.c| 104 +++
 lib/librte_ether/rte_ethdev.h|  80 
 3 files changed, 185 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c 
b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 5a10a80..8d1edd9 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -134,6 +134,7 @@ rte_mem_virt2phy(const void *virtaddr)
uint64_t page, physaddr;
unsigned long virt_pfn;
int page_size;
+   off_t offset;

/* standard page size */
page_size = getpagesize();
@@ -145,7 +146,6 @@ rte_mem_virt2phy(const void *virtaddr)
return RTE_BAD_PHYS_ADDR;
}

-   off_t offset;
virt_pfn = (unsigned long)virtaddr / page_size;
offset = sizeof(uint64_t) * virt_pfn;
if (lseek(fd, offset, SEEK_SET) == (off_t) -1) {
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index a5727dd..df7cb07 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -292,6 +292,110 @@ rte_eth_dev_rx_queue_config(struct rte_eth_dev *dev, 
uint16_t nb_queues)
return (0);
 }

+int
+rte_eth_dev_rx_queue_start(uint8_t port_id, uint16_t rx_queue_id)
+{
+   struct rte_eth_dev *dev;
+
+   /* This function is only safe when called from the primary process
+* in a multi-process setup*/
+   PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);
+
+   if (port_id >= nb_ports) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return -EINVAL;
+   }
+
+   dev = &rte_eth_devices[port_id];
+   if (rx_queue_id >= dev->data->nb_rx_queues) {
+   PMD_DEBUG_TRACE("Invalid RX queue_id=%d\n", rx_queue_id);
+   return -EINVAL;
+   }
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_start, -ENOTSUP);
+
+   return dev->dev_ops->rx_queue_start(dev, rx_queue_id);
+
+}
+
+int
+rte_eth_dev_rx_queue_stop(uint8_t port_id, uint16_t rx_queue_id)
+{
+   struct rte_eth_dev *dev;
+
+   /* This function is only safe when called from the primary process
+* in a multi-process setup*/
+   PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);
+
+   if (port_id >= nb_ports) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return -EINVAL;
+   }
+
+   dev = &rte_eth_devices[port_id];
+   if (rx_queue_id >= dev->data->nb_rx_queues) {
+   PMD_DEBUG_TRACE("Invalid RX queue_id=%d\n", rx_queue_id);
+   return -EINVAL;
+   }
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_stop, -ENOTSUP);
+
+   return dev->dev_ops->rx_queue_stop(dev, rx_queue_id);
+
+}
+
+int
+rte_eth_dev_tx_queue_start(uint8_t port_id, uint16_t tx_queue_id)
+{
+   struct rte_eth_dev *dev;
+
+   /* This function is only safe when called from the primary process
+* in a multi-process setup*/
+   PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);
+
+   if (port_id >= nb_ports) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return -EINVAL;
+   }
+
+   dev = &rte_eth_devices[port_id];
+   if (tx_queue_id >= dev->data->nb_tx_queues) {
+   PMD_DEBUG_TRACE("Invalid TX queue_id=%d\n", tx_queue_id);
+   return -EINVAL;
+   }
+
+   FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_start, -ENOTSUP);
+
+   return dev->dev_ops->tx_queue_start(dev, tx_queue_id);
+
+}
+
+int
+rte_eth_dev_tx_queue_stop(uint8_t port_id, uint16_t tx_queue_id)
+{
+   struct rte_eth_dev *dev;
+
+   /* This function is only safe when called from the primary process
+* in a multi-process setup*/
+   PROC_PRIMARY_OR_ERR_RET(-E_RTE_SECONDARY);
+
+   if (port_id >= nb_ports) {
+   PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+   return -EINVAL;
+   }
+
+   dev = &rte_eth_devices[port_id];
+   if (tx_queue_id >= dev->data->nb_tx_queues) {
+   PMD_DEBUG_TRACE("Invalid 
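
As a rough sketch of how an application would drive the new per-queue API (the
port and queue ids, and the helper name restart_rx_queue(), are illustrative;
the PMD must implement the corresponding rx_queue_start/stop ops, as the ixgbe
patch in this series does):

#include <stdio.h>
#include <rte_ethdev.h>

/* Stop and restart a single RX queue without touching the rest of the port. */
static int
restart_rx_queue(uint8_t port_id, uint16_t queue_id)
{
	int ret;

	ret = rte_eth_dev_rx_queue_stop(port_id, queue_id);
	if (ret != 0) {
		printf("rx_queue_stop(%u, %u) failed: %d\n",
			port_id, queue_id, ret);
		return ret;
	}

	/* The queue can be drained or reconfigured here while the
	 * other queues keep running. */

	return rte_eth_dev_rx_queue_start(port_id, queue_id);
}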

[dpdk-dev] [PATCH v3 0/3] Support zero copy RX/TX in user space vhost

2014-05-28 Thread Ouyang Changchun
This patch v3 fixes some errors and warnings reported by checkpatch.pl.
Please ignore the previous two patches (v1 and v2) and only apply this v3
patch for zero copy RX/TX in user space vhost.

This patch series supports user space vhost zero copy. It removes packet
copying between host and guest in RX/TX. And it introduces an extra ring to
store the detached mbufs. At initialization stage all mbufs are put into this
ring; when one guest starts, vhost gets the available buffer addresses
allocated by the guest for RX, translates them into host space addresses,
attaches them to mbufs and puts the attached mbufs into the mempool.

Queue starting and DMA refilling will get mbufs from the mempool and use them
to set the DMA addresses.

For TX, it gets the buffer addresses of available packets to be transmitted
from the guest, translates them to host space addresses, attaches them to
mbufs and puts them into the TX queues.
After TX finishes, it pulls mbufs out from the mempool, detaches them and puts
them back into the extra ring.

This patch series also implements queue start and stop functionality in the
IXGBE PMD, and enables hardware loopback for VMDQ mode in the IXGBE PMD.

Ouyang Changchun (3):
  Add API to support queue start and stop functionality for RX/TX.
  Implement queue start and stop functionality in IXGBE PMD; Enable
hardware loopback for VMDQ mode in IXGBE PMD.
  Support user space vhost zero copy, it removes packets copying between
host and guest in RX/TX.

 examples/vhost/main.c| 1476 --
 examples/vhost/virtio-net.c  |  186 +++-
 examples/vhost/virtio-net.h  |   23 +-
 lib/librte_eal/linuxapp/eal/eal_memory.c |2 +-
 lib/librte_ether/rte_ethdev.c|  104 +++
 lib/librte_ether/rte_ethdev.h|   80 ++
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c  |4 +
 lib/librte_pmd_ixgbe/ixgbe_ethdev.h  |8 +
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c|  239 -
 lib/librte_pmd_ixgbe/ixgbe_rxtx.h|6 +
 10 files changed, 2028 insertions(+), 100 deletions(-)

-- 
1.9.0



[dpdk-dev] [PATCH v4 1/2] virtio: Cleanup the existing codes in virtio-net PMD

2014-05-29 Thread Ouyang Changchun
This patch cleans up some coding style issues, and fixes some errors and
warnings reported by checkpatch.pl.

Signed-off-by: Ouyang Changchun 
Tested-by: Waterman Cao 
This patch passed Testpmd testing based on commit
57f0ba5f8b8588dfa6ffcd001447ef6337afa6cd.
See test environment information as follows:
Fedora 20, Linux kernel 3.13.9, GCC 4.8.2 x86_64, Intel Xeon processor E5-2600
and E5-2600 v2 family
---
 lib/librte_pmd_virtio/virtio_ethdev.c | 68 +++
 lib/librte_pmd_virtio/virtio_ethdev.h | 30 
 lib/librte_pmd_virtio/virtio_rxtx.c   | 43 --
 lib/librte_pmd_virtio/virtqueue.h | 26 --
 4 files changed, 100 insertions(+), 67 deletions(-)

diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index 49e236b..685bf90 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -134,7 +134,7 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,

if (queue_type == VTNET_RQ) {
rte_snprintf(vq_name, sizeof(vq_name), "port%d_rvq%d",
-   dev->data->port_id, queue_idx);
+   dev->data->port_id, queue_idx);
vq = rte_zmalloc(vq_name, sizeof(struct virtqueue) +
vq_size * sizeof(struct vq_desc_extra), 
CACHE_LINE_SIZE);
memcpy(vq->vq_name, vq_name, sizeof(vq->vq_name));
@@ -146,8 +146,8 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
memcpy(vq->vq_name, vq_name, sizeof(vq->vq_name));
} else if(queue_type == VTNET_CQ) {
rte_snprintf(vq_name, sizeof(vq_name), "port%d_cvq",
-   dev->data->port_id);
+   dev->data->port_id);
vq = rte_zmalloc(vq_name, sizeof(struct virtqueue),
CACHE_LINE_SIZE);
memcpy(vq->vq_name, vq_name, sizeof(vq->vq_name));
}
@@ -155,6 +155,7 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
PMD_INIT_LOG(ERR, "%s: Can not allocate virtqueue\n", __func__);
return (-ENOMEM); 
}
+
vq->hw = hw;
vq->port_id = dev->data->port_id;
vq->queue_id = queue_idx;
@@ -171,11 +172,12 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
PMD_INIT_LOG(DEBUG, "vring_size: %d, rounded_vring_size: %d\n", size, 
vq->vq_ring_size);

mz = rte_memzone_reserve_aligned(vq_name, vq->vq_ring_size,
-   socket_id, 0, VIRTIO_PCI_VRING_ALIGN);
+   socket_id, 0, VIRTIO_PCI_VRING_ALIGN);
if (mz == NULL) {
rte_free(vq);
return (-ENOMEM);
}
+
/*
* Virtio PCI device VIRTIO_PCI_QUEUE_PF register is 32bit,
* and only accepts 32 bit page frame number. 
@@ -186,6 +188,7 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
rte_free(vq);
return (-ENOMEM);
}
+
memset(mz->addr, 0, sizeof(mz->len));
vq->mz = mz;
vq->vq_ring_mem = mz->phys_addr;
@@ -197,8 +200,8 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,

if (queue_type == VTNET_TQ) {
/* 
-   * For each xmit packet, allocate a virtio_net_hdr
-   */
+* For each xmit packet, allocate a virtio_net_hdr
+*/
rte_snprintf(vq_name, sizeof(vq_name), "port%d_tvq%d_hdrzone",
dev->data->port_id, queue_idx);
vq->virtio_net_hdr_mz = rte_memzone_reserve_aligned(vq_name,
@@ -206,10 +209,12 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
socket_id, 0, CACHE_LINE_SIZE);
if (vq->virtio_net_hdr_mz == NULL) {
rte_free(vq);
-   return (-ENOMEM);
+   return -ENOMEM;
}
-   vq->virtio_net_hdr_mem = (void 
*)(uintptr_t)vq->virtio_net_hdr_mz->phys_addr;
-   memset(vq->virtio_net_hdr_mz->addr, 0, vq_size * sizeof(struct 
virtio_net_hdr));
+   vq->virtio_net_hdr_mem =
+   (void *)(uintptr_t)vq->virtio_net_hdr_mz->phys_addr;
+   memset(vq->virtio_net_hdr_mz->addr, 0,
+   vq_size * sizeof(struct virtio_net_hdr));
} else if (queue_type == VTNET_CQ) {
/* Allocate a page for control vq command, data and status */
rte_snprintf(vq_name, sizeof(vq_name), "port%d_cvq_hdrzone",
@@ -218,9 +223,10 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
PAGE_SIZE, socket_id, 0, CACHE_LINE_SIZE);
if (vq->virtio_net_hdr_mz == NULL) {
 

[dpdk-dev] [PATCH v4 2/2] virtio: Support multiple queues feature in DPDK based virtio-net frontend

2014-05-29 Thread Ouyang Changchun
This patch supports the multiple queues feature in the DPDK based virtio-net
frontend. It first gets the max queue number of virtio-net from the virtio PCI
configuration and then sends a command to negotiate the queue number with the
backend; when receiving and transmitting packets, the negotiated multiple
virtio-net queues serve RX/TX.
To utilize this feature, the backend also needs to support the multiple queues
feature and enable it.

Signed-off-by: Ouyang Changchun 
Tested-by: Waterman Cao 
This patch passed Testpmd testing based on commit
57f0ba5f8b8588dfa6ffcd001447ef6337afa6cd.
See test environment information as follows:
Fedora 20, Linux kernel 3.13.9, GCC 4.8.2 x86_64, Intel Xeon processor E5-2600
and E5-2600 v2 family
---
 lib/librte_pmd_virtio/virtio_ethdev.c | 309 ++
 lib/librte_pmd_virtio/virtio_ethdev.h |  10 +-
 lib/librte_pmd_virtio/virtio_pci.h|   4 +-
 lib/librte_pmd_virtio/virtio_rxtx.c   |  49 --
 lib/librte_pmd_virtio/virtqueue.h |  35 +++-
 5 files changed, 358 insertions(+), 49 deletions(-)

diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index 685bf90..c2b4dfb 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -81,6 +81,12 @@ static void virtio_dev_stats_get(struct rte_eth_dev *dev, 
struct rte_eth_stats *
 static void virtio_dev_stats_reset(struct rte_eth_dev *dev);
 static void virtio_dev_free_mbufs(struct rte_eth_dev *dev);

+static int virtio_dev_queue_stats_mapping_set(
+   __rte_unused struct rte_eth_dev *eth_dev,
+   __rte_unused uint16_t queue_id,
+   __rte_unused uint8_t stat_idx,
+   __rte_unused uint8_t is_rx);
+
 /*
  * The set of PCI devices this driver supports
  */
@@ -92,6 +98,135 @@ static struct rte_pci_id pci_id_virtio_map[] = {
 { .vendor_id = 0, /* sentinel */ },
 };

+static int
+virtio_send_command(struct virtqueue *vq, struct virtio_pmd_ctrl *ctrl,
+   int *dlen, int pkt_num)
+{
+   uint32_t head = vq->vq_desc_head_idx, i;
+   int k, sum = 0;
+   virtio_net_ctrl_ack status = ~0;
+   struct virtio_pmd_ctrl result;
+
+   ctrl->status = status;
+
+   if (!vq->hw->cvq) {
+   PMD_INIT_LOG(ERR, "%s(): Control queue is "
+   "not supported by this device.\n", __func__);
+   return -1;
+   }
+
+   PMD_INIT_LOG(DEBUG, "vq->vq_desc_head_idx = %d, status = %d, "
+   "vq->hw->cvq = %p vq = %p\n",
+   vq->vq_desc_head_idx, status, vq->hw->cvq, vq);
+
+   if ((vq->vq_free_cnt < ((uint32_t)pkt_num + 2)) || (pkt_num < 1))
+   return -1;
+
+   memcpy(vq->virtio_net_hdr_mz->addr, ctrl,
+   sizeof(struct virtio_pmd_ctrl));
+
+   /*
+* Format is enforced in qemu code:
+* One TX packet for header;
+* At least one TX packet per argument;
+* One RX packet for ACK.
+*/
+   vq->vq_ring.desc[head].flags = VRING_DESC_F_NEXT;
+   vq->vq_ring.desc[head].addr = vq->virtio_net_hdr_mz->phys_addr;
+   vq->vq_ring.desc[head].len = sizeof(struct virtio_net_ctrl_hdr);
+   vq->vq_free_cnt--;
+   i = vq->vq_ring.desc[head].next;
+
+   for (k = 0; k < pkt_num; k++) {
+   vq->vq_ring.desc[i].flags = VRING_DESC_F_NEXT;
+   vq->vq_ring.desc[i].addr = vq->virtio_net_hdr_mz->phys_addr
+   + sizeof(struct virtio_net_ctrl_hdr)
+   + sizeof(ctrl->status) + sizeof(uint8_t)*sum;
+   vq->vq_ring.desc[i].len = dlen[k];
+   sum += dlen[k];
+   vq->vq_free_cnt--;
+   i = vq->vq_ring.desc[i].next;
+   }
+
+   vq->vq_ring.desc[i].flags = VRING_DESC_F_WRITE;
+   vq->vq_ring.desc[i].addr = vq->virtio_net_hdr_mz->phys_addr
+   + sizeof(struct virtio_net_ctrl_hdr);
+   vq->vq_ring.desc[i].len = sizeof(ctrl->status);
+   vq->vq_free_cnt--;
+
+   vq->vq_desc_head_idx = vq->vq_ring.desc[i].next;
+
+   vq_update_avail_ring(vq, head);
+   vq_update_avail_idx(vq);
+
+   PMD_INIT_LOG(DEBUG, "vq->vq_queue_index = %d\n", vq->vq_queue_index);
+
+   virtqueue_notify(vq);
+
+   while (vq->vq_used_cons_idx == vq->vq_ring.used->idx)
+   usleep(100);
+
+   while (vq->vq_used_cons_idx != vq->vq_ring.used->idx) {
+   uint32_t idx, desc_idx, used_idx;
+   struct vring_used_elem *uep;
+
+   rmb();
+
+   used_idx = (uint32_t)(vq->vq_used_cons_idx
+   & (vq->vq_nentries - 1));
+   uep = &vq->vq_ring.used->ring[used_idx];
+   idx = (uint32_t) uep->id;
+   desc_idx = idx;
+
+  
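
To show how the control queue is then used to negotiate the queue number, a
rough sketch follows. VIRTIO_NET_CTRL_MQ and VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET
are the class/command values from the virtio spec, and the exact helper in
this patch may differ; struct virtio_pmd_ctrl and virtio_send_command() are
the ones introduced above, and the control virtqueue is assumed to be
reachable as hw->cvq, as elsewhere in this series (memcpy comes from
<string.h>, already included by the PMD).

/*
 * Sketch: ask the backend for nb_queue_pairs RX/TX queue pairs over the
 * control virtqueue. The command payload is the 16-bit pair count.
 */
static int
virtio_set_multiple_queues(struct virtio_hw *hw, uint16_t nb_queue_pairs)
{
	struct virtio_pmd_ctrl ctrl;
	int dlen[1];

	ctrl.hdr.class = VIRTIO_NET_CTRL_MQ;
	ctrl.hdr.cmd = VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET;
	memcpy(ctrl.data, &nb_queue_pairs, sizeof(nb_queue_pairs));
	dlen[0] = sizeof(nb_queue_pairs);

	/* One header descriptor, one data descriptor, one status descriptor. */
	return virtio_send_command(hw->cvq, &ctrl, dlen, 1);
}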

[dpdk-dev] [PATCH v4 0/2] Support multiple queues feature in DPDK based virtio-net frontend

2014-05-29 Thread Ouyang Changchun
This v4 patch series replaces the previous v1, v2 and v3 patches for the
virtio-net multiple queues feature.
Please apply this v4 patch series and ignore the previous patches.

It splits the previous single patch into the following 2 patches for easier
review:
Cleanup the existing code in the virtio-net PMD;
Support the multiple queues feature in the DPDK based virtio-net frontend.

In sum, this patch supports the multiple queues feature in the DPDK based
virtio-net frontend. It first gets the max queue number of virtio-net from the
virtio PCI configuration and then sends a command to negotiate the queue
number with the backend; when receiving and transmitting packets, the
negotiated multiple virtio-net queues serve RX/TX.
To utilize this feature, the backend also needs to support the multiple queues
feature and enable it.

Ouyang Changchun (2):
  Cleanup the existing codes in virtio-net PMD.
  Support multiple queues feature in DPDK based virtio-net frontend.

 lib/librte_pmd_virtio/virtio_ethdev.c | 377 --
 lib/librte_pmd_virtio/virtio_ethdev.h |  40 ++--
 lib/librte_pmd_virtio/virtio_pci.h|   4 +-
 lib/librte_pmd_virtio/virtio_rxtx.c   |  92 +++--
 lib/librte_pmd_virtio/virtqueue.h |  61 --
 5 files changed, 458 insertions(+), 116 deletions(-)

-- 
1.9.0



[dpdk-dev] [PATCH] librte_vhost: Fix the path test issue

2014-11-03 Thread Ouyang Changchun
Commit aec8283d47d4e4366b6 fixes the compilation issue, but it introduces a
runtime issue: the code may exit early by mistake. In some cases 'path' is
NULL while 'resolved_path' holds an effective path; in that case it should
continue rather than exit.

Signed-off-by: Changchun Ouyang 
---
 lib/librte_vhost/virtio-net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 8015dd8..3fa1274 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -237,7 +237,7 @@ host_memory_map(struct virtio_net *dev, struct 
virtio_memory *mem,
snprintf(memfile, PATH_MAX, "/proc/%u/fd/%s",
pid, dptr->d_name);
path = realpath(memfile, resolved_path);
-   if (path == NULL) {
+   if ((path == NULL) && (strlen(resolved_path) == 0)) {
RTE_LOG(ERR, VHOST_CONFIG,
"(%"PRIu64") Failed to resolve fd directory\n",
dev->device_fh);
-- 
1.8.4.2



[dpdk-dev] [PATCH v3 0/2] Fix packet length issue

2014-11-04 Thread Ouyang Changchun
This patch set fixes the packet length issue in the vhost app, and enhances
the code by extracting a function to replace duplicated code in the one copy
and zero copy TX functions.

-v3 change:
 Extract a function to replace duplicated code in the one copy and zero copy
TX functions.

-v2 change:
 Update the data length by adding the offset in the first segment instead of
the last segment.

-v1 change:
 Update the packet length by adding the offset;
 Use a macro to replace the constant.

Changchun Ouyang (2):
  Fix packet length issue in vhost.
  Extract a function to replace duplicated codes in vhost.

 examples/vhost/main.c | 137 ++
 1 file changed, 61 insertions(+), 76 deletions(-)

-- 
1.8.4.2



[dpdk-dev] [PATCH v3 2/2] vhost: Remove duplicated codes

2014-11-04 Thread Ouyang Changchun
Extract a function to replace duplicated code in the one copy and zero copy
TX functions.

Signed-off-by: Changchun Ouyang 
---
 examples/vhost/main.c | 139 +-
 1 file changed, 58 insertions(+), 81 deletions(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 5ca8dce..2916313 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -1040,6 +1040,57 @@ virtio_tx_local(struct vhost_dev *vdev, struct rte_mbuf 
*m)
 }

 /*
+ * Check if the destination MAC of a packet belongs to a local VM,
+ * and if so get its VLAN tag and offset.
+ */
+static inline int __attribute__((always_inline))
+find_local_dest(struct virtio_net *dev, struct rte_mbuf *m,
+   uint32_t *offset, uint16_t *vlan_tag)
+{
+   struct virtio_net_data_ll *dev_ll = ll_root_used;
+   struct ether_hdr *pkt_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
+
+   while (dev_ll != NULL) {
+   if ((dev_ll->vdev->ready == DEVICE_RX)
+   && ether_addr_cmp(&(pkt_hdr->d_addr),
+   &dev_ll->vdev->mac_address)) {
+   /*
+* Drop the packet if the TX packet is
+* destined for the TX device.
+*/
+   if (dev_ll->vdev->dev->device_fh == dev->device_fh) {
+   LOG_DEBUG(VHOST_DATA,
+   "(%"PRIu64") TX: Source and destination"
+   " MAC addresses are the same. Dropping "
+   "packet.\n",
+   dev_ll->vdev->dev->device_fh);
+   return -1;
+   }
+
+   /*
+* HW vlan strip will reduce the packet length
+* by minus length of vlan tag, so need restore
+* the packet length by plus it.
+*/
+   *offset = VLAN_HLEN;
+   *vlan_tag =
+   (uint16_t)
+   vlan_tags[(uint16_t)dev_ll->vdev->dev->device_fh];
+
+   LOG_DEBUG(VHOST_DATA,
+   "(%"PRIu64") TX: pkt to local VM device id:"
+   "(%"PRIu64") vlan tag: %d.\n",
+   dev->device_fh, dev_ll->vdev->dev->device_fh,
+   vlan_tag);
+
+   break;
+   }
+   dev_ll = dev_ll->next;
+   }
+   return 0;
+}
+
+/*
  * This function routes the TX packet to the correct interface. This may be a 
local device
  * or the physical port.
  */
@@ -1050,8 +1101,6 @@ virtio_tx_route(struct vhost_dev *vdev, struct rte_mbuf 
*m, uint16_t vlan_tag)
struct rte_mbuf **m_table;
unsigned len, ret, offset = 0;
const uint16_t lcore_id = rte_lcore_id();
-   struct virtio_net_data_ll *dev_ll = ll_root_used;
-   struct ether_hdr *pkt_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
struct virtio_net *dev = vdev->dev;

/*check if destination is local VM*/
@@ -1061,43 +1110,9 @@ virtio_tx_route(struct vhost_dev *vdev, struct rte_mbuf 
*m, uint16_t vlan_tag)
}

if (vm2vm_mode == VM2VM_HARDWARE) {
-   while (dev_ll != NULL) {
-   if ((dev_ll->vdev->ready == DEVICE_RX)
-   && ether_addr_cmp(&(pkt_hdr->d_addr),
-   &dev_ll->vdev->mac_address)) {
-   /*
-* Drop the packet if the TX packet is
-* destined for the TX device.
-*/
-   if (dev_ll->vdev->dev->device_fh == 
dev->device_fh) {
-   LOG_DEBUG(VHOST_DATA,
-   "(%"PRIu64") TX: Source and destination"
-   " MAC addresses are the same. Dropping "
-   "packet.\n",
-   dev_ll->vdev->dev->device_fh);
-   rte_pktmbuf_free(m);
-   return;
-   }
-
-   /*
-* HW vlan strip will reduce the packet length
-* by minus length of vlan tag, so need restore
-* the packet length by plus it.
-*/
-   offset = VLAN_HLEN;
-   vlan_tag =
-   (uint16_t)
-   
vlan_tags[(uint16_t)dev_ll->vdev->dev->device_fh];
-
-   LOG_DEBUG(VHOST_DATA,
-   "(%"PRIu64") TX: pkt to local VM device id:"
-   

[dpdk-dev] [PATCH v3 1/2] vhost: Fix packet length issue

2014-11-04 Thread Ouyang Changchun
As HW VLAN strip reduces the packet length by the length of the VLAN tag,
the packet length needs to be restored by adding it back.

Signed-off-by: Changchun Ouyang 
---
 examples/vhost/main.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 57ef464..5ca8dce 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -1078,7 +1078,13 @@ virtio_tx_route(struct vhost_dev *vdev, struct rte_mbuf 
*m, uint16_t vlan_tag)
rte_pktmbuf_free(m);
return;
}
-   offset = 4;
+
+   /*
+* HW vlan strip will reduce the packet length
+* by minus length of vlan tag, so need restore
+* the packet length by plus it.
+*/
+   offset = VLAN_HLEN;
vlan_tag =
(uint16_t)

vlan_tags[(uint16_t)dev_ll->vdev->dev->device_fh];
@@ -1102,8 +1108,10 @@ virtio_tx_route(struct vhost_dev *vdev, struct rte_mbuf 
*m, uint16_t vlan_tag)
len = tx_q->len;

m->ol_flags = PKT_TX_VLAN_PKT;
-   /*FIXME: offset*/
+
m->data_len += offset;
+   m->pkt_len += offset;
+
m->vlan_tci = vlan_tag;

tx_q->m_table[len] = m;
-- 
1.8.4.2



[dpdk-dev] [PATCH v4 1/3] vhost: Fix packet length issue

2014-11-05 Thread Ouyang Changchun
As HW VLAN strip reduces the packet length by the length of the VLAN tag,
the packet length needs to be restored by adding it back.

Signed-off-by: Changchun Ouyang 
---
 examples/vhost/main.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 57ef464..5ca8dce 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -1078,7 +1078,13 @@ virtio_tx_route(struct vhost_dev *vdev, struct rte_mbuf 
*m, uint16_t vlan_tag)
rte_pktmbuf_free(m);
return;
}
-   offset = 4;
+
+   /*
+* HW vlan strip will reduce the packet length
+* by minus length of vlan tag, so need restore
+* the packet length by plus it.
+*/
+   offset = VLAN_HLEN;
vlan_tag =
(uint16_t)

vlan_tags[(uint16_t)dev_ll->vdev->dev->device_fh];
@@ -1102,8 +1108,10 @@ virtio_tx_route(struct vhost_dev *vdev, struct rte_mbuf 
*m, uint16_t vlan_tag)
len = tx_q->len;

m->ol_flags = PKT_TX_VLAN_PKT;
-   /*FIXME: offset*/
+
m->data_len += offset;
+   m->pkt_len += offset;
+
m->vlan_tci = vlan_tag;

tx_q->m_table[len] = m;
-- 
1.8.4.2



[dpdk-dev] [PATCH v4 0/3] Fix packet length issue

2014-11-05 Thread Ouyang Changchun
This patch set fixes the packet length issue in the vhost app, and enhances
the code by extracting a function to replace duplicated code in the one copy
and zero copy TX functions.

-v4 change:
 Check the offset value and whether the extra bytes inside the packet buffer
cross a page boundary.

-v3 change:
 Extract a function to replace duplicated code in the one copy and zero copy
TX functions.

-v2 change:
 Update the data length by adding the offset in the first segment instead of
the last segment.

-v1 change:
 Update the packet length by adding the offset;
 Use a macro to replace the constant.

Changchun Ouyang (3):
  Fix packet length issue in vhost.
  Extract a function to replace duplicated codes in vhost.
  Check offset value in vhost

 examples/vhost/main.c | 142 +++---
 1 file changed, 65 insertions(+), 77 deletions(-)

-- 
1.8.4.2



[dpdk-dev] [PATCH v4 2/3] vhost: Remove duplicated codes

2014-11-05 Thread Ouyang Changchun
Extract a function to replace duplicated code in the one copy and zero copy
TX functions.

Signed-off-by: Changchun Ouyang 
---
 examples/vhost/main.c | 139 +-
 1 file changed, 58 insertions(+), 81 deletions(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 5ca8dce..2916313 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -1040,6 +1040,57 @@ virtio_tx_local(struct vhost_dev *vdev, struct rte_mbuf 
*m)
 }

 /*
+ * Check if the destination MAC of a packet belongs to a local VM,
+ * and if so get its VLAN tag and offset.
+ */
+static inline int __attribute__((always_inline))
+find_local_dest(struct virtio_net *dev, struct rte_mbuf *m,
+   uint32_t *offset, uint16_t *vlan_tag)
+{
+   struct virtio_net_data_ll *dev_ll = ll_root_used;
+   struct ether_hdr *pkt_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
+
+   while (dev_ll != NULL) {
+   if ((dev_ll->vdev->ready == DEVICE_RX)
+   && ether_addr_cmp(&(pkt_hdr->d_addr),
+   &dev_ll->vdev->mac_address)) {
+   /*
+* Drop the packet if the TX packet is
+* destined for the TX device.
+*/
+   if (dev_ll->vdev->dev->device_fh == dev->device_fh) {
+   LOG_DEBUG(VHOST_DATA,
+   "(%"PRIu64") TX: Source and destination"
+   " MAC addresses are the same. Dropping "
+   "packet.\n",
+   dev_ll->vdev->dev->device_fh);
+   return -1;
+   }
+
+   /*
+* HW vlan strip will reduce the packet length
+* by minus length of vlan tag, so need restore
+* the packet length by plus it.
+*/
+   *offset = VLAN_HLEN;
+   *vlan_tag =
+   (uint16_t)
+   vlan_tags[(uint16_t)dev_ll->vdev->dev->device_fh];
+
+   LOG_DEBUG(VHOST_DATA,
+   "(%"PRIu64") TX: pkt to local VM device id:"
+   "(%"PRIu64") vlan tag: %d.\n",
+   dev->device_fh, dev_ll->vdev->dev->device_fh,
+   vlan_tag);
+
+   break;
+   }
+   dev_ll = dev_ll->next;
+   }
+   return 0;
+}
+
+/*
  * This function routes the TX packet to the correct interface. This may be a 
local device
  * or the physical port.
  */
@@ -1050,8 +1101,6 @@ virtio_tx_route(struct vhost_dev *vdev, struct rte_mbuf 
*m, uint16_t vlan_tag)
struct rte_mbuf **m_table;
unsigned len, ret, offset = 0;
const uint16_t lcore_id = rte_lcore_id();
-   struct virtio_net_data_ll *dev_ll = ll_root_used;
-   struct ether_hdr *pkt_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
struct virtio_net *dev = vdev->dev;

/*check if destination is local VM*/
@@ -1061,43 +1110,9 @@ virtio_tx_route(struct vhost_dev *vdev, struct rte_mbuf 
*m, uint16_t vlan_tag)
}

if (vm2vm_mode == VM2VM_HARDWARE) {
-   while (dev_ll != NULL) {
-   if ((dev_ll->vdev->ready == DEVICE_RX)
-   && ether_addr_cmp(&(pkt_hdr->d_addr),
-   &dev_ll->vdev->mac_address)) {
-   /*
-* Drop the packet if the TX packet is
-* destined for the TX device.
-*/
-   if (dev_ll->vdev->dev->device_fh == 
dev->device_fh) {
-   LOG_DEBUG(VHOST_DATA,
-   "(%"PRIu64") TX: Source and destination"
-   " MAC addresses are the same. Dropping "
-   "packet.\n",
-   dev_ll->vdev->dev->device_fh);
-   rte_pktmbuf_free(m);
-   return;
-   }
-
-   /*
-* HW vlan strip will reduce the packet length
-* by minus length of vlan tag, so need restore
-* the packet length by plus it.
-*/
-   offset = VLAN_HLEN;
-   vlan_tag =
-   (uint16_t)
-   
vlan_tags[(uint16_t)dev_ll->vdev->dev->device_fh];
-
-   LOG_DEBUG(VHOST_DATA,
-   "(%"PRIu64") TX: pkt to local VM device id:"
-   

[dpdk-dev] [PATCH v4 3/3] vhost: Check offset value

2014-11-05 Thread Ouyang Changchun
This patch checks the packet length offset value, and checks whether the extra
bytes inside the buffer cross a page boundary.

Signed-off-by: Changchun Ouyang 
---
 examples/vhost/main.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 2916313..a93f7a0 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -1110,7 +1110,8 @@ virtio_tx_route(struct vhost_dev *vdev, struct rte_mbuf 
*m, uint16_t vlan_tag)
}

if (vm2vm_mode == VM2VM_HARDWARE) {
-   if (find_local_dest(dev, m, &offset, &vlan_tag) != 0) {
+   if (find_local_dest(dev, m, &offset, &vlan_tag) != 0 ||
+   offset > rte_pktmbuf_tailroom(m)) {
rte_pktmbuf_free(m);
return;
}
@@ -1896,7 +1897,9 @@ virtio_dev_tx_zcp(struct virtio_net *dev)

/* Buffer address translation. */
buff_addr = gpa_to_vva(dev, desc->addr);
-   phys_addr = gpa_to_hpa(vdev, desc->addr, desc->len, &addr_type);
+   /* Need check extra VLAN_HLEN size for inserting VLAN tag */
+   phys_addr = gpa_to_hpa(vdev, desc->addr, desc->len + VLAN_HLEN,
+   &addr_type);

if (likely(packet_success < (free_entries - 1)))
/* Prefetch descriptor index. */
-- 
1.8.4.2



[dpdk-dev] [PATCH v4 3/3] vhost: Check offset value

2014-11-06 Thread Ouyang, Changchun
Agree with Thomas! Using small patches makes them easier to understand.
Merging and mixing things together is not a good thing.

Changchun

> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Thursday, November 6, 2014 1:01 AM
> To: Xie, Huawei
> Cc: dev at dpdk.org; Ouyang, Changchun
> Subject: Re: [dpdk-dev] [PATCH v4 3/3] vhost: Check offset value
> 
> 2014-11-05 16:52, Xie, Huawei:
> > Why don't we merge 1,2,3 patches?
> 
> Because it's simpler to understand small patches with a dedicated
> explanation in the commit log of each patch.
> Why do you want to merge them?
> 
> --
> Thomas


[dpdk-dev] [PATCH] librte_vhost: Fix the path test issue

2014-11-06 Thread Ouyang, Changchun
Hi Huawei, 
Thanks for the comments,
And my response as follows.

> -Original Message-
> From: Xie, Huawei
> Sent: Thursday, November 6, 2014 10:39 AM
> To: Ouyang, Changchun; dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH] librte_vhost: Fix the path test issue
> 
> > path = realpath(memfile, resolved_path);
> > -   if (path == NULL) {
> > +   if ((path == NULL) && (strlen(resolved_path) == 0)) {
> > RTE_LOG(ERR, VHOST_CONFIG,
> > "(%"PRIu64") Failed to resolve fd directory\n",
> > dev->device_fh);
> Changchun:
> For some strange files, according to the API description, we shouldn't check
> resolved_path as it is undefined.
> To make the loop go on, we could use "continue" when we detect that path is
> NULL.
> 
> RETURN VALUE
>If there is no error, realpath() returns a pointer to the 
> resolved_path.
> 
>Otherwise it returns a NULL pointer, and the contents of the array
> resolved_path are undefined, and errno is set to indicate the error.

After investigating this issue, I found that using 'continue' doesn't work.

The reason is that procmap.fname itself is
"/dev/hugepages/qemu_back_mem.pc.ram.zxfqLq".
It is not a normal path, so in this case path is NULL, while resolved_path is
/dev/hugepages/qemu_back_mem.pc.ram.zxfqLq.

If 'continue' is used, then procmap.fname could not be matched in the
directory list, and the app will exit after reporting "Failed to find memory
file for pid".

So I have to keep it.

Thanks again
Changchun
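
A small stand-alone sketch of the realpath() behaviour being discussed (the
input path is taken from the command line; the glibc detail in the comment is
an observation, not a guarantee):

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main(int argc, char **argv)
{
	char resolved_path[PATH_MAX];
	char *path;

	if (argc < 2)
		return 1;

	resolved_path[0] = '\0';
	path = realpath(argv[1], resolved_path);

	/*
	 * POSIX says that on failure the return value is NULL and the contents
	 * of resolved_path are undefined; in practice glibc may still have
	 * written a usable (partially resolved) path into it, which is what
	 * the vhost fix relies on.
	 */
	if (path == NULL && strlen(resolved_path) == 0)
		printf("cannot resolve %s\n", argv[1]);
	else
		printf("resolved to %s\n", path != NULL ? path : resolved_path);

	return 0;
}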



[dpdk-dev] [PATCH v4 0/5] Support virtio multicast feature

2014-11-08 Thread Ouyang Changchun
 -V1 change:
This patch series supports the multicast feature in virtio and vhost.
The vhost backend enables promiscuous mode and configures
ETH_VMDQ_ACCEPT_BROADCAST and ETH_VMDQ_ACCEPT_MULTICAST in the VMDQ offload
register to receive multicast and broadcast packets.
The virtio frontend provides the functionality of enabling and disabling
multicast and promiscuous mode.

 -V2 change:
Rework the patch basing on new vhost library and new vhost application.

 -V3 change:
Rework the patch for comments, split commits.

 -V4 change:
Rework to refine code comments and patch titles, factorize code, and resolve
conflicts.

Changchun Ouyang (5):
  ethdev: Add vmdq rx mode
  igb: Config VM offload register
  ixgbe: Configure Rx mode for VMDQ
  virtio: Support promiscuous and allmulticast
  vhost: Enable promisc mode and multicast

 examples/vhost/main.c | 24 --
 lib/librte_ether/rte_ethdev.h |  1 +
 lib/librte_pmd_e1000/igb_rxtx.c   | 20 
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c   | 31 
 lib/librte_pmd_ixgbe/ixgbe_ethdev.h   |  1 +
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c |  6 +++
 lib/librte_pmd_virtio/virtio_ethdev.c | 90 ++-
 lib/librte_vhost/virtio-net.c |  3 +-
 8 files changed, 161 insertions(+), 15 deletions(-)

-- 
1.8.4.2



[dpdk-dev] [PATCH v4 1/5] ethdev: Add vmdq rx mode

2014-11-08 Thread Ouyang Changchun
Add a vmdq rx mode field into the rx config struct; it holds flags from
ETH_VMDQ_ACCEPT_*.

Signed-off-by: Changchun Ouyang 
---
 lib/librte_ether/rte_ethdev.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 7e4c998..c29525b 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -593,6 +593,7 @@ struct rte_eth_vmdq_rx_conf {
uint8_t default_pool; /**< The default pool, if applicable */
uint8_t enable_loop_back; /**< Enable VT loop back */
uint8_t nb_pool_maps; /**< We can have up to 64 filters/mappings */
+   uint32_t rx_mode; /**< Flags from ETH_VMDQ_ACCEPT_* */
struct {
uint16_t vlan_id; /**< The vlan id of the received frame */
uint64_t pools;   /**< Bitmask of pools for packet rx */
-- 
1.8.4.2
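
A minimal sketch of how an application fills the new field (the pool count and
mq_mode below are illustrative; see examples/vhost/main.c in patch 5/5 of this
series for the real usage):

#include <string.h>
#include <rte_ethdev.h>

/*
 * Ask the VMDQ pools to accept untagged, broadcast and multicast frames.
 * The resulting port_conf is then passed to rte_eth_dev_configure() as usual.
 */
static void
vmdq_conf_init(struct rte_eth_conf *port_conf)
{
	memset(port_conf, 0, sizeof(*port_conf));
	port_conf->rxmode.mq_mode = ETH_MQ_RX_VMDQ_ONLY;
	port_conf->rx_adv_conf.vmdq_rx_conf.nb_queue_pools = ETH_64_POOLS;
	port_conf->rx_adv_conf.vmdq_rx_conf.rx_mode =
		ETH_VMDQ_ACCEPT_UNTAG |
		ETH_VMDQ_ACCEPT_BROADCAST |
		ETH_VMDQ_ACCEPT_MULTICAST;
}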



[dpdk-dev] [PATCH v4 2/5] igb: Config VM offload register

2014-11-08 Thread Ouyang Changchun
Configure the VM offload register in the igb PMD to enable it to receive
broadcast and multicast packets.

Signed-off-by: Changchun Ouyang 
---
 lib/librte_pmd_e1000/igb_rxtx.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/lib/librte_pmd_e1000/igb_rxtx.c b/lib/librte_pmd_e1000/igb_rxtx.c
index f09c525..0dca7b7 100644
--- a/lib/librte_pmd_e1000/igb_rxtx.c
+++ b/lib/librte_pmd_e1000/igb_rxtx.c
@@ -1779,6 +1779,26 @@ igb_vmdq_rx_hw_configure(struct rte_eth_dev *dev)
vt_ctl |= E1000_VT_CTL_IGNORE_MAC;
E1000_WRITE_REG(hw, E1000_VT_CTL, vt_ctl);

+   for (i = 0; i < E1000_VMOLR_SIZE; i++) {
+   vmolr = E1000_READ_REG(hw, E1000_VMOLR(i));
+   vmolr &= ~(E1000_VMOLR_AUPE | E1000_VMOLR_ROMPE |
+   E1000_VMOLR_ROPE | E1000_VMOLR_BAM |
+   E1000_VMOLR_MPME);
+
+   if (cfg->rx_mode & ETH_VMDQ_ACCEPT_UNTAG)
+   vmolr |= E1000_VMOLR_AUPE;
+   if (cfg->rx_mode & ETH_VMDQ_ACCEPT_HASH_MC)
+   vmolr |= E1000_VMOLR_ROMPE;
+   if (cfg->rx_mode & ETH_VMDQ_ACCEPT_HASH_UC)
+   vmolr |= E1000_VMOLR_ROPE;
+   if (cfg->rx_mode & ETH_VMDQ_ACCEPT_BROADCAST)
+   vmolr |= E1000_VMOLR_BAM;
+   if (cfg->rx_mode & ETH_VMDQ_ACCEPT_MULTICAST)
+   vmolr |= E1000_VMOLR_MPME;
+
+   E1000_WRITE_REG(hw, E1000_VMOLR(i), vmolr);
+   }
+
/*
 * VMOLR: set STRVLAN as 1 if IGMAC in VTCTL is set as 1
 * Both 82576 and 82580 support it
-- 
1.8.4.2



[dpdk-dev] [PATCH v4 3/5] ixgbe: Configure Rx mode for VMDQ

2014-11-08 Thread Ouyang Changchun
Configure the PFVML2FLT register in the ixgbe PMD to enable it to receive
broadcast and multicast packets; also factorize the common logic with
ixgbe_set_pool_rx_mode.

Signed-off-by: Changchun Ouyang 
---
 lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 31 +--
 lib/librte_pmd_ixgbe/ixgbe_ethdev.h |  1 +
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c   |  6 ++
 3 files changed, 28 insertions(+), 10 deletions(-)

diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c 
b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
index 9c73a30..fb7ed3d 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.c
@@ -3123,6 +3123,26 @@ ixgbe_uc_all_hash_table_set(struct rte_eth_dev *dev, 
uint8_t on)
return 0;

 }
+
+uint32_t
+ixgbe_convert_vm_rx_mask_to_val(uint16_t rx_mask, uint32_t orig_val)
+{
+   uint32_t new_val = orig_val;
+
+   if (rx_mask & ETH_VMDQ_ACCEPT_UNTAG)
+   new_val |= IXGBE_VMOLR_AUPE;
+   if (rx_mask & ETH_VMDQ_ACCEPT_HASH_MC)
+   new_val |= IXGBE_VMOLR_ROMPE;
+   if (rx_mask & ETH_VMDQ_ACCEPT_HASH_UC)
+   new_val |= IXGBE_VMOLR_ROPE;
+   if (rx_mask & ETH_VMDQ_ACCEPT_BROADCAST)
+   new_val |= IXGBE_VMOLR_BAM;
+   if (rx_mask & ETH_VMDQ_ACCEPT_MULTICAST)
+   new_val |= IXGBE_VMOLR_MPE;
+
+   return new_val;
+}
+
 static int
 ixgbe_set_pool_rx_mode(struct rte_eth_dev *dev, uint16_t pool,
   uint16_t rx_mask, uint8_t on)
@@ -3141,16 +3161,7 @@ ixgbe_set_pool_rx_mode(struct rte_eth_dev *dev, uint16_t 
pool,
if (ixgbe_vmdq_mode_check(hw) < 0)
return (-ENOTSUP);

-   if (rx_mask & ETH_VMDQ_ACCEPT_UNTAG )
-   val |= IXGBE_VMOLR_AUPE;
-   if (rx_mask & ETH_VMDQ_ACCEPT_HASH_MC )
-   val |= IXGBE_VMOLR_ROMPE;
-   if (rx_mask & ETH_VMDQ_ACCEPT_HASH_UC)
-   val |= IXGBE_VMOLR_ROPE;
-   if (rx_mask & ETH_VMDQ_ACCEPT_BROADCAST)
-   val |= IXGBE_VMOLR_BAM;
-   if (rx_mask & ETH_VMDQ_ACCEPT_MULTICAST)
-   val |= IXGBE_VMOLR_MPE;
+   val = ixgbe_convert_vm_rx_mask_to_val(rx_mask, val);

if (on)
vmolr |= val;
diff --git a/lib/librte_pmd_ixgbe/ixgbe_ethdev.h 
b/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
index a5159e5..ca99170 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
+++ b/lib/librte_pmd_ixgbe/ixgbe_ethdev.h
@@ -340,4 +340,5 @@ void ixgbe_pf_mbx_process(struct rte_eth_dev *eth_dev);

 int ixgbe_pf_host_configure(struct rte_eth_dev *eth_dev);

+uint32_t ixgbe_convert_vm_rx_mask_to_val(uint16_t rx_mask, uint32_t orig_val);
 #endif /* _IXGBE_ETHDEV_H_ */
diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c 
b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index 3a5a8ff..f9b3fe3 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -3123,6 +3123,7 @@ ixgbe_vmdq_rx_hw_configure(struct rte_eth_dev *dev)
struct ixgbe_hw *hw;
enum rte_eth_nb_pools num_pools;
uint32_t mrqc, vt_ctl, vlanctrl;
+   uint32_t vmolr = 0;
int i;

PMD_INIT_FUNC_TRACE();
@@ -3145,6 +3146,11 @@ ixgbe_vmdq_rx_hw_configure(struct rte_eth_dev *dev)

IXGBE_WRITE_REG(hw, IXGBE_VT_CTL, vt_ctl);

+   for (i = 0; i < (int)num_pools; i++) {
+   vmolr = ixgbe_convert_vm_rx_mask_to_val(cfg->rx_mode, vmolr);
+   IXGBE_WRITE_REG(hw, IXGBE_VMOLR(i), vmolr);
+   }
+
/* VLNCTRL: enable vlan filtering and allow all vlan tags through */
vlanctrl = IXGBE_READ_REG(hw, IXGBE_VLNCTRL);
vlanctrl |= IXGBE_VLNCTRL_VFE ; /* enable vlan filters */
-- 
1.8.4.2



[dpdk-dev] [PATCH v4 5/5] vhost: Enable promisc mode and multicast

2014-11-08 Thread Ouyang Changchun
This enables user space vhost to receive and forward broadcast and multicast
packets: use a new command line option to enable promiscuous mode; enable 2
bits in the VMDQ RX mode: ETH_VMDQ_ACCEPT_BROADCAST and
ETH_VMDQ_ACCEPT_MULTICAST.

Signed-off-by: Changchun Ouyang 
---
 examples/vhost/main.c | 24 +---
 lib/librte_vhost/virtio-net.c |  3 ++-
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index a93f7a0..1f1edbe 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -161,6 +161,9 @@
 /* mask of enabled ports */
 static uint32_t enabled_port_mask = 0;

+/* Promiscuous mode */
+static uint32_t promiscuous;
+
 /*Number of switching cores enabled*/
 static uint32_t num_switching_cores = 0;

@@ -364,13 +367,15 @@ static inline int
 get_eth_conf(struct rte_eth_conf *eth_conf, uint32_t num_devices)
 {
struct rte_eth_vmdq_rx_conf conf;
+   struct rte_eth_vmdq_rx_conf *def_conf =
+   &vmdq_conf_default.rx_adv_conf.vmdq_rx_conf;
unsigned i;

memset(&conf, 0, sizeof(conf));
conf.nb_queue_pools = (enum rte_eth_nb_pools)num_devices;
conf.nb_pool_maps = num_devices;
-   conf.enable_loop_back =
-   vmdq_conf_default.rx_adv_conf.vmdq_rx_conf.enable_loop_back;
+   conf.enable_loop_back = def_conf->enable_loop_back;
+   conf.rx_mode = def_conf->rx_mode;

for (i = 0; i < conf.nb_pool_maps; i++) {
conf.pool_map[i].vlan_id = vlan_tags[ i ];
@@ -468,6 +473,9 @@ port_init(uint8_t port)
return retval;
}

+   if (promiscuous)
+   rte_eth_promiscuous_enable(port);
+
rte_eth_macaddr_get(port, &vmdq_ports_eth_addr[port]);
RTE_LOG(INFO, VHOST_PORT, "Max virtio devices supported: %u\n", 
num_devices);
RTE_LOG(INFO, VHOST_PORT, "Port %u MAC: %02"PRIx8" %02"PRIx8" %02"PRIx8
@@ -598,7 +606,8 @@ us_vhost_parse_args(int argc, char **argv)
};

/* Parse command line */
-   while ((opt = getopt_long(argc, argv, "p:",long_option, &option_index)) 
!= EOF) {
+   while ((opt = getopt_long(argc, argv, "p:P",
+   long_option, &option_index)) != EOF) {
switch (opt) {
/* Portmask */
case 'p':
@@ -610,6 +619,15 @@ us_vhost_parse_args(int argc, char **argv)
}
break;

+   case 'P':
+   promiscuous = 1;
+   vmdq_conf_default.rx_adv_conf.vmdq_rx_conf.rx_mode =
+   ETH_VMDQ_ACCEPT_BROADCAST |
+   ETH_VMDQ_ACCEPT_MULTICAST;
+   rte_vhost_feature_enable(1ULL << VIRTIO_NET_F_CTRL_RX);
+
+   break;
+
case 0:
/* Enable/disable vm2vm comms. */
if (!strncmp(long_option[option_index].name, "vm2vm",
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 6d8de09..852b6d1 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -68,7 +68,8 @@ static struct virtio_net_device_ops const *notify_ops;
 static struct virtio_net_config_ll *ll_root;

 /* Features supported by this lib. */
-#define VHOST_SUPPORTED_FEATURES (1ULL << VIRTIO_NET_F_MRG_RXBUF)
+#define VHOST_SUPPORTED_FEATURES ((1ULL << VIRTIO_NET_F_MRG_RXBUF) | \
+ (1ULL << VIRTIO_NET_F_CTRL_RX))
 static uint64_t VHOST_FEATURES = VHOST_SUPPORTED_FEATURES;

 /* Line size for reading maps file. */
-- 
1.8.4.2



[dpdk-dev] [PATCH v4 4/5] virtio: Support promiscuous and allmulticast

2014-11-08 Thread Ouyang Changchun
Add code to support enabling and disabling promiscuous and allmulticast modes.

Signed-off-by: Changchun Ouyang 
---
 lib/librte_pmd_virtio/virtio_ethdev.c | 90 ++-
 1 file changed, 89 insertions(+), 1 deletion(-)

diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index 19930c0..c009f2a 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -66,6 +66,10 @@ static int eth_virtio_dev_init(struct eth_driver *eth_drv,
 static int  virtio_dev_configure(struct rte_eth_dev *dev);
 static int  virtio_dev_start(struct rte_eth_dev *dev);
 static void virtio_dev_stop(struct rte_eth_dev *dev);
+static void virtio_dev_promiscuous_enable(struct rte_eth_dev *dev);
+static void virtio_dev_promiscuous_disable(struct rte_eth_dev *dev);
+static void virtio_dev_allmulticast_enable(struct rte_eth_dev *dev);
+static void virtio_dev_allmulticast_disable(struct rte_eth_dev *dev);
 static void virtio_dev_info_get(struct rte_eth_dev *dev,
struct rte_eth_dev_info *dev_info);
 static int virtio_dev_link_update(struct rte_eth_dev *dev,
@@ -403,6 +407,86 @@ virtio_dev_close(struct rte_eth_dev *dev)
virtio_dev_stop(dev);
 }

+static void
+virtio_dev_promiscuous_enable(struct rte_eth_dev *dev)
+{
+   struct virtio_hw *hw
+   = VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct virtio_pmd_ctrl ctrl;
+   int dlen[1];
+   int ret;
+
+   ctrl.hdr.class = VIRTIO_NET_CTRL_RX;
+   ctrl.hdr.cmd = VIRTIO_NET_CTRL_RX_PROMISC;
+   ctrl.data[0] = 1;
+   dlen[0] = 1;
+
+   ret = virtio_send_command(hw->cvq, &ctrl, dlen, 1);
+
+   if (ret)
+   PMD_INIT_LOG(ERR, "Failed to enable promisc");
+}
+
+static void
+virtio_dev_promiscuous_disable(struct rte_eth_dev *dev)
+{
+   struct virtio_hw *hw
+   = VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct virtio_pmd_ctrl ctrl;
+   int dlen[1];
+   int ret;
+
+   ctrl.hdr.class = VIRTIO_NET_CTRL_RX;
+   ctrl.hdr.cmd = VIRTIO_NET_CTRL_RX_PROMISC;
+   ctrl.data[0] = 0;
+   dlen[0] = 1;
+
+   ret = virtio_send_command(hw->cvq, &ctrl, dlen, 1);
+
+   if (ret)
+   PMD_INIT_LOG(ERR, "Failed to disable promisc");
+}
+
+static void
+virtio_dev_allmulticast_enable(struct rte_eth_dev *dev)
+{
+   struct virtio_hw *hw
+   = VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct virtio_pmd_ctrl ctrl;
+   int dlen[1];
+   int ret;
+
+   ctrl.hdr.class = VIRTIO_NET_CTRL_RX;
+   ctrl.hdr.cmd = VIRTIO_NET_CTRL_RX_ALLMULTI;
+   ctrl.data[0] = 1;
+   dlen[0] = 1;
+
+   ret = virtio_send_command(hw->cvq, &ctrl, dlen, 1);
+
+   if (ret)
+   PMD_INIT_LOG(ERR, "Failed to enable allmulticast");
+}
+
+static void
+virtio_dev_allmulticast_disable(struct rte_eth_dev *dev)
+{
+   struct virtio_hw *hw
+   = VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct virtio_pmd_ctrl ctrl;
+   int dlen[1];
+   int ret;
+
+   ctrl.hdr.class = VIRTIO_NET_CTRL_RX;
+   ctrl.hdr.cmd = VIRTIO_NET_CTRL_RX_ALLMULTI;
+   ctrl.data[0] = 0;
+   dlen[0] = 1;
+
+   ret = virtio_send_command(hw->cvq, &ctrl, dlen, 1);
+
+   if (ret)
+   PMD_INIT_LOG(ERR, "Failed to disable allmulticast");
+}
+
 /*
  * dev_ops for virtio, bare necessities for basic operation
  */
@@ -411,6 +495,10 @@ static struct eth_dev_ops virtio_eth_dev_ops = {
.dev_start   = virtio_dev_start,
.dev_stop= virtio_dev_stop,
.dev_close   = virtio_dev_close,
+   .promiscuous_enable  = virtio_dev_promiscuous_enable,
+   .promiscuous_disable = virtio_dev_promiscuous_disable,
+   .allmulticast_enable = virtio_dev_allmulticast_enable,
+   .allmulticast_disable= virtio_dev_allmulticast_disable,

.dev_infos_get   = virtio_dev_info_get,
.stats_get   = virtio_dev_stats_get,
@@ -561,7 +649,7 @@ virtio_negotiate_features(struct virtio_hw *hw)
 {
uint32_t host_features, mask;

-   mask = VIRTIO_NET_F_CTRL_RX | VIRTIO_NET_F_CTRL_VLAN;
+   mask = VIRTIO_NET_F_CTRL_VLAN;
mask |= VIRTIO_NET_F_CSUM | VIRTIO_NET_F_GUEST_CSUM;

/* TSO and LRO are only available when their corresponding
-- 
1.8.4.2
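
With these ops wired up, an application reaches them through the generic
ethdev calls; a minimal sketch (the port id and helper name are illustrative):

#include <rte_ethdev.h>

static void
set_virtio_rx_modes(uint8_t port_id)
{
	/* Each call sends a VIRTIO_NET_CTRL_RX class command (PROMISC or
	 * ALLMULTI) with data 1 over the control queue. */
	rte_eth_promiscuous_enable(port_id);
	rte_eth_allmulticast_enable(port_id);

	/* ... */

	/* The disable calls send the same commands with data 0. */
	rte_eth_allmulticast_disable(port_id);
	rte_eth_promiscuous_disable(port_id);
}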



[dpdk-dev] [PATCH v3 1/5] ethdev: add vmdq rx mode

2014-11-08 Thread Ouyang, Changchun
Hi Thomas,

> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Thursday, November 6, 2014 9:56 PM
> To: Ouyang, Changchun
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3 1/5] ethdev: add vmdq rx mode
> 
> 2014-10-31 13:19, Ouyang Changchun:
> > --- a/lib/librte_ether/rte_ethdev.h
> > +++ b/lib/librte_ether/rte_ethdev.h
> > @@ -577,6 +577,7 @@ struct rte_eth_vmdq_rx_conf {
> > uint8_t default_pool; /**< The default pool, if applicable */
> > uint8_t enable_loop_back; /**< Enable VT loop back */
> > uint8_t nb_pool_maps; /**< We can have up to 64 filters/mappings
> */
> > +   uint32_t rx_mode; /**< RX mode for vmdq */
> 
> You are adding the field rx_mode in struct rte_eth_vmdq_rx_conf.
> So the comment "RX mode for vmdq" is not really informative :) It would be
> more interesting to explain which kind of value this field must contain.
> Something like "flags from ETH_VMDQ_ACCEPT_*".
> 
Thanks for your comments, I will update it.

Changchun



[dpdk-dev] [PATCH v4 0/5] Support virtio multicast feature

2014-11-12 Thread Ouyang, Changchun
Hi Thomas,

Thanks very much for applying this patch!

> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Wednesday, November 12, 2014 7:17 AM
> To: Ouyang, Changchun
> Cc: dev at dpdk.org; Xie, Huawei
> Subject: Re: [dpdk-dev] [PATCH v4 0/5] Support virtio multicast feature
> 
> >  -V1 change:
> > This patch series support multicast feature in virtio and vhost.
> > The vhost backend enables the promiscuous mode and config
> > ETH_VMDQ_ACCEPT_BROADCAST and ETH_VMDQ_ACCEPT_MULTICAST in
> VMDQ offload register to receive the multicast and broadcast packets.
> > The virtio frontend provides the functionality of enabling and
> > disabling the multicast and promiscuous mode.
> >
> >  -V2 change:
> > Rework the patch basing on new vhost library and new vhost application.
> >
> >  -V3 change:
> > Rework the patch for comments, split commits.
> >
> >  -V4 change:
> > Rework for refining code comment and patch titles, fatorizing codes, and
> resolving conflicts.
> >
> > Changchun Ouyang (5):
> >   ethdev: Add vmdq rx mode
> >   igb: Config VM offload register
> >   ixgbe: Configure Rx mode for VMDQ
> >   virtio: Support promiscuous and allmulticast
> >   vhost: Enable promisc mode and multicast
> 
> I reviewed only the first 3 commits.
> The virtio and vhost commits seem to have been reviewed by Huawei.
> Next times, a clear acked-by would be preferable.
> Please note that this is the role of developpers to request reviews when
> needed. Reviews are not always spontaneous :)
> 
Yes, I have asked several people more than three times to review and ack this patch.
Each of them has a tight schedule with their own patch rework, doc writing,
next feature planning, etc., so the acking has been delayed.

Thanks again and regards,
Changchun




[dpdk-dev] One pkt in mbuf chain - virtio pmd driver

2014-08-07 Thread Ouyang, Changchun
This is a feature under development;
scattered and mergeable RX will be supported in a subsequent virtio patch.
Thanks
Changchun


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Xie, Huawei
> Sent: Thursday, August 7, 2014 3:07 PM
> To: Czaus, Tomasz; dev at dpdk.org
> Subject: Re: [dpdk-dev] One pkt in mbuf chain - virtio pmd driver
> 
> Hi Tomasz:
> This is a known issue in user space vhost. Will be fixed in subsequent patch
> once the vhost lib is applied.
> 
> BR.
> -Huawei
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Czaus, Tomasz
> > Sent: Thursday, August 07, 2014 2:20 PM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] One pkt in mbuf chain - virtio pmd driver
> >
> > Hello,
> >
> > Does virtio pmd driver support scenario when a frame fits in mbuf
> > chain, this means all headers (eth/ipv4/tcp) are located in first mbuf
> > and user data is located in next mbuf. I have asked the same question
> > on dpdk-ovs mailing group, here is a thread and more details:
> >
> > https://lists.01.org/pipermail/dpdk-ovs/2014-August/001557.html
> >
> > Best Regards,
> > Tomasz Czaus
> >


[dpdk-dev] [PATCH v3] virtio: Support mergeable buffer in virtio pmd

2014-08-14 Thread Ouyang Changchun
v3 change:
- Address the comments from Huawei and fix a potential issue of a wrong offset
  to the number of descriptors in the buffer; also address other minor comments.

v2 change:
- Resolve conflicts with the tip code;
- And resolve 2 issues:
   -- fix an mbuf leak when discarding an incomplete packet.
   -- refine pkt.data to point to the actual payload start.

v1 change:
- This patch supports the mergeable buffer feature in the DPDK based virtio PMD,
  which can receive jumbo frames of larger sizes, like 3K, 4K or even 9K.
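
For orientation, the mergeable RX path added by this patch has roughly the
following shape (a simplified sketch, not the patch code; recv_first_buf()
and recv_next_buf() are hypothetical placeholders for the vring dequeue
logic, and the mbuf field names follow later DPDK releases):

#include <rte_mbuf.h>

/* Hypothetical helpers standing in for the real vring dequeue code:
 * recv_first_buf() pops the first used descriptor, strips the
 * virtio_net_hdr_mrg_rxbuf and reports how many descriptors the whole
 * packet occupies; recv_next_buf() pops one continuation segment. */
struct rte_mbuf *recv_first_buf(void *rxvq, uint16_t *num_buffers);
struct rte_mbuf *recv_next_buf(void *rxvq);

static struct rte_mbuf *
recv_mergeable_pkt(void *rxvq)
{
	struct rte_mbuf *first, *prev, *seg;
	uint16_t num_buffers, i;

	first = recv_first_buf(rxvq, &num_buffers);
	if (first == NULL)
		return NULL;

	prev = first;
	for (i = 1; i < num_buffers; i++) {
		seg = recv_next_buf(rxvq);
		if (seg == NULL) {
			/* Incomplete packet: free the chain built so far
			 * instead of leaking it (the mbuf leak fixed in v2). */
			rte_pktmbuf_free(first);
			return NULL;
		}
		prev->next = seg;
		first->nb_segs++;
		first->pkt_len += seg->data_len;
		prev = seg;
	}
	return first;
}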

Signed-off-by: Changchun Ouyang 
Acked-by: Huawei Xie 
---
 lib/librte_pmd_virtio/virtio_ethdev.c |  20 +--
 lib/librte_pmd_virtio/virtio_ethdev.h |   3 +
 lib/librte_pmd_virtio/virtio_rxtx.c   | 221 +-
 3 files changed, 207 insertions(+), 37 deletions(-)

diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index b9f5529..535d798 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -337,7 +337,7 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
snprintf(vq_name, sizeof(vq_name), "port%d_tvq%d_hdrzone",
dev->data->port_id, queue_idx);
vq->virtio_net_hdr_mz = rte_memzone_reserve_aligned(vq_name,
-   vq_size * sizeof(struct virtio_net_hdr),
+   vq_size * hw->vtnet_hdr_size,
socket_id, 0, CACHE_LINE_SIZE);
if (vq->virtio_net_hdr_mz == NULL) {
rte_free(vq);
@@ -346,7 +346,7 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
vq->virtio_net_hdr_mem =
vq->virtio_net_hdr_mz->phys_addr;
memset(vq->virtio_net_hdr_mz->addr, 0,
-   vq_size * sizeof(struct virtio_net_hdr));
+   vq_size * hw->vtnet_hdr_size);
} else if (queue_type == VTNET_CQ) {
/* Allocate a page for control vq command, data and status */
snprintf(vq_name, sizeof(vq_name), "port%d_cvq_hdrzone",
@@ -571,9 +571,6 @@ virtio_negotiate_features(struct virtio_hw *hw)
mask |= VIRTIO_NET_F_GUEST_TSO4 | VIRTIO_NET_F_GUEST_TSO6 | 
VIRTIO_NET_F_GUEST_ECN;
mask |= VTNET_LRO_FEATURES;

-   /* rx_mbuf should not be in multiple merged segments */
-   mask |= VIRTIO_NET_F_MRG_RXBUF;
-
/* not negotiating INDIRECT descriptor table support */
mask |= VIRTIO_RING_F_INDIRECT_DESC;

@@ -746,7 +743,6 @@ eth_virtio_dev_init(__rte_unused struct eth_driver *eth_drv,
}

eth_dev->dev_ops = &virtio_eth_dev_ops;
-   eth_dev->rx_pkt_burst = &virtio_recv_pkts;
eth_dev->tx_pkt_burst = &virtio_xmit_pkts;

if (rte_eal_process_type() == RTE_PROC_SECONDARY)
@@ -801,10 +797,13 @@ eth_virtio_dev_init(__rte_unused struct eth_driver 
*eth_drv,
virtio_negotiate_features(hw);

/* Setting up rx_header size for the device */
-   if (vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF))
+   if (vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF)) {
+   eth_dev->rx_pkt_burst = &virtio_recv_mergeable_pkts;
hw->vtnet_hdr_size = sizeof(struct virtio_net_hdr_mrg_rxbuf);
-   else
+   } else {
+   eth_dev->rx_pkt_burst = &virtio_recv_pkts;
hw->vtnet_hdr_size = sizeof(struct virtio_net_hdr);
+   }

/* Allocate memory for storing MAC addresses */
eth_dev->data->mac_addrs = rte_zmalloc("virtio", ETHER_ADDR_LEN, 0);
@@ -1009,7 +1008,7 @@ static void virtio_dev_free_mbufs(struct rte_eth_dev *dev)

while ((buf = (struct rte_mbuf *)virtqueue_detatch_unused(
dev->data->rx_queues[i])) != NULL) {
-   rte_pktmbuf_free_seg(buf);
+   rte_pktmbuf_free(buf);
mbuf_num++;
}

@@ -1028,7 +1027,8 @@ static void virtio_dev_free_mbufs(struct rte_eth_dev *dev)
mbuf_num = 0;
while ((buf = (struct rte_mbuf *)virtqueue_detatch_unused(
dev->data->tx_queues[i])) != NULL) {
-   rte_pktmbuf_free_seg(buf);
+   rte_pktmbuf_free(buf);
+
mbuf_num++;
}

diff --git a/lib/librte_pmd_virtio/virtio_ethdev.h 
b/lib/librte_pmd_virtio/virtio_ethdev.h
index 858e644..d2e1eed 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.h
+++ b/lib/librte_pmd_virtio/virtio_ethdev.h
@@ -104,6 +104,9 @@ int  virtio_dev_tx_queue_setup(struct rte_eth_dev *dev, 
uint16_t tx_queue_id,
 uint16_t virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
uint16_t nb_pkts);

+uint16_t virtio_recv_mergeable_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
+   uint16_t nb_pkts);
+
 uint16_t virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
   

[dpdk-dev] [PATCH] examples/vhost: Support jumbo frame in user space vhost

2014-08-15 Thread Ouyang Changchun
This patch supports the mergeable RX feature and thus supports jumbo frame RX
and TX in user space vhost (as the virtio backend).

On RX, it secures enough room from the vring to accommodate one complete
scattered packet received by the PMD from the physical port, and then copies
data from the mbuf chain into the vring buffers, possibly across several vring
entries and descriptors; a short sketch of this copy loop follows below.

On TX, it gets a jumbo frame, possibly described by several vring descriptors
chained together with the 'NEXT' flag, and then copies them into one scattered
packet and transmits it to the physical port through the PMD.
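
The RX copy loop described above looks roughly like this (a simplified sketch,
not the patch code; get_next_vring_buf() is a hypothetical placeholder for the
code that reserves the next vring descriptor and resolves it to a host virtual
address and length):

#include <rte_common.h>
#include <rte_mbuf.h>
#include <rte_memcpy.h>

/* Hypothetical placeholder: reserve the next guest RX buffer and return
 * its host virtual address; *len receives the buffer length. */
uint8_t *get_next_vring_buf(void *vq, uint32_t *len);

/* Copy one (possibly scattered) mbuf chain into guest buffers, crossing
 * vring entries whenever the current guest buffer fills up. */
static void
copy_pkt_to_vring(void *vq, struct rte_mbuf *pkt)
{
	struct rte_mbuf *seg;
	uint8_t *dst = NULL;
	uint32_t dst_avail = 0, dst_off = 0;

	for (seg = pkt; seg != NULL; seg = seg->next) {
		uint32_t src_off = 0;
		uint32_t src_len = rte_pktmbuf_data_len(seg);

		while (src_len > 0) {
			uint32_t cpy;

			if (dst_avail == 0) {
				/* Current guest buffer is full: move to the
				 * next vring entry. */
				dst = get_next_vring_buf(vq, &dst_avail);
				dst_off = 0;
			}
			cpy = RTE_MIN(src_len, dst_avail);
			rte_memcpy(dst + dst_off,
				   rte_pktmbuf_mtod(seg, uint8_t *) + src_off,
				   cpy);
			src_off += cpy;
			src_len -= cpy;
			dst_off += cpy;
			dst_avail -= cpy;
		}
	}
}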

Signed-off-by: Changchun Ouyang 
Acked-by: Huawei Xie 
---
 examples/vhost/main.c   | 726 
 examples/vhost/virtio-net.h |  14 +
 2 files changed, 687 insertions(+), 53 deletions(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 193aa25..7d9e6a2 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -106,6 +106,8 @@
 #define BURST_RX_WAIT_US 15/* Defines how long we wait between retries on 
RX */
 #define BURST_RX_RETRIES 4 /* Number of retries on RX. */

+#define JUMBO_FRAME_MAX_SIZE0x2600
+
 /* State of virtio device. */
 #define DEVICE_MAC_LEARNING 0
 #define DEVICE_RX  1
@@ -676,8 +678,12 @@ us_vhost_parse_args(int argc, char **argv)
us_vhost_usage(prgname);
return -1;
} else {
-   if (ret)
+   if (ret) {
+   vmdq_conf_default.rxmode.jumbo_frame = 1;
+   vmdq_conf_default.rxmode.max_rx_pkt_len
+   = JUMBO_FRAME_MAX_SIZE;
VHOST_FEATURES = (1ULL << VIRTIO_NET_F_MRG_RXBUF);
+   }
}
}

@@ -797,6 +803,14 @@ us_vhost_parse_args(int argc, char **argv)
return -1;
}

+   if ((zero_copy == 1) && (vmdq_conf_default.rxmode.jumbo_frame == 1)) {
+   RTE_LOG(INFO, VHOST_PORT,
+   "Vhost zero copy doesn't support jumbo frame,"
+   "please specify '--mergeable 0' to disable the "
+   "mergeable feature.\n");
+   return -1;
+   }
+
return 0;
 }

@@ -916,7 +930,7 @@ gpa_to_hpa(struct virtio_net *dev, uint64_t guest_pa,
  * This function adds buffers to the virtio devices RX virtqueue. Buffers can
  * be received from the physical port or from another virtio device. A packet
  * count is returned to indicate the number of packets that were succesfully
- * added to the RX queue.
+ * added to the RX queue. This function works when mergeable is disabled.
  */
 static inline uint32_t __attribute__((always_inline))
 virtio_dev_rx(struct virtio_net *dev, struct rte_mbuf **pkts, uint32_t count)
@@ -930,7 +944,6 @@ virtio_dev_rx(struct virtio_net *dev, struct rte_mbuf 
**pkts, uint32_t count)
uint64_t buff_hdr_addr = 0;
uint32_t head[MAX_PKT_BURST], packet_len = 0;
uint32_t head_idx, packet_success = 0;
-   uint32_t mergeable, mrg_count = 0;
uint32_t retry = 0;
uint16_t avail_idx, res_cur_idx;
uint16_t res_base_idx, res_end_idx;
@@ -940,6 +953,7 @@ virtio_dev_rx(struct virtio_net *dev, struct rte_mbuf 
**pkts, uint32_t count)
LOG_DEBUG(VHOST_DATA, "(%"PRIu64") virtio_dev_rx()\n", dev->device_fh);
vq = dev->virtqueue[VIRTIO_RXQ];
count = (count > MAX_PKT_BURST) ? MAX_PKT_BURST : count;
+
/* As many data cores may want access to available buffers, they need 
to be reserved. */
do {
res_base_idx = vq->last_used_idx_res;
@@ -976,9 +990,6 @@ virtio_dev_rx(struct virtio_net *dev, struct rte_mbuf 
**pkts, uint32_t count)
/* Prefetch available ring to retrieve indexes. */
rte_prefetch0(&vq->avail->ring[res_cur_idx & (vq->size - 1)]);

-   /* Check if the VIRTIO_NET_F_MRG_RXBUF feature is enabled. */
-   mergeable = dev->features & (1 << VIRTIO_NET_F_MRG_RXBUF);
-
/* Retrieve all of the head indexes first to avoid caching issues. */
for (head_idx = 0; head_idx < count; head_idx++)
head[head_idx] = vq->avail->ring[(res_cur_idx + head_idx) & 
(vq->size - 1)];
@@ -997,56 +1008,44 @@ virtio_dev_rx(struct virtio_net *dev, struct rte_mbuf 
**pkts, uint32_t count)
/* Prefetch buffer address. */
rte_prefetch0((void*)(uintptr_t)buff_addr);

-   if (mergeable && (mrg_count != 0)) {
-   desc->len = packet_len = rte_pktmbuf_data_len(buff);
-   } else {
-   /* Copy virtio_hdr to packet and increment buffer 
address */
-   buf

[dpdk-dev] [PATCH v3] virtio: Support mergeable buffer in virtio pmd

2014-08-21 Thread Ouyang, Changchun
Hi all,

Any comments for this patch?
And what's the status for merging it into mainline?

Thanks in advance
Changchun

> -Original Message-
> From: Ouyang, Changchun
> Sent: Thursday, August 14, 2014 4:55 PM
> To: dev at dpdk.org
> Cc: Cao, Waterman; Ouyang, Changchun
> Subject: [PATCH v3] virtio: Support mergeable buffer in virtio pmd
> 
> v3 change:
> - Investigate the comments from Huawei and fix one potential issue of
> wrong offset to
>   the number of descriptor in buffer; also fix other tiny comments.
> 
> v2 change:
> - Resolve conflicts with the tip code;
> - And resolve 2 issues:
>-- fix mbuf leak when discard an uncompleted packet.
>-- refine pkt.data to point to actual payload data start point.
> 
> v1 change:
> - This patch supports mergeable buffer feature in DPDK based virtio PMD,
> which can
>   receive jumbo frame with larger size, like 3K, 4K or even 9K.
> 
> Signed-off-by: Changchun Ouyang 
> Acked-by: Huawei Xie 
> ---
>  lib/librte_pmd_virtio/virtio_ethdev.c |  20 +--
>  lib/librte_pmd_virtio/virtio_ethdev.h |   3 +
>  lib/librte_pmd_virtio/virtio_rxtx.c   | 221
> +-
>  3 files changed, 207 insertions(+), 37 deletions(-)
> 
> diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c
> b/lib/librte_pmd_virtio/virtio_ethdev.c
> index b9f5529..535d798 100644
> --- a/lib/librte_pmd_virtio/virtio_ethdev.c
> +++ b/lib/librte_pmd_virtio/virtio_ethdev.c
> @@ -337,7 +337,7 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
>   snprintf(vq_name, sizeof(vq_name),
> "port%d_tvq%d_hdrzone",
>   dev->data->port_id, queue_idx);
>   vq->virtio_net_hdr_mz =
> rte_memzone_reserve_aligned(vq_name,
> - vq_size * sizeof(struct virtio_net_hdr),
> + vq_size * hw->vtnet_hdr_size,
>   socket_id, 0, CACHE_LINE_SIZE);
>   if (vq->virtio_net_hdr_mz == NULL) {
>   rte_free(vq);
> @@ -346,7 +346,7 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
>   vq->virtio_net_hdr_mem =
>   vq->virtio_net_hdr_mz->phys_addr;
>   memset(vq->virtio_net_hdr_mz->addr, 0,
> - vq_size * sizeof(struct virtio_net_hdr));
> + vq_size * hw->vtnet_hdr_size);
>   } else if (queue_type == VTNET_CQ) {
>   /* Allocate a page for control vq command, data and status
> */
>   snprintf(vq_name, sizeof(vq_name), "port%d_cvq_hdrzone",
> @@ -571,9 +571,6 @@ virtio_negotiate_features(struct virtio_hw *hw)
>   mask |= VIRTIO_NET_F_GUEST_TSO4 | VIRTIO_NET_F_GUEST_TSO6
> | VIRTIO_NET_F_GUEST_ECN;
>   mask |= VTNET_LRO_FEATURES;
> 
> - /* rx_mbuf should not be in multiple merged segments */
> - mask |= VIRTIO_NET_F_MRG_RXBUF;
> -
>   /* not negotiating INDIRECT descriptor table support */
>   mask |= VIRTIO_RING_F_INDIRECT_DESC;
> 
> @@ -746,7 +743,6 @@ eth_virtio_dev_init(__rte_unused struct eth_driver
> *eth_drv,
>   }
> 
>   eth_dev->dev_ops = &virtio_eth_dev_ops;
> - eth_dev->rx_pkt_burst = &virtio_recv_pkts;
>   eth_dev->tx_pkt_burst = &virtio_xmit_pkts;
> 
>   if (rte_eal_process_type() == RTE_PROC_SECONDARY)
> @@ -801,10 +797,13 @@ eth_virtio_dev_init(__rte_unused struct
> eth_driver *eth_drv,
>   virtio_negotiate_features(hw);
> 
>   /* Setting up rx_header size for the device */
> - if (vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF))
> + if (vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF)) {
> + eth_dev->rx_pkt_burst = &virtio_recv_mergeable_pkts;
>   hw->vtnet_hdr_size = sizeof(struct
> virtio_net_hdr_mrg_rxbuf);
> - else
> + } else {
> + eth_dev->rx_pkt_burst = &virtio_recv_pkts;
>   hw->vtnet_hdr_size = sizeof(struct virtio_net_hdr);
> + }
> 
>   /* Allocate memory for storing MAC addresses */
>   eth_dev->data->mac_addrs = rte_zmalloc("virtio",
> ETHER_ADDR_LEN, 0);
> @@ -1009,7 +1008,7 @@ static void virtio_dev_free_mbufs(struct
> rte_eth_dev *dev)
> 
>   while ((buf = (struct rte_mbuf *)virtqueue_detatch_unused(
>   dev->data->rx_queues[i])) != NULL) {
> - rte_pktmbuf_free_seg(buf);
> + rte_pktmbuf_free(buf);
>   mbuf_num++;
>   }
> 
> @@ -1028,7 +1027,8 @@ static void virtio_dev_free_mbufs(struct
> rte_eth_dev *dev)
>   mbuf_num = 0

[dpdk-dev] [PATCH] examples/vhost: Support jumbo frame in user space vhost

2014-08-21 Thread Ouyang, Changchun
Hi all,

Any comments for this patch?
And what's the status for merging it into mainline?

Thanks in advance
Changchun

> -Original Message-
> From: Ouyang, Changchun
> Sent: Friday, August 15, 2014 12:58 PM
> To: dev at dpdk.org
> Cc: Cao, Waterman; Ouyang, Changchun
> Subject: [PATCH] examples/vhost: Support jumbo frame in user space vhost
> 
> This patch support mergeable RX feature and thus support jumbo frame RX
> and TX in user space vhost(as virtio backend).
> 
> On RX, it secures enough room from vring to accommodate one complete
> scattered packet which is received by PMD from physical port, and then copy
> data from mbuf to vring buffer, possibly across a few vring entries and
> descriptors.
> 
> On TX, it gets a jumbo frame, possibly described by a few vring descriptors
> which are chained together with the flags of 'NEXT', and then copy them into
> one scattered packet and TX it to physical port through PMD.
> 
> Signed-off-by: Changchun Ouyang 
> Acked-by: Huawei Xie 
> ---
>  examples/vhost/main.c   | 726
> 
>  examples/vhost/virtio-net.h |  14 +
>  2 files changed, 687 insertions(+), 53 deletions(-)
> 
> diff --git a/examples/vhost/main.c b/examples/vhost/main.c index
> 193aa25..7d9e6a2 100644
> --- a/examples/vhost/main.c
> +++ b/examples/vhost/main.c
> @@ -106,6 +106,8 @@
>  #define BURST_RX_WAIT_US 15  /* Defines how long we wait
> between retries on RX */
>  #define BURST_RX_RETRIES 4   /* Number of retries on RX. */
> 
> +#define JUMBO_FRAME_MAX_SIZE0x2600
> +
>  /* State of virtio device. */
>  #define DEVICE_MAC_LEARNING 0
>  #define DEVICE_RX1
> @@ -676,8 +678,12 @@ us_vhost_parse_args(int argc, char **argv)
>   us_vhost_usage(prgname);
>   return -1;
>   } else {
> - if (ret)
> + if (ret) {
> +
>   vmdq_conf_default.rxmode.jumbo_frame = 1;
> +
>   vmdq_conf_default.rxmode.max_rx_pkt_len
> + =
> JUMBO_FRAME_MAX_SIZE;
>   VHOST_FEATURES = (1ULL <<
> VIRTIO_NET_F_MRG_RXBUF);
> + }
>   }
>   }
> 
> @@ -797,6 +803,14 @@ us_vhost_parse_args(int argc, char **argv)
>   return -1;
>   }
> 
> + if ((zero_copy == 1) && (vmdq_conf_default.rxmode.jumbo_frame
> == 1)) {
> + RTE_LOG(INFO, VHOST_PORT,
> + "Vhost zero copy doesn't support jumbo frame,"
> + "please specify '--mergeable 0' to disable the "
> + "mergeable feature.\n");
> + return -1;
> + }
> +
>   return 0;
>  }
> 
> @@ -916,7 +930,7 @@ gpa_to_hpa(struct virtio_net *dev, uint64_t guest_pa,
>   * This function adds buffers to the virtio devices RX virtqueue. Buffers can
>   * be received from the physical port or from another virtio device. A packet
>   * count is returned to indicate the number of packets that were succesfully
> - * added to the RX queue.
> + * added to the RX queue. This function works when mergeable is disabled.
>   */
>  static inline uint32_t __attribute__((always_inline))  virtio_dev_rx(struct
> virtio_net *dev, struct rte_mbuf **pkts, uint32_t count) @@ -930,7 +944,6
> @@ virtio_dev_rx(struct virtio_net *dev, struct rte_mbuf **pkts, uint32_t
> count)
>   uint64_t buff_hdr_addr = 0;
>   uint32_t head[MAX_PKT_BURST], packet_len = 0;
>   uint32_t head_idx, packet_success = 0;
> - uint32_t mergeable, mrg_count = 0;
>   uint32_t retry = 0;
>   uint16_t avail_idx, res_cur_idx;
>   uint16_t res_base_idx, res_end_idx;
> @@ -940,6 +953,7 @@ virtio_dev_rx(struct virtio_net *dev, struct rte_mbuf
> **pkts, uint32_t count)
>   LOG_DEBUG(VHOST_DATA, "(%"PRIu64") virtio_dev_rx()\n", dev-
> >device_fh);
>   vq = dev->virtqueue[VIRTIO_RXQ];
>   count = (count > MAX_PKT_BURST) ? MAX_PKT_BURST : count;
> +
>   /* As many data cores may want access to available buffers, they
> need to be reserved. */
>   do {
>   res_base_idx = vq->last_used_idx_res; @@ -976,9 +990,6
> @@ virtio_dev_rx(struct virtio_net *dev, struct rte_mbuf **pkts, uint32_t
> count)
>   /* Prefetch available ring to retrieve indexes. */
>   rte_prefetch0(&vq->avail->ring[res_cur_idx & (vq->size - 1)]);
> 

[dpdk-dev] [PATCH 0/5] Support virtio multicast feature

2014-08-25 Thread Ouyang Changchun
This patch series supports the multicast feature in virtio and vhost.
The vhost backend enables promiscuous mode and configures ETH_VMDQ_ACCEPT_BROADCAST
and ETH_VMDQ_ACCEPT_MULTICAST in the VMDQ offload register to receive multicast
and broadcast packets.
The virtio frontend provides the functionality of enabling and disabling
multicast and promiscuous mode.

Changchun Ouyang (2):
  Set VM offload register according to VMDQ config for IGB PMD to
support broadcast and multicast packets.
  Add new API in virtio for supporting promiscuous and allmulticast
enable and disable.

Ouyang Changchun (3):
  Add RX mode in VMDQ config and set the register PFVML2FLT for IXGBE
PMD; this makes VMDQ accept broadcast and multicast packets.
  To let US-vHOST accept and forward broadcast and multicast packets:
Add a promiscuous option to the command line; set the VMDQ RX mode to:
ETH_VMDQ_ACCEPT_BROADCAST|ETH_VMDQ_ACCEPT_MULTICAST.
  Specify rx_mode as 0 for 2 other samples: vmdq and vhost-xen.

 examples/vhost/main.c | 27 --
 examples/vhost_xen/main.c |  1 +
 examples/vmdq/main.c  |  1 +
 lib/librte_ether/rte_ethdev.h |  1 +
 lib/librte_pmd_e1000/igb_rxtx.c   | 20 +++
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 16 ++
 lib/librte_pmd_virtio/virtio_ethdev.c | 98 ++-
 7 files changed, 159 insertions(+), 5 deletions(-)

-- 
1.8.4.2



[dpdk-dev] [PATCH 2/5] e1000: config VMDQ offload register to receive multicast packet

2014-08-25 Thread Ouyang Changchun
This patch sets the VM offload register according to the VMDQ config for the e1000
PMD to support multicast and broadcast packets.

Signed-off-by: Changchun Ouyang 
Acked-by: Huawei Xie 
Acked-by: Cunming Liang 

---
 lib/librte_pmd_e1000/igb_rxtx.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/lib/librte_pmd_e1000/igb_rxtx.c b/lib/librte_pmd_e1000/igb_rxtx.c
index 977c4a2..51b1206 100644
--- a/lib/librte_pmd_e1000/igb_rxtx.c
+++ b/lib/librte_pmd_e1000/igb_rxtx.c
@@ -1768,6 +1768,26 @@ igb_vmdq_rx_hw_configure(struct rte_eth_dev *dev)
vt_ctl |= E1000_VT_CTL_IGNORE_MAC;
E1000_WRITE_REG(hw, E1000_VT_CTL, vt_ctl);

+   for (i = 0; i < E1000_VMOLR_SIZE; i++) {
+   vmolr = E1000_READ_REG(hw, E1000_VMOLR(i));
+   vmolr &= ~(E1000_VMOLR_AUPE | E1000_VMOLR_ROMPE |
+   E1000_VMOLR_ROPE | E1000_VMOLR_BAM |
+   E1000_VMOLR_MPME);
+
+   if (cfg->rx_mode & ETH_VMDQ_ACCEPT_UNTAG)
+   vmolr |= E1000_VMOLR_AUPE;
+   if (cfg->rx_mode & ETH_VMDQ_ACCEPT_HASH_MC)
+   vmolr |= E1000_VMOLR_ROMPE;
+   if (cfg->rx_mode & ETH_VMDQ_ACCEPT_HASH_UC)
+   vmolr |= E1000_VMOLR_ROPE;
+   if (cfg->rx_mode & ETH_VMDQ_ACCEPT_BROADCAST)
+   vmolr |= E1000_VMOLR_BAM;
+   if (cfg->rx_mode & ETH_VMDQ_ACCEPT_MULTICAST)
+   vmolr |= E1000_VMOLR_MPME;
+
+   E1000_WRITE_REG(hw, E1000_VMOLR(i), vmolr);
+   }
+
/*
 * VMOLR: set STRVLAN as 1 if IGMAC in VTCTL is set as 1
 * Both 82576 and 82580 support it
-- 
1.8.4.2



[dpdk-dev] [PATCH 3/5] examples/vhost: enable promisc mode and config VMDQ offload register for multicast feature

2014-08-25 Thread Ouyang Changchun
This patch lets vhost receive and forward multicast and broadcast packets:
it adds a promiscuous option to the command line and sets the VMDQ RX mode to
ETH_VMDQ_ACCEPT_BROADCAST|ETH_VMDQ_ACCEPT_MULTICAST if promiscuous mode is on.

Signed-off-by: Changchun Ouyang 
Acked-by: Huawei Xie 
Acked-by: Cunming Liang 

---
 examples/vhost/main.c | 27 +++
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 193aa25..4acc7b8 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -161,6 +161,9 @@
 /* mask of enabled ports */
 static uint32_t enabled_port_mask = 0;

+/* Ports set in promiscuous mode off by default. */
+static uint32_t promiscuous_on;
+
 /*Number of switching cores enabled*/
 static uint32_t num_switching_cores = 0;

@@ -278,6 +281,7 @@ static struct rte_eth_conf vmdq_conf_default = {
.enable_default_pool = 0,
.default_pool = 0,
.nb_pool_maps = 0,
+   .rx_mode = 0,
.pool_map = {{0, 0},},
},
},
@@ -368,13 +372,15 @@ static inline int
 get_eth_conf(struct rte_eth_conf *eth_conf, uint32_t num_devices)
 {
struct rte_eth_vmdq_rx_conf conf;
+   struct rte_eth_vmdq_rx_conf *def_conf =
+   &vmdq_conf_default.rx_adv_conf.vmdq_rx_conf;
unsigned i;

memset(&conf, 0, sizeof(conf));
conf.nb_queue_pools = (enum rte_eth_nb_pools)num_devices;
conf.nb_pool_maps = num_devices;
-   conf.enable_loop_back =
-   vmdq_conf_default.rx_adv_conf.vmdq_rx_conf.enable_loop_back;
+   conf.enable_loop_back = def_conf->enable_loop_back;
+   conf.rx_mode = def_conf->rx_mode;

for (i = 0; i < conf.nb_pool_maps; i++) {
conf.pool_map[i].vlan_id = vlan_tags[ i ];
@@ -472,6 +478,9 @@ port_init(uint8_t port)
return retval;
}

+   if (promiscuous_on)
+   rte_eth_promiscuous_enable(port);
+
rte_eth_macaddr_get(port, &vmdq_ports_eth_addr[port]);
RTE_LOG(INFO, VHOST_PORT, "Max virtio devices supported: %u\n", 
num_devices);
RTE_LOG(INFO, VHOST_PORT, "Port %u MAC: %02"PRIx8" %02"PRIx8" %02"PRIx8
@@ -604,7 +613,8 @@ us_vhost_parse_args(int argc, char **argv)
};

/* Parse command line */
-   while ((opt = getopt_long(argc, argv, "p:",long_option, &option_index)) 
!= EOF) {
+   while ((opt = getopt_long(argc, argv, "p:P",
+   long_option, &option_index)) != EOF) {
switch (opt) {
/* Portmask */
case 'p':
@@ -616,6 +626,15 @@ us_vhost_parse_args(int argc, char **argv)
}
break;

+   case 'P':
+   promiscuous_on = 1;
+   vmdq_conf_default.rx_adv_conf.vmdq_rx_conf.rx_mode =
+   ETH_VMDQ_ACCEPT_BROADCAST |
+   ETH_VMDQ_ACCEPT_MULTICAST;
+   VHOST_FEATURES |= (1ULL << VIRTIO_NET_F_CTRL_RX);
+
+   break;
+
case 0:
/* Enable/disable vm2vm comms. */
if (!strncmp(long_option[option_index].name, "vm2vm",
@@ -677,7 +696,7 @@ us_vhost_parse_args(int argc, char **argv)
return -1;
} else {
if (ret)
-   VHOST_FEATURES = (1ULL << 
VIRTIO_NET_F_MRG_RXBUF);
+   VHOST_FEATURES |= (1ULL << 
VIRTIO_NET_F_MRG_RXBUF);
}
}

-- 
1.8.4.2



[dpdk-dev] [PATCH 5/5] examples/vmdq: set default value to rx mode

2014-08-25 Thread Ouyang Changchun
This patch specifies rx_mode as 0 for two samples, vmdq and vhost-xen,
because the multicast feature is not currently available in either sample.

Signed-off-by: Changchun Ouyang 
Acked-by: Huawei Xie 
Acked-by: Cunming Liang 

---
 examples/vhost_xen/main.c | 1 +
 examples/vmdq/main.c  | 1 +
 2 files changed, 2 insertions(+)

diff --git a/examples/vhost_xen/main.c b/examples/vhost_xen/main.c
index b275747..d451272 100644
--- a/examples/vhost_xen/main.c
+++ b/examples/vhost_xen/main.c
@@ -191,6 +191,7 @@ static const struct rte_eth_conf vmdq_conf_default = {
.enable_default_pool = 0,
.default_pool = 0,
.nb_pool_maps = 0,
+   .rx_mode = 0,
.pool_map = {{0, 0},},
},
},
diff --git a/examples/vmdq/main.c b/examples/vmdq/main.c
index 35df234..0cfd963 100644
--- a/examples/vmdq/main.c
+++ b/examples/vmdq/main.c
@@ -172,6 +172,7 @@ static const struct rte_eth_conf vmdq_conf_default = {
.enable_default_pool = 0,
.default_pool = 0,
.nb_pool_maps = 0,
+   .rx_mode = 0,
.pool_map = {{0, 0},},
},
},
-- 
1.8.4.2



[dpdk-dev] [PATCH 4/5] virtio: New API to enable/disable multicast and promisc mode

2014-08-25 Thread Ouyang Changchun
This patch adds a new API to virtio for enabling and disabling promiscuous
and allmulticast modes.

Signed-off-by: Changchun Ouyang 
Acked-by: Huawei Xie 
Acked-by: Cunming Liang 

---
 lib/librte_pmd_virtio/virtio_ethdev.c | 98 ++-
 1 file changed, 97 insertions(+), 1 deletion(-)

diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index 6293ac6..c7f874a 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -66,6 +66,10 @@ static int eth_virtio_dev_init(struct eth_driver *eth_drv,
 static int  virtio_dev_configure(struct rte_eth_dev *dev);
 static int  virtio_dev_start(struct rte_eth_dev *dev);
 static void virtio_dev_stop(struct rte_eth_dev *dev);
+static void virtio_dev_promiscuous_enable(struct rte_eth_dev *dev);
+static void virtio_dev_promiscuous_disable(struct rte_eth_dev *dev);
+static void virtio_dev_allmulticast_enable(struct rte_eth_dev *dev);
+static void virtio_dev_allmulticast_disable(struct rte_eth_dev *dev);
 static void virtio_dev_info_get(struct rte_eth_dev *dev,
struct rte_eth_dev_info *dev_info);
 static int virtio_dev_link_update(struct rte_eth_dev *dev,
@@ -403,6 +407,94 @@ virtio_dev_close(struct rte_eth_dev *dev)
virtio_dev_stop(dev);
 }

+static void
+virtio_dev_promiscuous_enable(struct rte_eth_dev *dev)
+{
+   struct virtio_hw *hw
+   = VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct virtio_pmd_ctrl ctrl;
+   int dlen[1];
+   int ret;
+
+   ctrl.hdr.class = VIRTIO_NET_CTRL_RX;
+   ctrl.hdr.cmd = VIRTIO_NET_CTRL_RX_PROMISC;
+   ctrl.data[0] = 1;
+   dlen[0] = 1;
+
+   ret = virtio_send_command(hw->cvq, &ctrl, dlen, 1);
+
+   if (ret) {
+   PMD_INIT_LOG(ERR, "Promisc enabling but send command "
+ "failed, this is too late now...\n");
+   }
+}
+
+static void
+virtio_dev_promiscuous_disable(struct rte_eth_dev *dev)
+{
+   struct virtio_hw *hw
+   = VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct virtio_pmd_ctrl ctrl;
+   int dlen[1];
+   int ret;
+
+   ctrl.hdr.class = VIRTIO_NET_CTRL_RX;
+   ctrl.hdr.cmd = VIRTIO_NET_CTRL_RX_PROMISC;
+   ctrl.data[0] = 0;
+   dlen[0] = 1;
+
+   ret = virtio_send_command(hw->cvq, &ctrl, dlen, 1);
+
+   if (ret) {
+   PMD_INIT_LOG(ERR, "Promisc disabling but send command "
+ "failed, this is too late now...\n");
+   }
+}
+
+static void
+virtio_dev_allmulticast_enable(struct rte_eth_dev *dev)
+{
+   struct virtio_hw *hw
+   = VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct virtio_pmd_ctrl ctrl;
+   int dlen[1];
+   int ret;
+
+   ctrl.hdr.class = VIRTIO_NET_CTRL_RX;
+   ctrl.hdr.cmd = VIRTIO_NET_CTRL_RX_ALLMULTI;
+   ctrl.data[0] = 1;
+   dlen[0] = 1;
+
+   ret = virtio_send_command(hw->cvq, &ctrl, dlen, 1);
+
+   if (ret) {
+   PMD_INIT_LOG(ERR, "Allmulticast enabling but send command "
+ "failed, this is too late now...\n");
+   }
+}
+
+static void
+virtio_dev_allmulticast_disable(struct rte_eth_dev *dev)
+{
+   struct virtio_hw *hw
+   = VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct virtio_pmd_ctrl ctrl;
+   int dlen[1];
+   int ret;
+
+   ctrl.hdr.class = VIRTIO_NET_CTRL_RX;
+   ctrl.hdr.cmd = VIRTIO_NET_CTRL_RX_ALLMULTI;
+   ctrl.data[0] = 0;
+   dlen[0] = 1;
+
+   ret = virtio_send_command(hw->cvq, &ctrl, dlen, 1);
+
+   if (ret) {
+   PMD_INIT_LOG(ERR, "Allmulticast disabling but send command "
+ "failed, this is too late now...\n");
+   }
+}
+
 /*
  * dev_ops for virtio, bare necessities for basic operation
  */
@@ -411,6 +503,10 @@ static struct eth_dev_ops virtio_eth_dev_ops = {
.dev_start   = virtio_dev_start,
.dev_stop= virtio_dev_stop,
.dev_close   = virtio_dev_close,
+   .promiscuous_enable  = virtio_dev_promiscuous_enable,
+   .promiscuous_disable = virtio_dev_promiscuous_disable,
+   .allmulticast_enable = virtio_dev_allmulticast_enable,
+   .allmulticast_disable= virtio_dev_allmulticast_disable,

.dev_infos_get   = virtio_dev_info_get,
.stats_get   = virtio_dev_stats_get,
@@ -561,7 +657,7 @@ virtio_negotiate_features(struct virtio_hw *hw)
 {
uint32_t host_features, mask;

-   mask = VIRTIO_NET_F_CTRL_RX | VIRTIO_NET_F_CTRL_VLAN;
+   mask = VIRTIO_NET_F_CTRL_VLAN;
mask |= VIRTIO_NET_F_CSUM | VIRTIO_NET_F_GUEST_CSUM;

/* TSO and LRO are only available when their corresponding
-- 
1.8.4.2



[dpdk-dev] [PATCH 1/5] ethdev: Add new config field to config VMDQ offload register

2014-08-25 Thread Ouyang Changchun
This patch adds a new rx_mode field to the VMDQ config and sets the PFVML2FLT
register for the IXGBE PMD; this makes VMDQ receive multicast and broadcast packets.

Signed-off-by: Changchun Ouyang 
Acked-by: Huawei Xie 
Acked-by: Cunming Liang 

---
 lib/librte_ether/rte_ethdev.h |  1 +
 lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 16 
 2 files changed, 17 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 50df654..f44dd2d 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -575,6 +575,7 @@ struct rte_eth_vmdq_rx_conf {
uint8_t default_pool; /**< The default pool, if applicable */
uint8_t enable_loop_back; /**< Enable VT loop back */
uint8_t nb_pool_maps; /**< We can have up to 64 filters/mappings */
+   uint32_t rx_mode; /**< RX mode for vmdq */
struct {
uint16_t vlan_id; /**< The vlan id of the received frame */
uint64_t pools;   /**< Bitmask of pools for packet rx */
diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c 
b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
index dfc2076..9efdbfb 100644
--- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
+++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
@@ -3084,6 +3084,7 @@ ixgbe_vmdq_rx_hw_configure(struct rte_eth_dev *dev)
struct ixgbe_hw *hw;
enum rte_eth_nb_pools num_pools;
uint32_t mrqc, vt_ctl, vlanctrl;
+   uint32_t vmolr = 0;
int i;

PMD_INIT_FUNC_TRACE();
@@ -3106,6 +3107,21 @@ ixgbe_vmdq_rx_hw_configure(struct rte_eth_dev *dev)

IXGBE_WRITE_REG(hw, IXGBE_VT_CTL, vt_ctl);

+   for (i = 0; i < (int)num_pools; i++) {
+   if (cfg->rx_mode & ETH_VMDQ_ACCEPT_UNTAG)
+   vmolr |= IXGBE_VMOLR_AUPE;
+   if (cfg->rx_mode & ETH_VMDQ_ACCEPT_HASH_MC)
+   vmolr |= IXGBE_VMOLR_ROMPE;
+   if (cfg->rx_mode & ETH_VMDQ_ACCEPT_HASH_UC)
+   vmolr |= IXGBE_VMOLR_ROPE;
+   if (cfg->rx_mode & ETH_VMDQ_ACCEPT_BROADCAST)
+   vmolr |= IXGBE_VMOLR_BAM;
+   if (cfg->rx_mode & ETH_VMDQ_ACCEPT_MULTICAST)
+   vmolr |= IXGBE_VMOLR_MPE;
+
+   IXGBE_WRITE_REG(hw, IXGBE_VMOLR(i), vmolr);
+   }
+
/* VLNCTRL: enable vlan filtering and allow all vlan tags through */
vlanctrl = IXGBE_READ_REG(hw, IXGBE_VLNCTRL);
vlanctrl |= IXGBE_VLNCTRL_VFE ; /* enable vlan filters */
-- 
1.8.4.2



[dpdk-dev] [PATCH 4/5] virtio: New API to enable/disable multicast and promisc mode

2014-08-26 Thread Ouyang, Changchun
Hi  Stephen,

My response below.

Thanks 
Changchun


> -Original Message-
> From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> Sent: Tuesday, August 26, 2014 8:13 AM
> To: Ouyang, Changchun
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 4/5] virtio: New API to enable/disable
> multicast and promisc mode
> 
> On Mon, 25 Aug 2014 10:09:31 +0800
> Ouyang Changchun  wrote:
> 
> > This patch adds new API in virtio for supporting promiscuous and
> allmulticast enabling and disabling.
> >
> > Signed-off-by: Changchun Ouyang 
> > Acked-by: Huawei Xie 
> > Acked-by: Cunming Liang 
> >
> > ---
> >  lib/librte_pmd_virtio/virtio_ethdev.c | 98
> > ++-
> >  1 file changed, 97 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c
> > b/lib/librte_pmd_virtio/virtio_ethdev.c
> > index 6293ac6..c7f874a 100644
> > --- a/lib/librte_pmd_virtio/virtio_ethdev.c
> > +++ b/lib/librte_pmd_virtio/virtio_ethdev.c
> > @@ -66,6 +66,10 @@ static int eth_virtio_dev_init(struct eth_driver
> > *eth_drv,  static int  virtio_dev_configure(struct rte_eth_dev *dev);
> > static int  virtio_dev_start(struct rte_eth_dev *dev);  static void
> > virtio_dev_stop(struct rte_eth_dev *dev);
> > +static void virtio_dev_promiscuous_enable(struct rte_eth_dev *dev);
> > +static void virtio_dev_promiscuous_disable(struct rte_eth_dev *dev);
> > +static void virtio_dev_allmulticast_enable(struct rte_eth_dev *dev);
> > +static void virtio_dev_allmulticast_disable(struct rte_eth_dev *dev);
> >  static void virtio_dev_info_get(struct rte_eth_dev *dev,
> > struct rte_eth_dev_info *dev_info);  static
> int
> > virtio_dev_link_update(struct rte_eth_dev *dev, @@ -403,6 +407,94 @@
> > virtio_dev_close(struct rte_eth_dev *dev)
> > virtio_dev_stop(dev);
> >  }
> >
> > +static void
> > +virtio_dev_promiscuous_enable(struct rte_eth_dev *dev) {
> > +   struct virtio_hw *hw
> > +   = VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> > +   struct virtio_pmd_ctrl ctrl;
> > +   int dlen[1];
> > +   int ret;
> > +
> > +   ctrl.hdr.class = VIRTIO_NET_CTRL_RX;
> > +   ctrl.hdr.cmd = VIRTIO_NET_CTRL_RX_PROMISC;
> > +   ctrl.data[0] = 1;
> > +   dlen[0] = 1;
> > +
> > +   ret = virtio_send_command(hw->cvq, &ctrl, dlen, 1);
> > +
> > +   if (ret) {
> > +   PMD_INIT_LOG(ERR, "Promisc enabling but send command "
> > + "failed, this is too late now...\n");
> > +   }
> > +}
> > +
> > +static void
> > +virtio_dev_promiscuous_disable(struct rte_eth_dev *dev) {
> > +   struct virtio_hw *hw
> > +   = VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> > +   struct virtio_pmd_ctrl ctrl;
> > +   int dlen[1];
> > +   int ret;
> > +
> > +   ctrl.hdr.class = VIRTIO_NET_CTRL_RX;
> > +   ctrl.hdr.cmd = VIRTIO_NET_CTRL_RX_PROMISC;
> > +   ctrl.data[0] = 0;
> > +   dlen[0] = 1;
> > +
> > +   ret = virtio_send_command(hw->cvq, &ctrl, dlen, 1);
> > +
> > +   if (ret) {
> > +   PMD_INIT_LOG(ERR, "Promisc disabling but send command "
> > + "failed, this is too late now...\n");
> > +   }
> > +}
> > +
> > +static void
> > +virtio_dev_allmulticast_enable(struct rte_eth_dev *dev) {
> > +   struct virtio_hw *hw
> > +   = VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> > +   struct virtio_pmd_ctrl ctrl;
> > +   int dlen[1];
> > +   int ret;
> > +
> > +   ctrl.hdr.class = VIRTIO_NET_CTRL_RX;
> > +   ctrl.hdr.cmd = VIRTIO_NET_CTRL_RX_ALLMULTI;
> > +   ctrl.data[0] = 1;
> > +   dlen[0] = 1;
> > +
> > +   ret = virtio_send_command(hw->cvq, &ctrl, dlen, 1);
> > +
> > +   if (ret) {
> > +   PMD_INIT_LOG(ERR, "Promisc enabling but send command "
> > + "failed, this is too late now...\n");
> > +   }
> > +}
> > +
> > +static void
> > +virtio_dev_allmulticast_disable(struct rte_eth_dev *dev) {
> > +   struct virtio_hw *hw
> > +   = VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> > +   struct virtio_pmd_ctrl ctrl;
> > +   int dlen[1];
> > +   int ret;
> > +
> > +   ctrl.hdr.class = VIRTIO_NET_CTRL_RX;
> > +   ctrl.hdr.cmd = VIRTIO_NET_CTRL_RX_ALLMULTI;
> > +   ctrl.data[0] = 0;
> >

[dpdk-dev] [RFC 07/10] virtio: remove unnecessary adapter structure

2014-08-26 Thread Ouyang, Changchun

Acked-by: Changchun Ouyang 

> -Original Message-
> From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> Sent: Tuesday, August 26, 2014 10:08 AM
> To: Ouyang, Changchun
> Cc: dev at dpdk.org
> Subject: [RFC 07/10] virtio: remove unnecessary adapter structure
> 
> Cleanup virtio code by eliminating unnecessary nesting of virtio hardware
> structure inside adapter structure.
> Also allows removing unneeded macro, making code clearer.
> 
> ---
>  lib/librte_pmd_virtio/virtio_ethdev.c |   31 +++
>  lib/librte_pmd_virtio/virtio_ethdev.h |9 -
>  lib/librte_pmd_virtio/virtio_rxtx.c   |3 +--
>  3 files changed, 12 insertions(+), 31 deletions(-)
> 



[dpdk-dev] [RFC 10/10] virtio: add support for promiscious and multicast

2014-08-26 Thread Ouyang, Changchun
This patch is very similar to my previous patch:

[PATCH 4/5] virtio: New API to enable/disable multicast and promisc mode

So I suggest applying only one of the two.

Thanks 
Changchun

> -Original Message-
> From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> Sent: Tuesday, August 26, 2014 10:08 AM
> To: Ouyang, Changchun
> Cc: dev at dpdk.org; Stephen Hemminger
> Subject: [RFC 10/10] virtio: add support for promiscious and multicast
> 
> Implement standard virtio controls for enabling and disabling promiscious
> and multicast.
> 
> Signed-off-by: Stephen Hemminger 
> 
> --- a/lib/librte_pmd_virtio/virtio_ethdev.c   2014-08-25
> 19:00:16.754586819 -0700
> +++ b/lib/librte_pmd_virtio/virtio_ethdev.c   2014-08-25
> 19:02:48.019397658 -0700


[dpdk-dev] [RFC 01/10] virtio: rearrange resource initialization

2014-08-26 Thread Ouyang, Changchun
Acked-by: Changchun Ouyang 

> -Original Message-
> From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> Sent: Tuesday, August 26, 2014 10:08 AM
> To: Ouyang, Changchun
> Cc: dev at dpdk.org
> Subject: [RFC 01/10] virtio: rearrange resource initialization
> 
> For clarity make the setup of PCI resources for Linux into a function rather
> than block of code #ifdef'd in middle of dev_init.
> 
> ---
>  lib/librte_pmd_virtio/virtio_ethdev.c |   76 +++--
> -
>  1 file changed, 43 insertions(+), 33 deletions(-)
> 
> --- a/lib/librte_pmd_virtio/virtio_ethdev.c   2014-08-25
> 19:00:03.622515574 -0700
> +++ b/lib/librte_pmd_virtio/virtio_ethdev.c   2014-08-25
> 19:00:03.622515574 -0700
> @@ -706,6 +706,41 @@ virtio_has_msix(const struct rte_pci_add
> 
>   return (d != NULL);
>  }
> +
> +/* Extract I/O port numbers from sysfs */ static int
> +virtio_resource_init(struct rte_pci_device *pci_dev) {
> + char dirname[PATH_MAX];
> + char filename[PATH_MAX];
> + unsigned long start, size;
> +
> + if (get_uio_dev(&pci_dev->addr, dirname, sizeof(dirname)) < 0)
> + return -1;
> +
> + /* get portio size */
> + snprintf(filename, sizeof(filename),
> +  "%s/portio/port0/size", dirname);
> + if (parse_sysfs_value(filename, &size) < 0) {
> + PMD_INIT_LOG(ERR, "%s(): cannot parse size",
> +  __func__);
> + return -1;
> + }
> +
> + /* get portio start */
> + snprintf(filename, sizeof(filename),
> +  "%s/portio/port0/start", dirname);
> + if (parse_sysfs_value(filename, &start) < 0) {
> + PMD_INIT_LOG(ERR, "%s(): cannot parse portio start",
> +  __func__);
> + return -1;
> + }
> + pci_dev->mem_resource[0].addr = (void *)(uintptr_t)start;
> + pci_dev->mem_resource[0].len =  (uint64_t)size;
> + PMD_INIT_LOG(DEBUG,
> +  "PCI Port IO found start=0x%lx with size=0x%lx",
> +  start, size);
> + return 0;
> +}
>  #else
>  static int
>  virtio_has_msix(const struct rte_pci_addr *loc __rte_unused) @@ -713,6
> +748,12 @@ virtio_has_msix(const struct rte_pci_add
>   /* nic_uio does not enable interrupts, return 0 (false). */
>   return 0;
>  }
> +
> +static int virtio_resource_init(struct rte_pci_device *pci_dev
> +__rte_unused) {
> + /* no setup required */
> + return 0;
> +}
>  #endif
> 
>  /*
> @@ -749,40 +790,9 @@ eth_virtio_dev_init(__rte_unused struct
>   return 0;
> 
>   pci_dev = eth_dev->pci_dev;
> + if (virtio_resource_init(pci_dev) < 0)
> + return -1;
> 
> -#ifdef RTE_EXEC_ENV_LINUXAPP
> - {
> - char dirname[PATH_MAX];
> - char filename[PATH_MAX];
> - unsigned long start, size;
> -
> - if (get_uio_dev(&pci_dev->addr, dirname, sizeof(dirname))
> < 0)
> - return -1;
> -
> - /* get portio size */
> - snprintf(filename, sizeof(filename),
> -  "%s/portio/port0/size", dirname);
> - if (parse_sysfs_value(filename, &size) < 0) {
> - PMD_INIT_LOG(ERR, "%s(): cannot parse size",
> -  __func__);
> - return -1;
> - }
> -
> - /* get portio start */
> - snprintf(filename, sizeof(filename),
> -  "%s/portio/port0/start", dirname);
> - if (parse_sysfs_value(filename, &start) < 0) {
> - PMD_INIT_LOG(ERR, "%s(): cannot parse portio
> start",
> -  __func__);
> - return -1;
> - }
> - pci_dev->mem_resource[0].addr = (void *)(uintptr_t)start;
> - pci_dev->mem_resource[0].len =  (uint64_t)size;
> - PMD_INIT_LOG(DEBUG,
> -  "PCI Port IO found start=0x%lx with size=0x%lx",
> -  start, size);
> - }
> -#endif
>   hw->use_msix = virtio_has_msix(&pci_dev->addr);
>   hw->io_base = (uint32_t)(uintptr_t)pci_dev-
> >mem_resource[0].addr;
> 



[dpdk-dev] [RFC 06/10] virtio: use software vlan stripping

2014-08-26 Thread Ouyang, Changchun
Hi Stephen,

Would you please describe the usage scenario for front-end RX VLAN stripping
and TX VLAN insertion?
In our current implementation, the backend strips the VLAN tag on RX and
inserts the VLAN tag on TX.

Thanks
Changchun

> -Original Message-
> From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> Sent: Tuesday, August 26, 2014 10:08 AM
> To: Ouyang, Changchun
> Cc: dev at dpdk.org; Stephen Hemminger
> Subject: [RFC 06/10] virtio: use software vlan stripping
> 
> Implement VLAN stripping in software. This allows application to be device
> independent.
> 
> Signed-off-by: Stephen Hemminger 
> 
> 
> ---
>  lib/librte_pmd_virtio/virtio_ethdev.c |2 ++
>  lib/librte_pmd_virtio/virtio_pci.h|1 +
>  lib/librte_pmd_virtio/virtio_rxtx.c   |   20 ++--
>  3 files changed, 21 insertions(+), 2 deletions(-)
> 
> --- a/lib/librte_pmd_virtio/virtio_ethdev.c   2014-08-25
> 19:00:07.574537243 -0700
> +++ b/lib/librte_pmd_virtio/virtio_ethdev.c   2014-08-25
> 19:00:07.574537243 -0700
> @@ -976,6 +976,8 @@ virtio_dev_configure(struct rte_eth_dev
>   return (-EINVAL);
>   }
> 
> + hw->vlan_strip = rxmode->hw_vlan_strip;
> +
>   ret = vtpci_irq_config(hw, 0);
>   if (ret != 0)
>   PMD_DRV_LOG(ERR, "failed to set config vector");
> --- a/lib/librte_pmd_virtio/virtio_pci.h  2014-08-25 19:00:07.574537243 
> -0700
> +++ b/lib/librte_pmd_virtio/virtio_pci.h  2014-08-25
> 19:00:07.574537243 -0700
> @@ -168,6 +168,7 @@ struct virtio_hw {
>   uint32_tmax_tx_queues;
>   uint32_tmax_rx_queues;
>   uint16_tvtnet_hdr_size;
> + uint8_t vlan_strip;
>   uint8_t use_msix;
>   uint8_t mac_addr[ETHER_ADDR_LEN];
>  };
> --- a/lib/librte_pmd_virtio/virtio_rxtx.c 2014-08-25 19:00:07.574537243 
> -0700
> +++ b/lib/librte_pmd_virtio/virtio_rxtx.c 2014-08-25
> 19:00:07.574537243 -0700
> @@ -49,6 +49,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  #include "virtio_logs.h"
>  #include "virtio_ethdev.h"
> @@ -406,8 +407,8 @@ virtio_dev_tx_queue_setup(struct rte_eth
> 
>   PMD_INIT_FUNC_TRACE();
> 
> - if ((tx_conf->txq_flags & ETH_TXQ_FLAGS_NOOFFLOADS)
> - != ETH_TXQ_FLAGS_NOOFFLOADS) {
> + if ((tx_conf->txq_flags & ETH_TXQ_FLAGS_NOXSUMS)
> + != ETH_TXQ_FLAGS_NOXSUMS) {
>   PMD_INIT_LOG(ERR, "TX checksum offload not
> supported\n");
>   return -EINVAL;
>   }
> @@ -444,6 +445,7 @@ uint16_t
>  virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t
> nb_pkts)  {
>   struct virtqueue *rxvq = rx_queue;
> + struct virtio_hw *hw = rxvq->hw;
>   struct rte_mbuf *rxm, *new_mbuf;
>   uint16_t nb_used, num, nb_rx = 0;
>   uint32_t len[VIRTIO_MBUF_BURST_SZ];
> @@ -487,6 +489,9 @@ virtio_recv_pkts(void *rx_queue, struct
>   rxm->pkt.pkt_len = (uint32_t)(len[i] - hdr_size);
>   rxm->pkt.data_len = (uint16_t)(len[i] - hdr_size);
> 
> + if (hw->vlan_strip)
> + rte_vlan_strip(rxm);
> +
>   VIRTIO_DUMP_PACKET(rxm, rxm->pkt.data_len);
> 
>   rx_pkts[nb_rx++] = rxm;
> @@ -711,6 +716,17 @@ virtio_xmit_pkts(void *tx_queue, struct
> 
>   if (tx_pkts[nb_tx]->pkt.nb_segs <= txvq->vq_free_cnt) {
>   txm = tx_pkts[nb_tx];
> +
> + /* Do VLAN tag insertion */
> + if (txm->ol_flags & PKT_TX_VLAN_PKT) {
> + error = rte_vlan_insert(txm);
> + if (unlikely(error)) {
> + rte_pktmbuf_free(txm);
> + ++nb_tx;
> + continue;
> + }
> + }
> +
>   /* Enqueue Packet buffers */
>   error = virtqueue_enqueue_xmit(txvq, txm);
>   if (unlikely(error)) {



[dpdk-dev] [RFC 08/10] virtio: remove redundant vq_alignment

2014-08-26 Thread Ouyang, Changchun
Acked-by: Changchun Ouyang 

> -Original Message-
> From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> Sent: Tuesday, August 26, 2014 10:08 AM
> To: Ouyang, Changchun
> Cc: dev at dpdk.org; Stephen Hemminger
> Subject: [RFC 08/10] virtio: remove redundant vq_alignment
> 
> Since vq_alignment is constant (always 4K), it does not need to be part of the
> vring struct.
> 
> Signed-off-by: Stephen Hemminger 
> 
> ---
>  lib/librte_pmd_virtio/virtio_ethdev.c |1 -
>  lib/librte_pmd_virtio/virtio_rxtx.c   |2 +-
>  lib/librte_pmd_virtio/virtqueue.h |3 +--
>  3 files changed, 2 insertions(+), 4 deletions(-)
> 
> --- a/lib/librte_pmd_virtio/virtio_ethdev.c   2014-08-25
> 19:00:09.918550097 -0700
> +++ b/lib/librte_pmd_virtio/virtio_ethdev.c   2014-08-25
> 19:00:09.918550097 -0700
> @@ -290,7 +290,6 @@ int virtio_dev_queue_setup(struct rte_et
>   vq->port_id = dev->data->port_id;
>   vq->queue_id = queue_idx;
>   vq->vq_queue_index = vtpci_queue_idx;
> - vq->vq_alignment = VIRTIO_PCI_VRING_ALIGN;
>   vq->vq_nentries = vq_size;
>   vq->vq_free_cnt = vq_size;
> 
> --- a/lib/librte_pmd_virtio/virtio_rxtx.c 2014-08-25 19:00:09.918550097 
> -0700
> +++ b/lib/librte_pmd_virtio/virtio_rxtx.c 2014-08-25
> 19:00:09.918550097 -0700
> @@ -258,7 +258,7 @@ virtio_dev_vring_start(struct virtqueue
>* Reinitialise since virtio port might have been stopped and restarted
>*/
>   memset(vq->vq_ring_virt_mem, 0, vq->vq_ring_size);
> - vring_init(vr, size, ring_mem, vq->vq_alignment);
> + vring_init(vr, size, ring_mem, VIRTIO_PCI_VRING_ALIGN);
>   vq->vq_used_cons_idx = 0;
>   vq->vq_desc_head_idx = 0;
>   vq->vq_avail_idx = 0;
> --- a/lib/librte_pmd_virtio/virtqueue.h   2014-08-25 19:00:09.918550097 
> -0700
> +++ b/lib/librte_pmd_virtio/virtqueue.h   2014-08-25
> 19:00:09.918550097 -0700
> @@ -139,8 +139,7 @@ struct virtqueue {
>   uint8_t port_id;  /**< Device port identifier. */
> 
>   void*vq_ring_virt_mem;/**< linear address of vring*/
> - int vq_alignment;
> - int vq_ring_size;
> + unsigned int vq_ring_size;
>   phys_addr_t vq_ring_mem;  /**< physical address of vring */
> 
>   struct vring vq_ring;/**< vring keeping desc, used and avail */



[dpdk-dev] [RFC] lib/librte_vhost: qemu vhost-user support into DPDK vhost library

2014-08-27 Thread Ouyang, Changchun
Do we have a performance comparison between the two implementations?
Thanks
Changchun


-Original Message-
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Xie, Huawei
Sent: Tuesday, August 26, 2014 7:06 PM
To: dev at dpdk.org
Subject: Re: [dpdk-dev] [RFC] lib/librte_vhost: qemu vhost-user support into 
DPDK vhost library

Hi all:
We are implementing the official qemu vhost-user interface in the DPDK vhost
library, so there will be two coexisting implementations of the user space
vhost backend. Pros and cons in my mind:

Existing solution:
  Pros: works with qemu versions before 2.1
  Cons: depends on the eventfd proxy kernel module and extra maintenance effort

Qemu vhost-user:
  Pros: official qemu us-vhost interface
  Cons: only available after qemu 2.1

BR.
huawei


[dpdk-dev] [RFC] lib/librte_vhost: qemu vhost-user support into DPDK vhost library

2014-08-27 Thread Ouyang, Changchun
Hi Tetsuya

Thanks for your response.
Agreed; the performance should be the same since the data path (RX/TX) is not
affected. The difference between the implementations only exists in the virtio
device creation and destruction stages.

Regards,
Changchun

> -Original Message-
> From: Tetsuya.Mukawa [mailto:mukawa at igel.co.jp]
> Sent: Wednesday, August 27, 2014 12:39 PM
> To: Ouyang, Changchun; dev at dpdk.org
> Cc: Xie, Huawei; Katsuya MATSUBARA; nakajima.yoshihiro at lab.ntt.co.jp;
> Hitoshi Masutani
> Subject: Re: [dpdk-dev] [RFC] lib/librte_vhost: qemu vhost-user support into
> DPDK vhost library
> 
> 
> (2014/08/27 9:43), Ouyang, Changchun wrote:
> > Do we have performance comparison between both implementation?
> Hi Changchun,
> 
> If DPDK applications are running on both guest and host side, the
> performance should be almost same, because while transmitting data virt
> queues are accessed by virtio-net PMD and libvhost. In libvhost, the existing
> vhost implementation and a vhost-user implementation will shares or uses
> same code to access virt queues. So I guess the performance will be almost
> same.
> 
> Thanks,
> Tetsuya
> 
> 
> > Thanks
> > Changchun
> >
> >
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Xie, Huawei
> > Sent: Tuesday, August 26, 2014 7:06 PM
> > To: dev at dpdk.org
> > Subject: Re: [dpdk-dev] [RFC] lib/librte_vhost: qemu vhost-user
> > support into DPDK vhost library
> >
> > Hi all:
> > We are implementing qemu official vhost-user interface into DPDK vhost
> library, so there would be two coexisting implementations for user space
> vhost backend.
> > Pro and cons in my mind:
> > Existing solution:
> > Pros:  works with qemu version before 2.1;  Cons: depends on eventfd
> proxy kernel module and extra maintenance effort Qemu vhost-user:
> >Pros:  qemu official us-vhost interface; Cons: only 
> > available after
> qemu 2.1
> >
> > BR.
> > huawei



[dpdk-dev] [RFC 06/10] virtio: use software vlan stripping

2014-08-27 Thread Ouyang, Changchun

> -Original Message-
> From: Stephen Hemminger [mailto:stephen at networkplumber.org]
> Sent: Wednesday, August 27, 2014 12:24 AM
> To: Ouyang, Changchun
> Cc: dev at dpdk.org
> Subject: Re: [RFC 06/10] virtio: use software vlan stripping
> 
> On Tue, 26 Aug 2014 08:37:11 +0000
> "Ouyang, Changchun"  wrote:
> 
> > Hi Stephen,
> >
> > Would you please describe the use scenario for the front end rx vlan strip
> and tx vlan insertion?
> > In our current implementation, backend will strip vlan tag for RX, and 
> > insert
> vlan tag for TX.
> >
> > Thanks
> > Changchun
> 
> First, we don't have to do software VLAN strip on our backend if we do this.
> And this way we can always use VLAN insert on transmit. Otherwise you
> have to introduce special case because there is no DPDK API to determine if
> device does or does not do VLAN handling.
> 

How does the virtio frontend tell the backend whether it has the software VLAN
strip feature or not?
There seems to be no feature bit to negotiate it.

Thanks
Changchun
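
As an aside, the application-side view Stephen argues for is as simple as the
following (a minimal sketch; the field name follows the rte_ethdev API of that
period, and whether the PMD honours it in hardware or falls back to software,
e.g. via rte_vlan_strip() as in RFC 06/10, is invisible to the application):

#include <rte_ethdev.h>

/* Sketch only: the application just requests VLAN stripping in the port
 * configuration and stays device independent. */
static const struct rte_eth_conf port_conf = {
	.rxmode = {
		.hw_vlan_strip = 1, /* PMD may do this in hardware or software */
	},
};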



[dpdk-dev] [RFC] lib/librte_vhost: qemu vhost-user support into DPDK vhost library

2014-08-27 Thread Ouyang, Changchun


> -Original Message-
> From: Tetsuya.Mukawa [mailto:mukawa at igel.co.jp]
> Sent: Wednesday, August 27, 2014 1:28 PM
> To: Ouyang, Changchun; dev at dpdk.org
> Cc: Xie, Huawei; Katsuya MATSUBARA; nakajima.yoshihiro at lab.ntt.co.jp;
> Hitoshi Masutani
> Subject: Re: [dpdk-dev] [RFC] lib/librte_vhost: qemu vhost-user support into
> DPDK vhost library
> 
> Hi Changchun,
> 
> (2014/08/27 14:01), Ouyang, Changchun wrote:
> > Agree with you, the performance should be same as the data path
> > (RX/TX) is not affected, The difference between implementation only
> exists in the virtio device creation and destroy stage.
> Yes, I agree. Also There may be the difference, if a virtio-net driver on a
> guest isn't poll mode like a virtio-net device driver in the kernel. In the 
> case,
> existing vhost implementation uses the eventfd kernel module, and vhost-
> user implementation uses eventfd to kick the driver. So I guess there will be
> the difference.
> 
> Anyway, about device creation and destruction, the difference will come
> from transmission speed between unix domain socket and CUSE. I am not
> sure which is faster.

Yes, it doesn't matter which one is faster for virtio device creation and
destruction, as it is not in the data path.

> Thanks,
> Tetsuya
> 
> 
> >
> > Regards,
> > Changchun
> >
> >> -----Original Message-
> >> From: Tetsuya.Mukawa [mailto:mukawa at igel.co.jp]
> >> Sent: Wednesday, August 27, 2014 12:39 PM
> >> To: Ouyang, Changchun; dev at dpdk.org
> >> Cc: Xie, Huawei; Katsuya MATSUBARA; nakajima.yoshihiro at lab.ntt.co.jp;
> >> Hitoshi Masutani
> >> Subject: Re: [dpdk-dev] [RFC] lib/librte_vhost: qemu vhost-user
> >> support into DPDK vhost library
> >>
> >>
> >> (2014/08/27 9:43), Ouyang, Changchun wrote:
> >>> Do we have performance comparison between both implementation?
> >> Hi Changchun,
> >>
> >> If DPDK applications are running on both guest and host side, the
> >> performance should be almost same, because while transmitting data
> >> virt queues are accessed by virtio-net PMD and libvhost. In libvhost,
> >> the existing vhost implementation and a vhost-user implementation
> >> will shares or uses same code to access virt queues. So I guess the
> >> performance will be almost same.
> >>
> >> Thanks,
> >> Tetsuya
> >>
> >>
> >>> Thanks
> >>> Changchun
> >>>
> >>>
> >>> -Original Message-
> >>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Xie, Huawei
> >>> Sent: Tuesday, August 26, 2014 7:06 PM
> >>> To: dev at dpdk.org
> >>> Subject: Re: [dpdk-dev] [RFC] lib/librte_vhost: qemu vhost-user
> >>> support into DPDK vhost library
> >>>
> >>> Hi all:
> >>> We are implementing qemu official vhost-user interface into DPDK
> >>> vhost
> >> library, so there would be two coexisting implementations for user
> >> space vhost backend.
> >>> Pro and cons in my mind:
> >>> Existing solution:
> >>> Pros:  works with qemu version before 2.1;  Cons: depends on eventfd
> >> proxy kernel module and extra maintenance effort Qemu vhost-user:
> >>>Pros:  qemu official us-vhost interface; Cons: only 
> >>> available
> after
> >> qemu 2.1
> >>> BR.
> >>> huawei



[dpdk-dev] virtio merging - no UIO

2014-12-02 Thread Ouyang, Changchun
Hi Vincent,
Thanks for highlighting this. I will consider how to resolve it.
regards
Changchun

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Vincent JARDIN
> Sent: Tuesday, December 2, 2014 4:23 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] virtio merging - no UIO
> 
>  From today's call, I'd like to highlight that virtio-net-pmd (said code B - 
> from
> 6WIND) does not require UIO; it was required for some security reasons of
> the guest Linux OS:
>http://dpdk.org/browse/virtio-net-pmd/tree/virtio_user.c#n1494
> 
> Thank you,
>Vincent


[dpdk-dev] [RFC PATCH 01/17] virtio: Rearrange resource initialization

2014-12-08 Thread Ouyang Changchun
For clarity, make the setup of PCI resources for Linux into a function rather
than a block of code #ifdef'd in the middle of dev_init.

Signed-off-by: Changchun Ouyang 
Signed-off-by: Stephen Hemminger 
---
 lib/librte_pmd_virtio/virtio_ethdev.c | 76 ---
 1 file changed, 43 insertions(+), 33 deletions(-)

diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index c009f2a..6c31598 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -794,6 +794,41 @@ virtio_has_msix(const struct rte_pci_addr *loc)

return (d != NULL);
 }
+
+/* Extract I/O port numbers from sysfs */
+static int virtio_resource_init(struct rte_pci_device *pci_dev)
+{
+   char dirname[PATH_MAX];
+   char filename[PATH_MAX];
+   unsigned long start, size;
+
+   if (get_uio_dev(&pci_dev->addr, dirname, sizeof(dirname)) < 0)
+   return -1;
+
+   /* get portio size */
+   snprintf(filename, sizeof(filename),
+"%s/portio/port0/size", dirname);
+   if (parse_sysfs_value(filename, &size) < 0) {
+   PMD_INIT_LOG(ERR, "%s(): cannot parse size",
+__func__);
+   return -1;
+   }
+
+   /* get portio start */
+   snprintf(filename, sizeof(filename),
+"%s/portio/port0/start", dirname);
+   if (parse_sysfs_value(filename, &start) < 0) {
+   PMD_INIT_LOG(ERR, "%s(): cannot parse portio start",
+__func__);
+   return -1;
+   }
+   pci_dev->mem_resource[0].addr = (void *)(uintptr_t)start;
+   pci_dev->mem_resource[0].len =  (uint64_t)size;
+   PMD_INIT_LOG(DEBUG,
+"PCI Port IO found start=0x%lx with size=0x%lx",
+start, size);
+   return 0;
+}
 #else
 static int
 virtio_has_msix(const struct rte_pci_addr *loc __rte_unused)
@@ -801,6 +836,12 @@ virtio_has_msix(const struct rte_pci_addr *loc 
__rte_unused)
/* nic_uio does not enable interrupts, return 0 (false). */
return 0;
 }
+
+static int virtio_resource_init(struct rte_pci_device *pci_dev __rte_unused)
+{
+   /* no setup required */
+   return 0;
+}
 #endif

 /*
@@ -831,40 +872,9 @@ eth_virtio_dev_init(__rte_unused struct eth_driver 
*eth_drv,
return 0;

pci_dev = eth_dev->pci_dev;
+   if (virtio_resource_init(pci_dev) < 0)
+   return -1;

-#ifdef RTE_EXEC_ENV_LINUXAPP
-   {
-   char dirname[PATH_MAX];
-   char filename[PATH_MAX];
-   unsigned long start, size;
-
-   if (get_uio_dev(&pci_dev->addr, dirname, sizeof(dirname)) < 0)
-   return -1;
-
-   /* get portio size */
-   snprintf(filename, sizeof(filename),
-"%s/portio/port0/size", dirname);
-   if (parse_sysfs_value(filename, &size) < 0) {
-   PMD_INIT_LOG(ERR, "%s(): cannot parse size",
-__func__);
-   return -1;
-   }
-
-   /* get portio start */
-   snprintf(filename, sizeof(filename),
-"%s/portio/port0/start", dirname);
-   if (parse_sysfs_value(filename, &start) < 0) {
-   PMD_INIT_LOG(ERR, "%s(): cannot parse portio start",
-__func__);
-   return -1;
-   }
-   pci_dev->mem_resource[0].addr = (void *)(uintptr_t)start;
-   pci_dev->mem_resource[0].len =  (uint64_t)size;
-   PMD_INIT_LOG(DEBUG,
-"PCI Port IO found start=0x%lx with size=0x%lx",
-start, size);
-   }
-#endif
hw->use_msix = virtio_has_msix(&pci_dev->addr);
hw->io_base = (uint32_t)(uintptr_t)pci_dev->mem_resource[0].addr;

-- 
1.8.4.2



[dpdk-dev] [RFC PATCH 05/17] ether: Add soft vlan encap/decap functions

2014-12-08 Thread Ouyang Changchun
It is helpful to allow device drivers that don't support hardware
VLAN stripping to emulate it in software. This allows applications
to be device independent.

Avoid discarding shared mbufs. Make a copy in rte_vlan_insert() of any
packet to be tagged that has a reference count > 1.

Signed-off-by: Changchun Ouyang 
Signed-off-by: Stephen Hemminger 
---
 lib/librte_ether/rte_ether.h | 76 
 1 file changed, 76 insertions(+)

diff --git a/lib/librte_ether/rte_ether.h b/lib/librte_ether/rte_ether.h
index 187608d..3b6ab4b 100644
--- a/lib/librte_ether/rte_ether.h
+++ b/lib/librte_ether/rte_ether.h
@@ -49,6 +49,8 @@ extern "C" {

 #include 
 #include 
+#include 
+#include 

 #define ETHER_ADDR_LEN  6 /**< Length of Ethernet address. */
 #define ETHER_TYPE_LEN  2 /**< Length of Ethernet type field. */
@@ -332,6 +334,80 @@ struct vxlan_hdr {
 #define ETHER_VXLAN_HLEN (sizeof(struct udp_hdr) + sizeof(struct vxlan_hdr))
 /**< VXLAN tunnel header length. */

+/**
+ * Extract VLAN tag information into mbuf
+ *
+ * Software version of VLAN stripping
+ *
+ * @param m
+ *   The packet mbuf.
+ * @return
+ *   - 0: Success
+ *   - 1: not a vlan packet
+ */
+static inline int rte_vlan_strip(struct rte_mbuf *m)
+{
+   struct ether_hdr *eh
+= rte_pktmbuf_mtod(m, struct ether_hdr *);
+
+   if (eh->ether_type != ETHER_TYPE_VLAN)
+   return -1;
+
+   struct vlan_hdr *vh = (struct vlan_hdr *)(eh + 1);
+   m->ol_flags |= PKT_RX_VLAN_PKT;
+   m->vlan_tci = rte_be_to_cpu_16(vh->vlan_tci);
+
+   /* Copy ether header over rather than moving whole packet */
+   memmove(rte_pktmbuf_adj(m, sizeof(struct vlan_hdr)),
+   eh, 2 * ETHER_ADDR_LEN);
+
+   return 0;
+}
+
+/**
+ * Insert VLAN tag into mbuf.
+ *
+ * Software version of VLAN unstripping
+ *
+ * @param m
+ *   The packet mbuf.
+ * @return
+ *   - 0: On success
+ *   -EPERM: mbuf is shared, overwriting would be unsafe
+ *   -ENOSPC: not enough headroom in mbuf
+ */
+static inline int rte_vlan_insert(struct rte_mbuf **m)
+{
+   struct ether_hdr *oh, *nh;
+   struct vlan_hdr *vh;
+
+#ifdef RTE_MBUF_REFCNT
+   /* Can't insert header if mbuf is shared */
+   if (rte_mbuf_refcnt_read(*m) > 1) {
+   struct rte_mbuf *copy;
+
+   copy = rte_pktmbuf_clone(*m, (*m)->pool);
+   if (unlikely(copy == NULL))
+   return -ENOMEM;
+   rte_pktmbuf_free(*m);
+   *m = copy;
+   }
+#endif
+   oh = rte_pktmbuf_mtod(*m, struct ether_hdr *);
+   nh = (struct ether_hdr *)
+   rte_pktmbuf_prepend(*m, sizeof(struct vlan_hdr));
+   if (nh == NULL)
+   return -ENOSPC;
+
+   memmove(nh, oh, 2 * ETHER_ADDR_LEN);
+   nh->ether_type = ETHER_TYPE_VLAN;
+
+   vh = (struct vlan_hdr *) (nh + 1);
+   vh->vlan_tci = rte_cpu_to_be_16((*m)->vlan_tci);
+
+   return 0;
+}
+
 #ifdef __cplusplus
 }
 #endif
-- 
1.8.4.2
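
An illustrative caller of the new helpers, to show why rte_vlan_insert() takes
the address of the mbuf pointer; the tag_packet() wrapper and its error
handling are assumptions for the example, not part of the patch:

#include <rte_mbuf.h>
#include <rte_ether.h>

static int
tag_packet(struct rte_mbuf **m, uint16_t tci)
{
	(*m)->vlan_tci = tci;

	/*
	 * rte_vlan_insert() may replace *m with a private copy when the
	 * original mbuf is shared (refcnt > 1), so the address of the
	 * pointer is passed rather than the pointer itself.
	 */
	if (rte_vlan_insert(m) != 0) {
		rte_pktmbuf_free(*m);
		return -1;
	}
	return 0;
}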



[dpdk-dev] [RFC PATCH 07/17] virtio: Remove unnecessary adapter structure

2014-12-08 Thread Ouyang Changchun
Clean up the virtio code by eliminating unnecessary nesting of the
virtio hardware structure inside the adapter structure.
This also allows removing an unneeded macro, making the code clearer.

Signed-off-by: Changchun Ouyang 
Signed-off-by: Stephen Hemminger 
---
 lib/librte_pmd_virtio/virtio_ethdev.c | 43 ---
 lib/librte_pmd_virtio/virtio_ethdev.h |  9 
 lib/librte_pmd_virtio/virtio_rxtx.c   |  3 +--
 3 files changed, 16 insertions(+), 39 deletions(-)

diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index 829838c..c89614d 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -207,8 +207,7 @@ virtio_send_command(struct virtqueue *vq, struct 
virtio_pmd_ctrl *ctrl,
 static int
 virtio_set_multiple_queues(struct rte_eth_dev *dev, uint16_t nb_queues)
 {
-   struct virtio_hw *hw
-   = VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct virtio_hw *hw = dev->data->dev_private;
struct virtio_pmd_ctrl ctrl;
int dlen[1];
int ret;
@@ -242,8 +241,7 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
const struct rte_memzone *mz;
uint16_t vq_size;
int size;
-   struct virtio_hw *hw =
-   VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct virtio_hw *hw = dev->data->dev_private;
struct virtqueue  *vq = NULL;

/* Write the virtqueue index to the Queue Select Field */
@@ -383,8 +381,7 @@ virtio_dev_cq_queue_setup(struct rte_eth_dev *dev, uint16_t 
vtpci_queue_idx,
struct virtqueue *vq;
uint16_t nb_desc = 0;
int ret;
-   struct virtio_hw *hw =
-   VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct virtio_hw *hw = dev->data->dev_private;

PMD_INIT_FUNC_TRACE();
ret = virtio_dev_queue_setup(dev, VTNET_CQ, VTNET_SQ_CQ_QUEUE_IDX,
@@ -410,8 +407,7 @@ virtio_dev_close(struct rte_eth_dev *dev)
 static void
 virtio_dev_promiscuous_enable(struct rte_eth_dev *dev)
 {
-   struct virtio_hw *hw
-   = VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct virtio_hw *hw = dev->data->dev_private;
struct virtio_pmd_ctrl ctrl;
int dlen[1];
int ret;
@@ -430,8 +426,7 @@ virtio_dev_promiscuous_enable(struct rte_eth_dev *dev)
 static void
 virtio_dev_promiscuous_disable(struct rte_eth_dev *dev)
 {
-   struct virtio_hw *hw
-   = VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct virtio_hw *hw = dev->data->dev_private;
struct virtio_pmd_ctrl ctrl;
int dlen[1];
int ret;
@@ -450,8 +445,7 @@ virtio_dev_promiscuous_disable(struct rte_eth_dev *dev)
 static void
 virtio_dev_allmulticast_enable(struct rte_eth_dev *dev)
 {
-   struct virtio_hw *hw
-   = VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct virtio_hw *hw = dev->data->dev_private;
struct virtio_pmd_ctrl ctrl;
int dlen[1];
int ret;
@@ -470,8 +464,7 @@ virtio_dev_allmulticast_enable(struct rte_eth_dev *dev)
 static void
 virtio_dev_allmulticast_disable(struct rte_eth_dev *dev)
 {
-   struct virtio_hw *hw
-   = VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct virtio_hw *hw = dev->data->dev_private;
struct virtio_pmd_ctrl ctrl;
int dlen[1];
int ret;
@@ -853,8 +846,7 @@ virtio_interrupt_handler(__rte_unused struct 
rte_intr_handle *handle,
 void *param)
 {
struct rte_eth_dev *dev = param;
-   struct virtio_hw *hw =
-   VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct virtio_hw *hw = dev->data->dev_private;
uint8_t isr;

/* Read interrupt status which clears interrupt */
@@ -880,12 +872,11 @@ static int
 eth_virtio_dev_init(__rte_unused struct eth_driver *eth_drv,
struct rte_eth_dev *eth_dev)
 {
+   struct virtio_hw *hw = eth_dev->data->dev_private;
struct virtio_net_config *config;
struct virtio_net_config local_config;
uint32_t offset_conf = sizeof(config->mac);
struct rte_pci_device *pci_dev;
-   struct virtio_hw *hw =
-   VIRTIO_DEV_PRIVATE_TO_HW(eth_dev->data->dev_private);

if (RTE_PKTMBUF_HEADROOM < sizeof(struct virtio_net_hdr)) {
PMD_INIT_LOG(ERR,
@@ -1010,7 +1001,7 @@ static struct eth_driver rte_virtio_pmd = {
.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
},
.eth_dev_init = eth_virtio_dev_init,
-   .dev_private_size = sizeof(struct virtio_adapter),
+   .dev_private_size = sizeof(struct virtio_hw),
 };

 /*
@@ -1053,8 +1044,7 @@ static int
 virtio_dev_configure(struct rte_eth_dev *dev)
 {
const struct rte_eth_rxmode *rxmode = &dev->data->dev_conf.rxmode;
-   struct virtio_hw *hw =
-   VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+  

[dpdk-dev] [RFC PATCH 02/17] virtio: Use weaker barriers

2014-12-08 Thread Ouyang Changchun
The DPDK driver only has to deal with the case of running on PCI
and with SMP. In this case, the code can use the weaker barriers
instead of using hard (fence) barriers. This will help performance.
The rationale is explained in Linux kernel virtio_ring.h.

To make it clearer that this is a virtio thing and not some generic
barrier, prefix the barrier calls with virtio_.

Add missing (and needed) barrier between updating ring data
structure and notifying host.

Signed-off-by: Changchun Ouyang 
Signed-off-by: Stephen Hemminger 
---
 lib/librte_pmd_virtio/virtio_ethdev.c |  2 +-
 lib/librte_pmd_virtio/virtio_rxtx.c   |  8 +---
 lib/librte_pmd_virtio/virtqueue.h | 19 ++-
 3 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index 6c31598..78018f9 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -175,7 +175,7 @@ virtio_send_command(struct virtqueue *vq, struct 
virtio_pmd_ctrl *ctrl,
uint32_t idx, desc_idx, used_idx;
struct vring_used_elem *uep;

-   rmb();
+   virtio_rmb();

used_idx = (uint32_t)(vq->vq_used_cons_idx
& (vq->vq_nentries - 1));
diff --git a/lib/librte_pmd_virtio/virtio_rxtx.c 
b/lib/librte_pmd_virtio/virtio_rxtx.c
index 3f6bad2..f878c62 100644
--- a/lib/librte_pmd_virtio/virtio_rxtx.c
+++ b/lib/librte_pmd_virtio/virtio_rxtx.c
@@ -456,7 +456,7 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, 
uint16_t nb_pkts)

nb_used = VIRTQUEUE_NUSED(rxvq);

-   rmb();
+   virtio_rmb();

num = (uint16_t)(likely(nb_used <= nb_pkts) ? nb_used : nb_pkts);
num = (uint16_t)(likely(num <= VIRTIO_MBUF_BURST_SZ) ? num : 
VIRTIO_MBUF_BURST_SZ);
@@ -516,6 +516,7 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, 
uint16_t nb_pkts)
}

if (likely(nb_enqueued)) {
+   virtio_wmb();
if (unlikely(virtqueue_kick_prepare(rxvq))) {
virtqueue_notify(rxvq);
PMD_RX_LOG(DEBUG, "Notified\n");
@@ -547,7 +548,7 @@ virtio_recv_mergeable_pkts(void *rx_queue,

nb_used = VIRTQUEUE_NUSED(rxvq);

-   rmb();
+   virtio_rmb();

if (nb_used == 0)
return 0;
@@ -694,7 +695,7 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, 
uint16_t nb_pkts)
PMD_TX_LOG(DEBUG, "%d packets to xmit", nb_pkts);
nb_used = VIRTQUEUE_NUSED(txvq);

-   rmb();
+   virtio_rmb();

num = (uint16_t)(likely(nb_used < VIRTIO_MBUF_BURST_SZ) ? nb_used : 
VIRTIO_MBUF_BURST_SZ);

@@ -735,6 +736,7 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, 
uint16_t nb_pkts)
}
}
vq_update_avail_idx(txvq);
+   virtio_wmb();

txvq->packets += nb_tx;

diff --git a/lib/librte_pmd_virtio/virtqueue.h 
b/lib/librte_pmd_virtio/virtqueue.h
index fdee054..f6ad98d 100644
--- a/lib/librte_pmd_virtio/virtqueue.h
+++ b/lib/librte_pmd_virtio/virtqueue.h
@@ -46,9 +46,18 @@
 #include "virtio_ring.h"
 #include "virtio_logs.h"

-#define mb()  rte_mb()
-#define wmb() rte_wmb()
-#define rmb() rte_rmb()
+/*
+ * Per virtio_config.h in Linux.
+ * For virtio_pci on SMP, we don't need to order with respect to MMIO
+ * accesses through relaxed memory I/O windows, so smp_mb() et al are
+ * sufficient.
+ *
+ * This driver is for virtio_pci on SMP and therefore can assume
+ * weaker (compiler barriers)
+ */
+#define virtio_mb()rte_mb()
+#define virtio_rmb()   rte_compiler_barrier()
+#define virtio_wmb()   rte_compiler_barrier()

 #ifdef RTE_PMD_PACKET_PREFETCH
 #define rte_packet_prefetch(p)  rte_prefetch1(p)
@@ -225,7 +234,7 @@ virtqueue_full(const struct virtqueue *vq)
 static inline void
 vq_update_avail_idx(struct virtqueue *vq)
 {
-   rte_compiler_barrier();
+   virtio_rmb();
vq->vq_ring.avail->idx = vq->vq_avail_idx;
 }

@@ -255,7 +264,7 @@ static inline void
 virtqueue_notify(struct virtqueue *vq)
 {
/*
-* Ensure updated avail->idx is visible to host. mb() necessary?
+* Ensure updated avail->idx is visible to host.
 * For virtio on IA, the notificaiton is through io port operation
 * which is a serialization instruction itself.
 */
-- 
1.8.4.2
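
A sketch of the store ordering the weaker barriers still have to preserve when
publishing a descriptor; this is illustrative only, and everything except
rte_compiler_barrier() is made up for the example:

#include <stdint.h>
#include <rte_atomic.h>

struct fake_ring {
	uint32_t desc[256];           /* stand-in for the vring descriptors */
	volatile uint16_t avail_idx;  /* index the host polls */
};

static inline void
publish(struct fake_ring *r, uint16_t slot, uint32_t value, uint16_t new_idx)
{
	r->desc[slot] = value;   /* 1. fill the descriptor */

	/*
	 * 2. keep the descriptor store before the index update; on SMP
	 *    with coherent memory a compiler barrier is enough on IA.
	 */
	rte_compiler_barrier();

	r->avail_idx = new_idx;  /* 3. expose the new index to the host */
}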



[dpdk-dev] [RFC PATCH 04/17] virtio: Add support for Link State interrupt

2014-12-08 Thread Ouyang Changchun
Virtio has a link state interrupt which can be used.

Signed-off-by: Changchun Ouyang 
Signed-off-by: Stephen Hemminger 
---
 lib/librte_pmd_virtio/virtio_ethdev.c | 78 +++
 lib/librte_pmd_virtio/virtio_pci.c| 22 ++
 lib/librte_pmd_virtio/virtio_pci.h|  4 ++
 3 files changed, 86 insertions(+), 18 deletions(-)

diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index 4bff0fe..d37f2e9 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -845,6 +845,34 @@ static int virtio_resource_init(struct rte_pci_device 
*pci_dev __rte_unused)
 #endif

 /*
+ * Process Virtio Config changed interrupt and call the callback
+ * if link state changed.
+ */
+static void
+virtio_interrupt_handler(__rte_unused struct rte_intr_handle *handle,
+void *param)
+{
+   struct rte_eth_dev *dev = param;
+   struct virtio_hw *hw =
+   VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   uint8_t isr;
+
+   /* Read interrupt status which clears interrupt */
+   isr = vtpci_isr(hw);
+   PMD_DRV_LOG(INFO, "interrupt status = %#x", isr);
+
+   if (rte_intr_enable(&dev->pci_dev->intr_handle) < 0)
+   PMD_DRV_LOG(ERR, "interrupt enable failed");
+
+   if (isr & VIRTIO_PCI_ISR_CONFIG) {
+   if (virtio_dev_link_update(dev, 0) == 0)
+   _rte_eth_dev_callback_process(dev,
+ RTE_ETH_EVENT_INTR_LSC);
+   }
+
+}
+
+/*
  * This function is based on probe() function in virtio_pci.c
  * It returns 0 on success.
  */
@@ -968,6 +996,10 @@ eth_virtio_dev_init(__rte_unused struct eth_driver 
*eth_drv,
PMD_INIT_LOG(DEBUG, "port %d vendorID=0x%x deviceID=0x%x",
eth_dev->data->port_id, pci_dev->id.vendor_id,
pci_dev->id.device_id);
+
+   /* Setup interrupt callback  */
+   rte_intr_callback_register(&pci_dev->intr_handle,
+  virtio_interrupt_handler, eth_dev);
return 0;
 }

@@ -975,7 +1007,7 @@ static struct eth_driver rte_virtio_pmd = {
{
.name = "rte_virtio_pmd",
.id_table = pci_id_virtio_map,
-   .drv_flags = RTE_PCI_DRV_NEED_MAPPING,
+   .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
},
.eth_dev_init = eth_virtio_dev_init,
.dev_private_size = sizeof(struct virtio_adapter),
@@ -1021,6 +1053,9 @@ static int
 virtio_dev_configure(struct rte_eth_dev *dev)
 {
const struct rte_eth_rxmode *rxmode = &dev->data->dev_conf.rxmode;
+   struct virtio_hw *hw =
+   VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   int ret;

PMD_INIT_LOG(DEBUG, "configure");

@@ -1029,7 +1064,11 @@ virtio_dev_configure(struct rte_eth_dev *dev)
return (-EINVAL);
}

-   return 0;
+   ret = vtpci_irq_config(hw, 0);
+   if (ret != 0)
+   PMD_DRV_LOG(ERR, "failed to set config vector");
+
+   return ret;
 }


@@ -1037,7 +1076,6 @@ static int
 virtio_dev_start(struct rte_eth_dev *dev)
 {
uint16_t nb_queues, i;
-   uint16_t status;
struct virtio_hw *hw =
VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);

@@ -1052,18 +1090,22 @@ virtio_dev_start(struct rte_eth_dev *dev)
/* Do final configuration before rx/tx engine starts */
virtio_dev_rxtx_start(dev);

-   /* Check VIRTIO_NET_F_STATUS for link status*/
-   if (vtpci_with_feature(hw, VIRTIO_NET_F_STATUS)) {
-   vtpci_read_dev_config(hw,
-   offsetof(struct virtio_net_config, status),
-   &status, sizeof(status));
-   if ((status & VIRTIO_NET_S_LINK_UP) == 0)
-   PMD_INIT_LOG(ERR, "Port: %d Link is DOWN",
-dev->data->port_id);
-   else
-   PMD_INIT_LOG(DEBUG, "Port: %d Link is UP",
-dev->data->port_id);
+   /* check if lsc interrupt feature is enabled */
+   if (dev->data->dev_conf.intr_conf.lsc) {
+   if (!vtpci_with_feature(hw, VIRTIO_NET_F_STATUS)) {
+   PMD_DRV_LOG(ERR, "link status not supported by host");
+   return -ENOTSUP;
+   }
+
+   if (rte_intr_enable(&dev->pci_dev->intr_handle) < 0) {
+   PMD_DRV_LOG(ERR, "interrupt enable failed");
+   return -EIO;
+   }
}
+
+   /* Initialize Link state */
+   virtio_dev_link_update(dev, 0);
+
vtpci_reinit_complete(hw);

/*Notify the backend
@@ -1145,6 +1187,7 @@ virtio_dev_stop(struct rte_eth_dev *dev)
VIRTIO_DEV_PRIVATE_TO_HW(dev->data->dev_private);

/* reset the
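
For context, an application-side sketch of consuming the link state interrupt
added by this patch; the port id, queue counts and callback body are
assumptions, not part of the patch:

#include <stdio.h>
#include <string.h>
#include <rte_ethdev.h>

static void
lsc_event_cb(uint8_t port_id, enum rte_eth_event_type type, void *arg)
{
	struct rte_eth_link link;

	(void)type;
	(void)arg;
	rte_eth_link_get_nowait(port_id, &link);
	printf("port %u link is %s\n", (unsigned)port_id,
	       link.link_status ? "up" : "down");
}

static int
enable_lsc(uint8_t port_id)
{
	struct rte_eth_conf conf;

	memset(&conf, 0, sizeof(conf));
	conf.intr_conf.lsc = 1;  /* ask the PMD to deliver LSC interrupts */

	if (rte_eth_dev_configure(port_id, 1, 1, &conf) < 0)
		return -1;

	return rte_eth_dev_callback_register(port_id, RTE_ETH_EVENT_INTR_LSC,
					     lsc_event_cb, NULL);
}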

[dpdk-dev] [RFC PATCH 06/17] virtio: Use software vlan stripping

2014-12-08 Thread Ouyang Changchun
Implement VLAN stripping in software. This allows applications
to be device independent.

Signed-off-by: Changchun Ouyang 
Signed-off-by: Stephen Hemminger 
---
 lib/librte_ether/rte_ethdev.h |  3 +++
 lib/librte_pmd_virtio/virtio_ethdev.c |  2 ++
 lib/librte_pmd_virtio/virtio_pci.h|  1 +
 lib/librte_pmd_virtio/virtio_rxtx.c   | 20 ++--
 4 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index f66805d..07d55b8 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -643,6 +643,9 @@ struct rte_eth_rxconf {
 #define ETH_TXQ_FLAGS_NOOFFLOADS \
(ETH_TXQ_FLAGS_NOVLANOFFL | ETH_TXQ_FLAGS_NOXSUMSCTP | \
 ETH_TXQ_FLAGS_NOXSUMUDP  | ETH_TXQ_FLAGS_NOXSUMTCP)
+#define ETH_TXQ_FLAGS_NOXSUMS \
+   (ETH_TXQ_FLAGS_NOXSUMSCTP | ETH_TXQ_FLAGS_NOXSUMUDP | \
+ETH_TXQ_FLAGS_NOXSUMTCP)
 /**
  * A structure used to configure a TX ring of an Ethernet port.
  */
diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index d37f2e9..829838c 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -1064,6 +1064,8 @@ virtio_dev_configure(struct rte_eth_dev *dev)
return (-EINVAL);
}

+   hw->vlan_strip = rxmode->hw_vlan_strip;
+
ret = vtpci_irq_config(hw, 0);
if (ret != 0)
PMD_DRV_LOG(ERR, "failed to set config vector");
diff --git a/lib/librte_pmd_virtio/virtio_pci.h 
b/lib/librte_pmd_virtio/virtio_pci.h
index 6998737..6d93fac 100644
--- a/lib/librte_pmd_virtio/virtio_pci.h
+++ b/lib/librte_pmd_virtio/virtio_pci.h
@@ -168,6 +168,7 @@ struct virtio_hw {
uint32_tmax_tx_queues;
uint32_tmax_rx_queues;
uint16_tvtnet_hdr_size;
+   uint8_t vlan_strip;
uint8_t use_msix;
uint8_t mac_addr[ETHER_ADDR_LEN];
 };
diff --git a/lib/librte_pmd_virtio/virtio_rxtx.c 
b/lib/librte_pmd_virtio/virtio_rxtx.c
index f878c62..a5756e1 100644
--- a/lib/librte_pmd_virtio/virtio_rxtx.c
+++ b/lib/librte_pmd_virtio/virtio_rxtx.c
@@ -49,6 +49,7 @@
 #include 
 #include 
 #include 
+#include 

 #include "virtio_logs.h"
 #include "virtio_ethdev.h"
@@ -408,8 +409,8 @@ virtio_dev_tx_queue_setup(struct rte_eth_dev *dev,

PMD_INIT_FUNC_TRACE();

-   if ((tx_conf->txq_flags & ETH_TXQ_FLAGS_NOOFFLOADS)
-   != ETH_TXQ_FLAGS_NOOFFLOADS) {
+   if ((tx_conf->txq_flags & ETH_TXQ_FLAGS_NOXSUMS)
+   != ETH_TXQ_FLAGS_NOXSUMS) {
PMD_INIT_LOG(ERR, "TX checksum offload not supported\n");
return -EINVAL;
}
@@ -446,6 +447,7 @@ uint16_t
 virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 {
struct virtqueue *rxvq = rx_queue;
+   struct virtio_hw *hw = rxvq->hw;
struct rte_mbuf *rxm, *new_mbuf;
uint16_t nb_used, num, nb_rx = 0;
uint32_t len[VIRTIO_MBUF_BURST_SZ];
@@ -489,6 +491,9 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, 
uint16_t nb_pkts)
rxm->pkt_len = (uint32_t)(len[i] - hdr_size);
rxm->data_len = (uint16_t)(len[i] - hdr_size);

+   if (hw->vlan_strip)
+   rte_vlan_strip(rxm);
+
VIRTIO_DUMP_PACKET(rxm, rxm->data_len);

rx_pkts[nb_rx++] = rxm;
@@ -717,6 +722,17 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf 
**tx_pkts, uint16_t nb_pkts)
 */
if (likely(need <= 0)) {
txm = tx_pkts[nb_tx];
+
+   /* Do VLAN tag insertion */
+   if (txm->ol_flags & PKT_TX_VLAN_PKT) {
+   error = rte_vlan_insert(&txm);
+   if (unlikely(error)) {
+   rte_pktmbuf_free(txm);
+   ++nb_tx;
+   continue;
+   }
+   }
+
/* Enqueue Packet buffers */
error = virtqueue_enqueue_xmit(txvq, txm);
if (unlikely(error)) {
-- 
1.8.4.2
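
From the application point of view, the software stripping/insertion added
here is driven by the usual flags; a minimal sketch (port and queue counts are
assumptions):

#include <string.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

static int
configure_vlan_strip(uint8_t port_id)
{
	struct rte_eth_conf conf;

	memset(&conf, 0, sizeof(conf));
	conf.rxmode.hw_vlan_strip = 1;  /* emulated in software by virtio */

	return rte_eth_dev_configure(port_id, 1, 1, &conf);
}

static void
request_tx_vlan(struct rte_mbuf *m, uint16_t tci)
{
	m->ol_flags |= PKT_TX_VLAN_PKT;  /* virtio_xmit_pkts() then calls rte_vlan_insert() */
	m->vlan_tci = tci;
}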



[dpdk-dev] [RFC PATCH 10/17] virtio: Make vtpci_get_status local

2014-12-08 Thread Ouyang Changchun
Make vtpci_get_status a local function as it is used in one file.

Signed-off-by: Changchun Ouyang 
Signed-off-by: Stephen Hemminger 
---
 lib/librte_pmd_virtio/virtio_pci.c | 4 +++-
 lib/librte_pmd_virtio/virtio_pci.h | 2 --
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/librte_pmd_virtio/virtio_pci.c 
b/lib/librte_pmd_virtio/virtio_pci.c
index b099e4f..2245bec 100644
--- a/lib/librte_pmd_virtio/virtio_pci.c
+++ b/lib/librte_pmd_virtio/virtio_pci.c
@@ -35,6 +35,8 @@
 #include "virtio_pci.h"
 #include "virtio_logs.h"

+static uint8_t vtpci_get_status(struct virtio_hw *);
+
 void
 vtpci_read_dev_config(struct virtio_hw *hw, uint64_t offset,
void *dst, int length)
@@ -113,7 +115,7 @@ vtpci_reinit_complete(struct virtio_hw *hw)
vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER_OK);
 }

-uint8_t
+static uint8_t
 vtpci_get_status(struct virtio_hw *hw)
 {
return VIRTIO_READ_REG_1(hw, VIRTIO_PCI_STATUS);
diff --git a/lib/librte_pmd_virtio/virtio_pci.h 
b/lib/librte_pmd_virtio/virtio_pci.h
index 0a4b578..64d9c34 100644
--- a/lib/librte_pmd_virtio/virtio_pci.h
+++ b/lib/librte_pmd_virtio/virtio_pci.h
@@ -255,8 +255,6 @@ void vtpci_reset(struct virtio_hw *);

 void vtpci_reinit_complete(struct virtio_hw *);

-uint8_t vtpci_get_status(struct virtio_hw *);
-
 void vtpci_set_status(struct virtio_hw *, uint8_t);

 uint32_t vtpci_negotiate_features(struct virtio_hw *, uint32_t);
-- 
1.8.4.2



[dpdk-dev] [RFC PATCH 09/17] virtio: Fix how states are handled during initialization

2014-12-08 Thread Ouyang Changchun
Change the order of initialization to match the Linux kernel.
Don't blow away the control queue by doing a reset when stopped.

Calling dev_stop then dev_start would not work.
Dev_stop was calling virtio reset, and that would clear all queues
and clear all feature negotiation.
Resolved by only doing the reset on device removal.

Signed-off-by: Changchun Ouyang 
Signed-off-by: Stephen Hemminger 
---
 lib/librte_pmd_virtio/virtio_ethdev.c | 58 ---
 lib/librte_pmd_virtio/virtio_pci.c| 10 ++
 lib/librte_pmd_virtio/virtio_pci.h|  3 +-
 3 files changed, 37 insertions(+), 34 deletions(-)

diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index b7f65b9..a07f4ca 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -398,9 +398,14 @@ virtio_dev_cq_queue_setup(struct rte_eth_dev *dev, 
uint16_t vtpci_queue_idx,
 static void
 virtio_dev_close(struct rte_eth_dev *dev)
 {
+   struct virtio_hw *hw = dev->data->dev_private;
+
PMD_INIT_LOG(DEBUG, "virtio_dev_close");

-   virtio_dev_stop(dev);
+   /* reset the NIC */
+   vtpci_irq_config(hw, VIRTIO_MSI_NO_VECTOR);
+   vtpci_reset(hw);
+   virtio_dev_free_mbufs(dev);
 }

 static void
@@ -889,6 +894,9 @@ eth_virtio_dev_init(__rte_unused struct eth_driver *eth_drv,
if (rte_eal_process_type() == RTE_PROC_SECONDARY)
return 0;

+   /* Tell the host we've noticed this device. */
+   vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_ACK);
+
pci_dev = eth_dev->pci_dev;
if (virtio_resource_init(pci_dev) < 0)
return -1;
@@ -899,9 +907,6 @@ eth_virtio_dev_init(__rte_unused struct eth_driver *eth_drv,
/* Reset the device although not necessary at startup */
vtpci_reset(hw);

-   /* Tell the host we've noticed this device. */
-   vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_ACK);
-
/* Tell the host we've known how to drive the device. */
vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER);
virtio_negotiate_features(hw);
@@ -990,6 +995,9 @@ eth_virtio_dev_init(__rte_unused struct eth_driver *eth_drv,
/* Setup interrupt callback  */
rte_intr_callback_register(&pci_dev->intr_handle,
   virtio_interrupt_handler, eth_dev);
+
+   virtio_dev_cq_start(eth_dev);
+
return 0;
 }

@@ -1044,7 +1052,6 @@ virtio_dev_configure(struct rte_eth_dev *dev)
 {
const struct rte_eth_rxmode *rxmode = &dev->data->dev_conf.rxmode;
struct virtio_hw *hw = dev->data->dev_private;
-   int ret;

PMD_INIT_LOG(DEBUG, "configure");

@@ -1055,11 +1062,12 @@ virtio_dev_configure(struct rte_eth_dev *dev)

hw->vlan_strip = rxmode->hw_vlan_strip;

-   ret = vtpci_irq_config(hw, 0);
-   if (ret != 0)
+   if (vtpci_irq_config(hw, 0) == VIRTIO_MSI_NO_VECTOR) {
PMD_DRV_LOG(ERR, "failed to set config vector");
+   return -EBUSY;
+   }

-   return ret;
+   return 0;
 }


@@ -1069,17 +1077,6 @@ virtio_dev_start(struct rte_eth_dev *dev)
uint16_t nb_queues, i;
struct virtio_hw *hw = dev->data->dev_private;

-   /* Tell the host we've noticed this device. */
-   vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_ACK);
-
-   /* Tell the host we've known how to drive the device. */
-   vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER);
-
-   virtio_dev_cq_start(dev);
-
-   /* Do final configuration before rx/tx engine starts */
-   virtio_dev_rxtx_start(dev);
-
/* check if lsc interrupt feature is enabled */
if (dev->data->dev_conf.intr_conf.lsc) {
if (!vtpci_with_feature(hw, VIRTIO_NET_F_STATUS)) {
@@ -1096,8 +1093,16 @@ virtio_dev_start(struct rte_eth_dev *dev)
/* Initialize Link state */
virtio_dev_link_update(dev, 0);

+   /* On restart after stop do not touch queues */
+   if (hw->started)
+   return 0;
+
vtpci_reinit_complete(hw);

+   /* Do final configuration before rx/tx engine starts */
+   virtio_dev_rxtx_start(dev);
+   hw->started = 1;
+
/*Notify the backend
 *Otherwise the tap backend might already stop its queue due to 
fullness.
 *vhost backend will have no chance to be waked up
@@ -1168,17 +1173,20 @@ static void virtio_dev_free_mbufs(struct rte_eth_dev 
*dev)
 }

 /*
- * Stop device: disable rx and tx functions to allow for reconfiguring.
+ * Stop device: disable interrupt and mark link down
  */
 static void
 virtio_dev_stop(struct rte_eth_dev *dev)
 {
-   struct virtio_hw *hw = dev->data->dev_private;
+   struct rte_eth_link link;

-   /* reset the NIC */
-   vtpci_irq_config(hw, 0);
-   vtpci_reset(hw);
-   virtio_dev_free_mbufs(dev);
+   PMD_INIT_LOG(DEBUG, "stop");
+
+   if (dev->data->dev_conf.intr_conf.lsc)
+   rte_intr_disable(&dev->pci_dev-
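
The restart cycle this patch is meant to survive looks like this on the
application side (illustrative only; the port id is an assumption):

#include <rte_ethdev.h>

static int
restart_port(uint8_t port_id)
{
	rte_eth_dev_stop(port_id);          /* no longer resets the device */
	return rte_eth_dev_start(port_id);  /* queues and features stay intact */
}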

[dpdk-dev] [RFC PATCH 08/17] virtio: Remove redundant vq_alignment

2014-12-08 Thread Ouyang Changchun
Since vq_alignment is constant (always 4K), it does not
need to be part of the virtqueue structure.

Signed-off-by: Changchun Ouyang 
Signed-off-by: Stephen Hemminger 
---
 lib/librte_pmd_virtio/virtio_ethdev.c | 1 -
 lib/librte_pmd_virtio/virtio_rxtx.c   | 2 +-
 lib/librte_pmd_virtio/virtqueue.h | 3 +--
 3 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index c89614d..b7f65b9 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -294,7 +294,6 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev,
vq->port_id = dev->data->port_id;
vq->queue_id = queue_idx;
vq->vq_queue_index = vtpci_queue_idx;
-   vq->vq_alignment = VIRTIO_PCI_VRING_ALIGN;
vq->vq_nentries = vq_size;
vq->vq_free_cnt = vq_size;

diff --git a/lib/librte_pmd_virtio/virtio_rxtx.c 
b/lib/librte_pmd_virtio/virtio_rxtx.c
index 73ad3ac..b44f091 100644
--- a/lib/librte_pmd_virtio/virtio_rxtx.c
+++ b/lib/librte_pmd_virtio/virtio_rxtx.c
@@ -258,7 +258,7 @@ virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
 * Reinitialise since virtio port might have been stopped and restarted
 */
memset(vq->vq_ring_virt_mem, 0, vq->vq_ring_size);
-   vring_init(vr, size, ring_mem, vq->vq_alignment);
+   vring_init(vr, size, ring_mem, VIRTIO_PCI_VRING_ALIGN);
vq->vq_used_cons_idx = 0;
vq->vq_desc_head_idx = 0;
vq->vq_avail_idx = 0;
diff --git a/lib/librte_pmd_virtio/virtqueue.h 
b/lib/librte_pmd_virtio/virtqueue.h
index f6ad98d..5b8a255 100644
--- a/lib/librte_pmd_virtio/virtqueue.h
+++ b/lib/librte_pmd_virtio/virtqueue.h
@@ -138,8 +138,7 @@ struct virtqueue {
uint8_t port_id;  /**< Device port identifier. */

void*vq_ring_virt_mem;/**< linear address of vring*/
-   int vq_alignment;
-   int vq_ring_size;
+   unsigned int vq_ring_size;
phys_addr_t vq_ring_mem;  /**< physical address of vring */

struct vring vq_ring;/**< vring keeping desc, used and avail */
-- 
1.8.4.2



[dpdk-dev] [RFC PATCH 12/17] virtio: Move allocation before initialization

2014-12-08 Thread Ouyang Changchun
If allocation fails, we don't want to leave the virtio device stuck
in the middle of the initialization sequence.

Signed-off-by: Changchun Ouyang 
Signed-off-by: Stephen Hemminger 
---
 lib/librte_pmd_virtio/virtio_ethdev.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index c17cac8..13feda5 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -890,6 +890,15 @@ eth_virtio_dev_init(__rte_unused struct eth_driver 
*eth_drv,
if (rte_eal_process_type() == RTE_PROC_SECONDARY)
return 0;

+   /* Allocate memory for storing MAC addresses */
+   eth_dev->data->mac_addrs = rte_zmalloc("virtio", ETHER_ADDR_LEN, 0);
+   if (eth_dev->data->mac_addrs == NULL) {
+   PMD_INIT_LOG(ERR,
+   "Failed to allocate %d bytes needed to store MAC 
addresses",
+   ETHER_ADDR_LEN);
+   return -ENOMEM;
+   }
+
/* Tell the host we've noticed this device. */
vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_ACK);

@@ -916,15 +925,6 @@ eth_virtio_dev_init(__rte_unused struct eth_driver 
*eth_drv,
hw->vtnet_hdr_size = sizeof(struct virtio_net_hdr);
}

-   /* Allocate memory for storing MAC addresses */
-   eth_dev->data->mac_addrs = rte_zmalloc("virtio", ETHER_ADDR_LEN, 0);
-   if (eth_dev->data->mac_addrs == NULL) {
-   PMD_INIT_LOG(ERR,
-   "Failed to allocate %d bytes needed to store MAC 
addresses",
-   ETHER_ADDR_LEN);
-   return -ENOMEM;
-   }
-
/* Copy the permanent MAC address to: virtio_hw */
virtio_get_hwaddr(hw);
ether_addr_copy((struct ether_addr *) hw->mac_addr,
-- 
1.8.4.2



[dpdk-dev] [RFC PATCH 03/17] virtio: Allow starting with link down

2014-12-08 Thread Ouyang Changchun
Starting the driver with the link down should be OK; it is with every
other driver. So just allow it.

Signed-off-by: Changchun Ouyang 
Signed-off-by: Stephen Hemminger 
---
 lib/librte_pmd_virtio/virtio_ethdev.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index 78018f9..4bff0fe 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -1057,14 +1057,12 @@ virtio_dev_start(struct rte_eth_dev *dev)
vtpci_read_dev_config(hw,
offsetof(struct virtio_net_config, status),
&status, sizeof(status));
-   if ((status & VIRTIO_NET_S_LINK_UP) == 0) {
+   if ((status & VIRTIO_NET_S_LINK_UP) == 0)
PMD_INIT_LOG(ERR, "Port: %d Link is DOWN",
 dev->data->port_id);
-   return -EIO;
-   } else {
+   else
PMD_INIT_LOG(DEBUG, "Port: %d Link is UP",
 dev->data->port_id);
-   }
}
vtpci_reinit_complete(hw);

-- 
1.8.4.2
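
With this change an application simply starts the port and checks the link
afterwards; an illustrative sketch (the port id is an assumption):

#include <stdio.h>
#include <rte_ethdev.h>

static int
start_and_report(uint8_t port_id)
{
	struct rte_eth_link link;

	if (rte_eth_dev_start(port_id) < 0)  /* no longer fails when the link is down */
		return -1;

	rte_eth_link_get_nowait(port_id, &link);
	printf("port %u link is %s\n", (unsigned)port_id,
	       link.link_status ? "up" : "down");
	return 0;
}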



[dpdk-dev] [RFC PATCH 11/17] virtio: Check for packet headroom at compile time

2014-12-08 Thread Ouyang Changchun
Better to check at compile time than fail at runtime.

Signed-off-by: Changchun Ouyang 
Signed-off-by: Stephen Hemminger 
---
 lib/librte_pmd_virtio/virtio_ethdev.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index a07f4ca..c17cac8 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -882,11 +882,7 @@ eth_virtio_dev_init(__rte_unused struct eth_driver 
*eth_drv,
uint32_t offset_conf = sizeof(config->mac);
struct rte_pci_device *pci_dev;

-   if (RTE_PKTMBUF_HEADROOM < sizeof(struct virtio_net_hdr)) {
-   PMD_INIT_LOG(ERR,
-   "MBUF HEADROOM should be enough to hold virtio net 
hdr\n");
-   return -1;
-   }
+   RTE_BUILD_BUG_ON(RTE_PKTMBUF_HEADROOM < sizeof(struct virtio_net_hdr));

eth_dev->dev_ops = &virtio_eth_dev_ops;
eth_dev->tx_pkt_burst = &virtio_xmit_pkts;
-- 
1.8.4.2



[dpdk-dev] [RFC PATCH 13/17] virtio: Add support for vlan filtering

2014-12-08 Thread Ouyang Changchun
Virtio supports vlan filtering.

Signed-off-by: Changchun Ouyang 
Signed-off-by: Stephen Hemminger 
---
 lib/librte_pmd_virtio/virtio_ethdev.c | 31 +--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index 13feda5..ec5a51e 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -84,6 +84,8 @@ static void virtio_dev_tx_queue_release(__rte_unused void 
*txq);
 static void virtio_dev_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats 
*stats);
 static void virtio_dev_stats_reset(struct rte_eth_dev *dev);
 static void virtio_dev_free_mbufs(struct rte_eth_dev *dev);
+static int virtio_vlan_filter_set(struct rte_eth_dev *dev,
+   uint16_t vlan_id, int on);

 static int virtio_dev_queue_stats_mapping_set(
__rte_unused struct rte_eth_dev *eth_dev,
@@ -511,6 +513,7 @@ static struct eth_dev_ops virtio_eth_dev_ops = {
.tx_queue_release= virtio_dev_tx_queue_release,
/* collect stats per queue */
.queue_stats_mapping_set = virtio_dev_queue_stats_mapping_set,
+   .vlan_filter_set = virtio_vlan_filter_set,
 };

 static inline int
@@ -640,14 +643,31 @@ virtio_get_hwaddr(struct virtio_hw *hw)
}
 }

+static int
+virtio_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on)
+{
+   struct virtio_hw *hw = dev->data->dev_private;
+   struct virtio_pmd_ctrl ctrl;
+   int len;
+
+   if (!vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_VLAN))
+   return -ENOTSUP;
+
+   ctrl.hdr.class = VIRTIO_NET_CTRL_VLAN;
+   ctrl.hdr.cmd = on ? VIRTIO_NET_CTRL_VLAN_ADD : VIRTIO_NET_CTRL_VLAN_DEL;
+   memcpy(ctrl.data, &vlan_id, sizeof(vlan_id));
+   len = sizeof(vlan_id);
+
+   return virtio_send_command(hw->cvq, &ctrl, &len, 1);
+}

 static void
 virtio_negotiate_features(struct virtio_hw *hw)
 {
uint32_t host_features, mask;

-   mask = VIRTIO_NET_F_CTRL_VLAN;
-   mask |= VIRTIO_NET_F_CSUM | VIRTIO_NET_F_GUEST_CSUM;
+   /* checksum offload not implemented */
+   mask = VIRTIO_NET_F_CSUM | VIRTIO_NET_F_GUEST_CSUM;

/* TSO and LRO are only available when their corresponding
 * checksum offload feature is also negotiated.
@@ -1058,6 +1078,13 @@ virtio_dev_configure(struct rte_eth_dev *dev)

hw->vlan_strip = rxmode->hw_vlan_strip;

+   if (rxmode->hw_vlan_filter
+   && !vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_VLAN)) {
+   PMD_DRV_LOG(NOTICE,
+   "vlan filtering not available on this host");
+   return -ENOTSUP;
+   }
+
if (vtpci_irq_config(hw, 0) == VIRTIO_MSI_NO_VECTOR) {
PMD_DRV_LOG(ERR, "failed to set config vector");
return -EBUSY;
-- 
1.8.4.2
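
An illustrative application-side path to the new vlan_filter_set dev_op:
request hw_vlan_filter when configuring, then add ids through the generic
ethdev call (port id, queue counts and VLAN id are assumptions):

#include <string.h>
#include <rte_ethdev.h>

static int
allow_vlan(uint8_t port_id, uint16_t vlan_id)
{
	struct rte_eth_conf conf;

	memset(&conf, 0, sizeof(conf));
	conf.rxmode.hw_vlan_filter = 1;  /* rejected if the host lacks VIRTIO_NET_F_CTRL_VLAN */

	if (rte_eth_dev_configure(port_id, 1, 1, &conf) < 0)
		return -1;

	return rte_eth_dev_vlan_filter(port_id, vlan_id, 1 /* on */);
}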



[dpdk-dev] [RFC PATCH 15/17] virtio: Add ability to set MAC address

2014-12-08 Thread Ouyang Changchun
Need to do special things to set the default MAC address.

Signed-off-by: Changchun Ouyang 
Signed-off-by: Stephen Hemminger 
---
 lib/librte_ether/rte_ethdev.h |  5 +
 lib/librte_pmd_virtio/virtio_ethdev.c | 24 
 2 files changed, 29 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 07d55b8..cbe3fdf 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1249,6 +1249,10 @@ typedef void (*eth_mac_addr_add_t)(struct rte_eth_dev 
*dev,
  uint32_t vmdq);
 /**< @internal Set a MAC address into Receive Address Address Register */

+typedef void (*eth_mac_addr_set_t)(struct rte_eth_dev *dev,
+ struct ether_addr *mac_addr);
+/**< @internal Set a MAC address into Receive Address Address Register */
+
 typedef int (*eth_uc_hash_table_set_t)(struct rte_eth_dev *dev,
  struct ether_addr *mac_addr,
  uint8_t on);
@@ -1482,6 +1486,7 @@ struct eth_dev_ops {
priority_flow_ctrl_set_t   priority_flow_ctrl_set; /**< Setup priority 
flow control.*/
eth_mac_addr_remove_t  mac_addr_remove; /**< Remove MAC address */
eth_mac_addr_add_t mac_addr_add;  /**< Add a MAC address */
+   eth_mac_addr_set_t mac_addr_set;  /**< Set a MAC address */
eth_uc_hash_table_set_tuc_hash_table_set;  /**< Set Unicast Table 
Array */
eth_uc_all_hash_table_set_t uc_all_hash_table_set;  /**< Set Unicast 
hash bitmap */
eth_mirror_rule_set_t  mirror_rule_set;  /**< Add a traffic mirror 
rule.*/
diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index e469ac2..c5f21c1 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -90,6 +90,8 @@ static void virtio_mac_addr_add(struct rte_eth_dev *dev,
struct ether_addr *mac_addr,
uint32_t index, uint32_t vmdq __rte_unused);
 static void virtio_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index);
+static void virtio_mac_addr_set(struct rte_eth_dev *dev,
+   struct ether_addr *mac_addr);

 static int virtio_dev_queue_stats_mapping_set(
__rte_unused struct rte_eth_dev *eth_dev,
@@ -518,6 +520,7 @@ static struct eth_dev_ops virtio_eth_dev_ops = {
.vlan_filter_set = virtio_vlan_filter_set,
.mac_addr_add= virtio_mac_addr_add,
.mac_addr_remove = virtio_mac_addr_remove,
+   .mac_addr_set= virtio_mac_addr_set,
 };

 static inline int
@@ -733,6 +736,27 @@ virtio_mac_addr_remove(struct rte_eth_dev *dev, uint32_t 
index)
virtio_mac_table_set(hw, uc, mc);
 }

+static void
+virtio_mac_addr_set(struct rte_eth_dev *dev, struct ether_addr *mac_addr)
+{
+   struct virtio_hw *hw = dev->data->dev_private;
+
+   memcpy(hw->mac_addr, mac_addr, ETHER_ADDR_LEN);
+
+   /* Use atomic update if available */
+   if (vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_MAC_ADDR)) {
+   struct virtio_pmd_ctrl ctrl;
+   int len = ETHER_ADDR_LEN;
+
+   ctrl.hdr.class = VIRTIO_NET_CTRL_MAC;
+   ctrl.hdr.cmd = VIRTIO_NET_CTRL_MAC_ADDR_SET;
+
+   memcpy(ctrl.data, mac_addr, ETHER_ADDR_LEN);
+   virtio_send_command(hw->cvq, &ctrl, &len, 1);
+   } else if (vtpci_with_feature(hw, VIRTIO_NET_F_MAC))
+   virtio_set_hwaddr(hw);
+}
+
 static int
 virtio_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on)
 {
-- 
1.8.4.2



[dpdk-dev] [RFC PATCH 00/17] Single virtio implementation

2014-12-08 Thread Ouyang Changchun
This is RFC patch for single virtio implementation.

Why we need single virtio?

As we know currently there are at least 3 virtio PMD driver implementations:
A) lib/librte_pmd_virtio(refer as virtio A);
B) virtio_net_pmd by 6wind(refer as virtio B);
C) virtio by Brocade/vyatta(refer as virtio C);

Integrating the 3 implementations into one could reduce the maintenance cost
and time; in addition, users would not need to try their application on the
3 variants one by one to see which one is the best for them.


What's the status?

Currently virtio A covers most features of virtio B, so we can regard them as
having similar behavior as virtio drivers. But there are some differences
between virtio A and virtio C, so features/code from virtio C need to be
integrated into virtio A.
This patch set is based on two original RFC patch sets from Stephen
Hemminger [stephen at networkplumber.org].
Refer to [http://dpdk.org/ml/archives/dev/2014-August/004845.html ] for the
original one.
This patch set also resolves some conflicts with the latest code and removes
duplicated code.


What this patch set contains:
===
  1) virtio: Rearrange resource initialization, it extracts a function to setup 
PCI resources;
  2) virtio: Use weaker barriers, as the DPDK driver only has to deal with the
 case of running on PCI and with SMP; in this case, the code can use the
 weaker barriers instead of hard (fence) barriers. This may help
 performance a bit;
  3) virtio: Allow starting with link down, other driver has similar behavior;
  4) virtio: Add support for Link State interrupt;
  5) ether: Add soft vlan encap/decap functions; this helps if the HW doesn't
 support vlan strip;
  6) virtio: Use software vlan stripping;
  7) virtio: Remove unnecessary adapter structure;
  8) virtio: Remove redundant vq_alignment, as vq alignment is always 4K, so 
use constant when needed;
  9) virtio: Fix how states are handled during initialization, this is to match 
Linux kernel;
  10) virtio: Make vtpci_get_status a local function as it is used in one file;
  11) virtio: Check for packet headroom at compile time;
  12) virtio: Move allocation before initialization to avoid being stuck in 
middle of virtio init;
  13) virtio: Add support for vlan filtering;
  14) virtio: Add support for multiple mac addresses;
  15) virtio: Add ability to set MAC address;
  16) virtio: Free mbuf's with threshold, this makes its behavior more like 
ixgbe;
  17) virtio: Use port IO to get PCI resource for security reasons and match 
virtio-net-pmd.

Any feedback and comments for this RFC are welcome.

Changchun Ouyang (17):
  virtio: Rearrange resource initialization
  virtio: Use weaker barriers
  virtio: Allow starting with link down
  virtio: Add support for Link State interrupt
  ether: Add soft vlan encap/decap functions
  virtio: Use software vlan stripping
  virtio: Remove unnecessary adapter structure
  virtio: Remove redundant vq_alignment
  virtio: Fix how states are handled during initialization
  virtio: Make vtpci_get_status local
  virtio: Check for packet headroom at compile time
  virtio: Move allocation before initialization
  virtio: Add support for vlan filtering
  virtio: Add support for multiple mac addresses
  virtio: Add ability to set MAC address
  virtio: Free mbuf's with threshold
  virtio: Use port IO to get PCI resource.

 lib/librte_eal/common/include/rte_pci.h |   2 +
 lib/librte_eal/linuxapp/eal/eal_pci.c   |   3 +-
 lib/librte_ether/rte_ethdev.h   |   8 +
 lib/librte_ether/rte_ether.h|  76 +
 lib/librte_pmd_virtio/virtio_ethdev.c   | 479 
 lib/librte_pmd_virtio/virtio_ethdev.h   |  12 +-
 lib/librte_pmd_virtio/virtio_pci.c  |  20 +-
 lib/librte_pmd_virtio/virtio_pci.h  |   8 +-
 lib/librte_pmd_virtio/virtio_rxtx.c | 101 +--
 lib/librte_pmd_virtio/virtqueue.h   |  59 +++-
 10 files changed, 614 insertions(+), 154 deletions(-)

-- 
1.8.4.2



[dpdk-dev] [RFC PATCH 14/17] virtio: Add support for multiple mac addresses

2014-12-08 Thread Ouyang Changchun
Virtio supports multiple MAC addresses.

Signed-off-by: Changchun Ouyang 
Signed-off-by: Stephen Hemminger 
---
 lib/librte_pmd_virtio/virtio_ethdev.c | 94 ++-
 lib/librte_pmd_virtio/virtio_ethdev.h |  3 +-
 lib/librte_pmd_virtio/virtqueue.h | 34 -
 3 files changed, 127 insertions(+), 4 deletions(-)

diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index ec5a51e..e469ac2 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -86,6 +86,10 @@ static void virtio_dev_stats_reset(struct rte_eth_dev *dev);
 static void virtio_dev_free_mbufs(struct rte_eth_dev *dev);
 static int virtio_vlan_filter_set(struct rte_eth_dev *dev,
uint16_t vlan_id, int on);
+static void virtio_mac_addr_add(struct rte_eth_dev *dev,
+   struct ether_addr *mac_addr,
+   uint32_t index, uint32_t vmdq __rte_unused);
+static void virtio_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index);

 static int virtio_dev_queue_stats_mapping_set(
__rte_unused struct rte_eth_dev *eth_dev,
@@ -503,8 +507,6 @@ static struct eth_dev_ops virtio_eth_dev_ops = {
.stats_get   = virtio_dev_stats_get,
.stats_reset = virtio_dev_stats_reset,
.link_update = virtio_dev_link_update,
-   .mac_addr_add= NULL,
-   .mac_addr_remove = NULL,
.rx_queue_setup  = virtio_dev_rx_queue_setup,
/* meaningfull only to multiple queue */
.rx_queue_release= virtio_dev_rx_queue_release,
@@ -514,6 +516,8 @@ static struct eth_dev_ops virtio_eth_dev_ops = {
/* collect stats per queue */
.queue_stats_mapping_set = virtio_dev_queue_stats_mapping_set,
.vlan_filter_set = virtio_vlan_filter_set,
+   .mac_addr_add= virtio_mac_addr_add,
+   .mac_addr_remove = virtio_mac_addr_remove,
 };

 static inline int
@@ -644,6 +648,92 @@ virtio_get_hwaddr(struct virtio_hw *hw)
 }

 static int
+virtio_mac_table_set(struct virtio_hw *hw,
+const struct virtio_net_ctrl_mac *uc,
+const struct virtio_net_ctrl_mac *mc)
+{
+   struct virtio_pmd_ctrl ctrl;
+   int err, len[2];
+
+   ctrl.hdr.class = VIRTIO_NET_CTRL_MAC;
+   ctrl.hdr.cmd = VIRTIO_NET_CTRL_MAC_TABLE_SET;
+
+   len[0] = uc->entries * ETHER_ADDR_LEN + sizeof(uc->entries);
+   memcpy(ctrl.data, uc, len[0]);
+
+   len[1] = mc->entries * ETHER_ADDR_LEN + sizeof(mc->entries);
+   memcpy(ctrl.data + len[0], mc, len[1]);
+
+   err = virtio_send_command(hw->cvq, &ctrl, len, 2);
+   if (err != 0)
+   PMD_DRV_LOG(NOTICE, "mac table set failed: %d", err);
+
+   return err;
+}
+
+static void
+virtio_mac_addr_add(struct rte_eth_dev *dev, struct ether_addr *mac_addr,
+   uint32_t index, uint32_t vmdq __rte_unused)
+{
+   struct virtio_hw *hw = dev->data->dev_private;
+   const struct ether_addr *addrs = dev->data->mac_addrs;
+   unsigned int i;
+   struct virtio_net_ctrl_mac *uc, *mc;
+
+   if (index >= VIRTIO_MAX_MAC_ADDRS) {
+   PMD_DRV_LOG(ERR, "mac address index %u out of range", index);
+   return;
+   }
+
+   uc = alloca(VIRTIO_MAX_MAC_ADDRS * ETHER_ADDR_LEN + 
sizeof(uc->entries));
+   uc->entries = 0;
+   mc = alloca(VIRTIO_MAX_MAC_ADDRS * ETHER_ADDR_LEN + 
sizeof(mc->entries));
+   mc->entries = 0;
+
+   for (i = 0; i < VIRTIO_MAX_MAC_ADDRS; i++) {
+   const struct ether_addr *addr
+   = (i == index) ? mac_addr : addrs + i;
+   struct virtio_net_ctrl_mac *tbl
+   = is_multicast_ether_addr(addr) ? mc : uc;
+
+   memcpy(&tbl->macs[tbl->entries++], addr, ETHER_ADDR_LEN);
+   }
+
+   virtio_mac_table_set(hw, uc, mc);
+}
+
+static void
+virtio_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index)
+{
+   struct virtio_hw *hw = dev->data->dev_private;
+   struct ether_addr *addrs = dev->data->mac_addrs;
+   struct virtio_net_ctrl_mac *uc, *mc;
+   unsigned int i;
+
+   if (index >= VIRTIO_MAX_MAC_ADDRS) {
+   PMD_DRV_LOG(ERR, "mac address index %u out of range", index);
+   return;
+   }
+
+   uc = alloca(VIRTIO_MAX_MAC_ADDRS * ETHER_ADDR_LEN + 
sizeof(uc->entries));
+   uc->entries = 0;
+   mc = alloca(VIRTIO_MAX_MAC_ADDRS * ETHER_ADDR_LEN + 
sizeof(mc->entries));
+   mc->entries = 0;
+
+   for (i = 0; i < VIRTIO_MAX_MAC_ADDRS; i++) {
+   struct virtio_net_ctrl_mac *tbl;
+
+   if (i == index || is_zero_ether_addr(addrs + i))
+   continue;
+
+   tbl = is_multicast_ether_addr(addrs + i) ? mc : uc;
+   memcpy(&tbl->macs[tbl->entries++], addrs + i, ETHER_
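
An illustrative application-side call that ends up in the new mac_addr_add
dev_op; the port id, address and pool value are assumptions for the example:

#include <rte_ethdev.h>
#include <rte_ether.h>

static int
add_secondary_mac(uint8_t port_id)
{
	struct ether_addr mac = {
		.addr_bytes = { 0x02, 0x00, 0x00, 0x00, 0x00, 0x01 },
	};

	/* the pool argument is not used by virtio, so 0 is fine */
	return rte_eth_dev_mac_addr_add(port_id, &mac, 0);
}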

[dpdk-dev] [RFC PATCH 17/17] virtio: Use port IO to get PCI resource.

2014-12-08 Thread Ouyang Changchun
Make virtio not require UIO, for security reasons; this matches 6WIND's
virtio-net-pmd.

Signed-off-by: Changchun Ouyang 
---
 lib/librte_eal/common/include/rte_pci.h |  2 +
 lib/librte_eal/linuxapp/eal/eal_pci.c   |  3 +-
 lib/librte_pmd_virtio/virtio_ethdev.c   | 75 -
 3 files changed, 77 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_pci.h 
b/lib/librte_eal/common/include/rte_pci.h
index 66ed793..2021b3b 100644
--- a/lib/librte_eal/common/include/rte_pci.h
+++ b/lib/librte_eal/common/include/rte_pci.h
@@ -193,6 +193,8 @@ struct rte_pci_driver {

 /** Device needs PCI BAR mapping (done with either IGB_UIO or VFIO) */
 #define RTE_PCI_DRV_NEED_MAPPING 0x0001
+/** Device needs port IO(done with /proc/ioports) */
+#define RTE_PCI_DRV_IO_PORT 0x0002
 /** Device driver must be registered several times until failure - deprecated 
*/
 #pragma GCC poison RTE_PCI_DRV_MULTIPLE
 /** Device needs to be unbound even if no module is provided */
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c 
b/lib/librte_eal/linuxapp/eal/eal_pci.c
index b5f5410..dd60793 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -573,7 +573,8 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, 
struct rte_pci_device *d
 #endif
/* map resources for devices that use igb_uio */
ret = pci_map_device(dev);
-   if (ret != 0)
+   if ((ret != 0) &&
+   ((dr->drv_flags & RTE_PCI_DRV_IO_PORT) == 0))
return ret;
} else if (dr->drv_flags & RTE_PCI_DRV_FORCE_UNBIND &&
   rte_eal_process_type() == RTE_PROC_PRIMARY) {
diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index 1ec29e1..4490a06 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -961,6 +961,69 @@ static int virtio_resource_init(struct rte_pci_device 
*pci_dev)
 start, size);
return 0;
 }
+
+/* Extract I/O port numbers from proc/ioports */
+static int virtio_resource_init_by_ioport(struct rte_pci_device *pci_dev)
+{
+   uint16_t start, end;
+   int size;
+   FILE* fp;
+   char* line = NULL;
+   char pci_id[16];
+   int found = 0;
+   size_t linesz;
+
+   snprintf(pci_id, sizeof(pci_id), PCI_PRI_FMT,
+pci_dev->addr.domain,
+pci_dev->addr.bus,
+pci_dev->addr.devid,
+pci_dev->addr.function);
+
+   fp = fopen("/proc/ioports", "r");
+   if (fp == NULL) {
+   PMD_INIT_LOG(ERR, "%s(): can't open ioports", __func__);
+   return -1;
+   }
+
+   while (getdelim(&line, &linesz, '\n', fp) > 0) {
+   char* ptr = line;
+   char* left;
+   int n;
+
+   n = strcspn(ptr, ":");
+   ptr[n]= 0;
+   left = &ptr[n+1];
+
+   while (*left && isspace(*left))
+   left++;
+
+   if (!strncmp(left, pci_id, strlen(pci_id))) {
+   found = 1;
+
+   while (*ptr && isspace(*ptr))
+   ptr++;
+
+   sscanf(ptr, "%04hx-%04hx", &start, &end);
+   size = end - start + 1;
+
+   break;
+   }
+   }
+
+   free(line);
+   fclose(fp);
+
+   if (!found)
+   return -1;
+
+   pci_dev->mem_resource[0].addr = (void *)(uintptr_t)(uint32_t)start;
+   pci_dev->mem_resource[0].len =  (uint64_t)size;
+   PMD_INIT_LOG(DEBUG,
+"PCI Port IO found start=0x%lx with size=0x%lx",
+start, size);
+   return 0;
+}
+
 #else
 static int
 virtio_has_msix(const struct rte_pci_addr *loc __rte_unused)
@@ -974,6 +1037,12 @@ static int virtio_resource_init(struct rte_pci_device 
*pci_dev __rte_unused)
/* no setup required */
return 0;
 }
+
+static int virtio_resource_init_by_ioport(struct rte_pci_device *pci_dev)
+{
+   /* no setup required */
+   return 0;
+}
 #endif

 /*
@@ -1039,7 +1108,8 @@ eth_virtio_dev_init(__rte_unused struct eth_driver 
*eth_drv,

pci_dev = eth_dev->pci_dev;
if (virtio_resource_init(pci_dev) < 0)
-   return -1;
+   if (virtio_resource_init_by_ioport(pci_dev) < 0)
+   return -1;

hw->use_msix = virtio_has_msix(&pci_dev->addr);
hw->io_base = (uint32_t)(uintptr_t)pci_dev->mem_resource[0].addr;
@@ -1136,7 +1206,8 @@ static struct eth_driver rte_virtio_pmd = {
{
.name = "rte_virtio_pmd",
.id_table = pci_id_virtio_map,
-   .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
+   .drv_flag

[dpdk-dev] [RFC PATCH 16/17] virtio: Free mbuf's with threshold

2014-12-08 Thread Ouyang Changchun
This makes the virtio driver work like ixgbe. Transmit buffers are
held until a transmit threshold is reached. The previous behavior
was to hold mbufs until the ring entry was reused, which caused
more memory usage than needed.

Signed-off-by: Changchun Ouyang 
Signed-off-by: Stephen Hemminger 
---
 lib/librte_pmd_virtio/virtio_ethdev.c |  7 ++--
 lib/librte_pmd_virtio/virtio_rxtx.c   | 70 +--
 lib/librte_pmd_virtio/virtqueue.h |  3 +-
 3 files changed, 64 insertions(+), 16 deletions(-)

diff --git a/lib/librte_pmd_virtio/virtio_ethdev.c 
b/lib/librte_pmd_virtio/virtio_ethdev.c
index c5f21c1..1ec29e1 100644
--- a/lib/librte_pmd_virtio/virtio_ethdev.c
+++ b/lib/librte_pmd_virtio/virtio_ethdev.c
@@ -176,15 +176,16 @@ virtio_send_command(struct virtqueue *vq, struct 
virtio_pmd_ctrl *ctrl,

virtqueue_notify(vq);

-   while (vq->vq_used_cons_idx == vq->vq_ring.used->idx)
+   rte_rmb();
+   while (vq->vq_used_cons_idx == vq->vq_ring.used->idx) {
+   rte_rmb();
usleep(100);
+   }

while (vq->vq_used_cons_idx != vq->vq_ring.used->idx) {
uint32_t idx, desc_idx, used_idx;
struct vring_used_elem *uep;

-   virtio_rmb();
-
used_idx = (uint32_t)(vq->vq_used_cons_idx
& (vq->vq_nentries - 1));
uep = &vq->vq_ring.used->ring[used_idx];
diff --git a/lib/librte_pmd_virtio/virtio_rxtx.c 
b/lib/librte_pmd_virtio/virtio_rxtx.c
index b44f091..26c0a1d 100644
--- a/lib/librte_pmd_virtio/virtio_rxtx.c
+++ b/lib/librte_pmd_virtio/virtio_rxtx.c
@@ -129,9 +129,15 @@ virtqueue_dequeue_burst_rx(struct virtqueue *vq, struct 
rte_mbuf **rx_pkts,
return i;
 }

+#ifndef DEFAULT_TX_FREE_THRESH
+#define DEFAULT_TX_FREE_THRESH 32
+#endif
+
+/* Cleanup from completed transmits. */
 static void
-virtqueue_dequeue_pkt_tx(struct virtqueue *vq)
+virtio_xmit_cleanup(struct virtqueue *vq, uint16_t num)
 {
+#if 0
struct vring_used_elem *uep;
uint16_t used_idx, desc_idx;

@@ -140,6 +146,25 @@ virtqueue_dequeue_pkt_tx(struct virtqueue *vq)
desc_idx = (uint16_t) uep->id;
vq->vq_used_cons_idx++;
vq_ring_free_chain(vq, desc_idx);
+#endif
+   uint16_t i, used_idx, desc_idx;
+   for (i = 0; i < num ; i++) {
+   struct vring_used_elem *uep;
+   struct vq_desc_extra *dxp;
+
+   used_idx = (uint16_t)(vq->vq_used_cons_idx & (vq->vq_nentries - 
1));
+   uep = &vq->vq_ring.used->ring[used_idx];
+   dxp = &vq->vq_descx[used_idx];
+
+   desc_idx = (uint16_t) uep->id;
+   vq->vq_used_cons_idx++;
+   vq_ring_free_chain(vq, desc_idx);
+
+   if (dxp->cookie != NULL) {
+   rte_pktmbuf_free(dxp->cookie);
+   dxp->cookie = NULL;
+   }
+   }
 }


@@ -203,8 +228,10 @@ virtqueue_enqueue_xmit(struct virtqueue *txvq, struct 
rte_mbuf *cookie)

idx = head_idx;
dxp = &txvq->vq_descx[idx];
+#if 0
if (dxp->cookie != NULL)
rte_pktmbuf_free(dxp->cookie);
+#endif
dxp->cookie = (void *)cookie;
dxp->ndescs = needed;

@@ -404,6 +431,7 @@ virtio_dev_tx_queue_setup(struct rte_eth_dev *dev,
 {
uint8_t vtpci_queue_idx = 2 * queue_idx + VTNET_SQ_TQ_QUEUE_IDX;
struct virtqueue *vq;
+   uint16_t tx_free_thresh;
int ret;

PMD_INIT_FUNC_TRACE();
@@ -421,6 +449,21 @@ virtio_dev_tx_queue_setup(struct rte_eth_dev *dev,
return ret;
}

+   tx_free_thresh = tx_conf->tx_free_thresh;
+   if (tx_free_thresh == 0)
+   tx_free_thresh = RTE_MIN(vq->vq_nentries / 4, 
DEFAULT_TX_FREE_THRESH);
+
+   if (tx_free_thresh >= (vq->vq_nentries - 3)) {
+   RTE_LOG(ERR, PMD, "tx_free_thresh must be less than the "
+   "number of TX entries minus 3 (%u)."
+   " (tx_free_thresh=%u port=%u queue=%u)\n",
+   vq->vq_nentries - 3,
+   tx_free_thresh, dev->data->port_id, queue_idx);
+   return -EINVAL;
+   }
+
+   vq->vq_free_thresh = tx_free_thresh;
+
dev->data->tx_queues[queue_idx] = vq;
return 0;
 }
@@ -688,11 +731,9 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf 
**tx_pkts, uint16_t nb_pkts)
 {
struct virtqueue *txvq = tx_queue;
struct rte_mbuf *txm;
-   uint16_t nb_used, nb_tx, num;
+   uint16_t nb_used, nb_tx;
int error;

-   nb_tx = 0;
-
if (unlikely(nb_pkts < 1))
return nb_pkts;

@@ -700,21 +741,26 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf 
**tx_pkts, uint16_t nb_pkts)
nb_used = VIRTQUEUE_NUSED(txvq);

virtio_rmb();
+   if (likely(nb_used > txvq->vq_free_thresh))
+   virtio_xmit_cleanup(txvq, nb_used);

-   num = (uint16_t)(likely(nb
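
An illustrative TX queue setup showing where tx_free_thresh comes from; the
port/queue ids and descriptor count are assumptions, and ETH_TXQ_FLAGS_NOXSUMS
is the flag added earlier in this series:

#include <string.h>
#include <rte_lcore.h>
#include <rte_ethdev.h>

static int
setup_tx_queue(uint8_t port_id, uint16_t queue_id)
{
	struct rte_eth_txconf txconf;

	memset(&txconf, 0, sizeof(txconf));
	txconf.tx_free_thresh = 32;                /* 0 would let the PMD pick the default */
	txconf.txq_flags = ETH_TXQ_FLAGS_NOXSUMS;  /* virtio has no checksum offload */

	return rte_eth_tx_queue_setup(port_id, queue_id, 256,
				      rte_socket_id(), &txconf);
}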

[dpdk-dev] [RFC PATCH 00/17] Single virtio implementation

2014-12-09 Thread Ouyang, Changchun
Hi Thomas,

> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Monday, December 8, 2014 5:31 PM
> To: Ouyang, Changchun
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [RFC PATCH 00/17] Single virtio implementation
> 
> Hi Changchun,
> 
> 2014-12-08 14:21, Ouyang Changchun:
> > This patch set bases on two original RFC patch sets from Stephen
> Hemminger[stephen at networkplumber.org]
> > Refer to [http://dpdk.org/ml/archives/dev/2014-August/004845.html ] for
> the original one.
> > This patch set also resolves some conflict with latest codes and removed
> duplicated codes.
> 
> As you sent the patches, you appear as the author.
> But I guess Stephen should be the author for some of them.
> Please check who has contributed the most in each patch to decide.

You are right, most of the patches originate from Stephen's patch set, except
for the last one.
To be honest, I am OK with whoever is the author of this patch set :-).
We could co-own the single virtio feature if you all agree with it, and I
think we couldn't finish such a feature without collaboration among us. This
is why I tried to communicate with most of you to collect more feedback,
suggestions and comments for this feature.
I very much appreciate all kinds of feedback and suggestions here, especially
the patch set from Stephen.

Regarding your request, how could we make this patch set show Stephen as the
author?
Currently I add Stephen to the Signed-off-by list in each patch (I got
Stephen's agreement before doing this :-)).
Should I send the whole patch set to Stephen and let him send it out to
dpdk.org?
Or is there any other better solution?
If you have a better suggestion, I assume it would work for all subsequent
RFC and normal patch sets.

Any other suggestions are welcome.

Thanks
Changchun



