[dpdk-dev] DEV@DPDK.ORG
[dpdk-dev] [PATCH v6 00/13] vhost-user multiple queues enabling
This patch set enables vhost-user multiple queues.

Overview
========

It depends on some QEMU patches that have already been merged upstream.
Those QEMU patches introduce some new vhost-user messages for vhost-user
mq enabling negotiation. Here are the main negotiation steps (QEMU as
master, and DPDK vhost-user as slave):

- Master queries features by VHOST_USER_GET_FEATURES from slave

- Check if VHOST_USER_F_PROTOCOL_FEATURES exists. If not, mq is not
  supported. (check patch 1 for why VHOST_USER_F_PROTOCOL_FEATURES is
  introduced)

- Master then sends another command, VHOST_USER_GET_QUEUE_NUM, to query
  how many queues the slave supports. Master will compare the result
  with the requested queue number; QEMU exits if the former is smaller.

- Master then tries to initialize all queue pairs by sending some
  vhost-user commands, including VHOST_USER_SET_VRING_CALL, which
  triggers the slave to do the related vring setup, such as vring
  allocation.

Till now, all necessary initialization and negotiation are done. And
master could send another message, VHOST_USER_SET_VRING_ENABLE, to
enable/disable a specific queue dynamically later.

Patchset
========

Patch 1-6 are all preparation work for enabling mq; they are all atomic
changes, made with "do not break anything" borne in mind.

Patch 7 actually enables the mq feature, by setting two key feature
flags.

Patch 8 handles the VHOST_USER_SET_VRING_ENABLE message, which is for
enabling/disabling a specific virt queue pair; only one queue pair is
enabled by default.

Patch 9-12 demonstrate the mq feature.

Patch 13 updates the doc release note.
Testing
=======

Host side
---------

    # Start vhost-switch
    sudo mount -t hugetlbfs nodev /mnt/huge
    sudo modprobe uio
    sudo insmod $RTE_SDK/$RTE_TARGET/kmod/igb_uio.ko
    sudo $RTE_SDK/tools/dpdk_nic_bind.py --bind igb_uio :08:00.0
    sudo $RTE_SDK/examples/vhost/build/vhost-switch -c 0xf0 -n 4 \
         --huge-dir /mnt/huge --socket-mem 2048,0 -- -p 1 --vm2vm 0 \
         --dev-basename usvhost --rxq 2

    # The above command generates a usvhost socket file at PWD. You could
    # also specify the "--stats 1" option to enable stats dumping.

    # Start QEMU
    sudo mount -t hugetlbfs nodev $HOME/hugetlbfs
    $QEMU_DIR/x86_64-softmmu/qemu-system-x86_64 -machine accel=kvm -m 4G \
         -object memory-backend-file,id=mem,size=4G,mem-path=$HOME/hugetlbfs,share=on \
         -numa node,memdev=mem \
         -chardev socket,id=chr0,path=/path/to/usvhost \
         -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=2 \
         -device virtio-net-pci,netdev=net0,mq=on,vectors=6,mac=52:54:00:12:34:58,csum=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off \
         -hda $HOME/iso/fc-22-x86_64.img -smp 10 -cpu core2duo,+sse3,+sse4.1,+sse4.2

Guest side
----------

    modprobe uio
    insmod $RTE_SDK/$RTE_TARGET/kmod/igb_uio.ko
    echo 1024 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
    ./tools/dpdk_nic_bind.py --bind igb_uio 00:03.0

    $RTE_SDK/$RTE_TARGET/app/testpmd -c 1f -n 4 -- --rxq=2 --txq=2 \
         --nb-cores=4 -i --disable-hw-vlan --txqflags 0xf00

    > set fwd mac
    > start tx_first

After those setups, you then could use a packet generator for packet
tx/rx testing.

Test with OVS
=============

Marcel also created a simple yet quite clear test guide with OVS at:

    http://wiki.qemu.org/Features/vhost-user-ovs-dpdk

BTW, Marcel, would you please complete the page on mq testing?
---
Changchun Ouyang (7):
  vhost: rxtx: prepare work for multiple queue support
  virtio: read virtio_net_config correctly
  vhost: add VHOST_USER_SET_VRING_ENABLE message
  vhost: add API bind a virtq to a specific core
  ixgbe: support VMDq RSS in non-SRIOV environment
  examples/vhost: demonstrate the usage of vhost mq feature
  examples/vhost: add per queue stats

Yuanhan Liu (6):
  vhost-user: add protocol features support
  vhost-user: add VHOST_USER_GET_QUEUE_NUM message
  vhost: vring queue setup for multiple queue support
  vhost-user: handle VHOST_USER_RESET_OWNER correctly
  vhost-user: enable vhost-user multiple queue
  doc: update release note for vhost-user mq support

 doc/guides/rel_notes/release_2_2.rst          |   5 +
 drivers/net/ixgbe/ixgbe_rxtx.c                |  86 +-
 drivers/net/virtio/virtio_ethdev.c            |  16 +-
 examples/vhost/main.c                         | 420 +-
 examples/vhost/main.h                         |   3 +-
 lib/librte_ether/rte_ethdev.c                 |  11 +
 lib/librte_vhost/rte_vhost_version.map        |   7 +
 lib/librte_vhost/rte_virtio_net.h             |  38 ++-
 lib/librte_vhost/vhost_rxtx.c                 |  56 +++-
 lib/librte_vhost/vhost_user/vhost-net-user.c  |  27 +-
 lib/librte_vhost/vhost_user/vhost-net-user.h  |   4 +
 lib/librte_vhost/vhost_user/virtio-net-user.c |  83 +++--
 lib/librte_vhost/vhost_user/virtio-net-user.h |  10 +
 lib/librte_vhost/virtio-net
[dpdk-dev] [PATCH v6 01/13] vhost-user: add protocol features support
The two protocol features messages are introduced by the QEMU vhost
maintainer (Michael) for extending the vhost-user interface. Here is an
excerpt from the vhost-user spec:

    Any protocol extensions are gated by protocol feature bits, which
    allows full backwards compatibility on both master and slave.

The vhost-user multiple queue feature will be treated as a vhost-user
extension; hence, we have to implement the two messages first.

VHOST_USER_PROTOCOL_FEATURES is initialized to 0, as we don't support
any protocol features yet.

Signed-off-by: Yuanhan Liu
---
 lib/librte_vhost/rte_virtio_net.h             |  1 +
 lib/librte_vhost/vhost_user/vhost-net-user.c  | 13 -
 lib/librte_vhost/vhost_user/vhost-net-user.h  |  2 ++
 lib/librte_vhost/vhost_user/virtio-net-user.c | 13 +
 lib/librte_vhost/vhost_user/virtio-net-user.h |  5 +
 lib/librte_vhost/virtio-net.c                 |  5 -
 6 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
index a037c15..e3a21e5 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -99,6 +99,7 @@ struct virtio_net {
 	struct vhost_virtqueue	*virtqueue[VIRTIO_QNUM];	/**< Contains all virtqueue information. */
 	struct virtio_memory	*mem;		/**< QEMU memory and memory region information. */
 	uint64_t		features;	/**< Negotiated feature set. */
+	uint64_t		protocol_features;	/**< Negotiated protocol feature set. */
 	uint64_t		device_fh;	/**< device identifier. */
 	uint32_t		flags;		/**< Device flags. Only used to check if device is running on data core. */
 #define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.c b/lib/librte_vhost/vhost_user/vhost-net-user.c
index d1f8877..bc2ad24 100644
--- a/lib/librte_vhost/vhost_user/vhost-net-user.c
+++ b/lib/librte_vhost/vhost_user/vhost-net-user.c
@@ -95,7 +95,9 @@ static const char *vhost_message_str[VHOST_USER_MAX] = {
 	[VHOST_USER_GET_VRING_BASE] = "VHOST_USER_GET_VRING_BASE",
 	[VHOST_USER_SET_VRING_KICK] = "VHOST_USER_SET_VRING_KICK",
 	[VHOST_USER_SET_VRING_CALL] = "VHOST_USER_SET_VRING_CALL",
-	[VHOST_USER_SET_VRING_ERR]  = "VHOST_USER_SET_VRING_ERR"
+	[VHOST_USER_SET_VRING_ERR]  = "VHOST_USER_SET_VRING_ERR",
+	[VHOST_USER_GET_PROTOCOL_FEATURES] = "VHOST_USER_GET_PROTOCOL_FEATURES",
+	[VHOST_USER_SET_PROTOCOL_FEATURES] = "VHOST_USER_SET_PROTOCOL_FEATURES",
 };

 /**
@@ -363,6 +365,15 @@ vserver_message_handler(int connfd, void *dat, int *remove)
 		ops->set_features(ctx, &features);
 		break;

+	case VHOST_USER_GET_PROTOCOL_FEATURES:
+		msg.payload.u64 = VHOST_USER_PROTOCOL_FEATURES;
+		msg.size = sizeof(msg.payload.u64);
+		send_vhost_message(connfd, &msg);
+		break;
+	case VHOST_USER_SET_PROTOCOL_FEATURES:
+		user_set_protocol_features(ctx, msg.payload.u64);
+		break;
+
 	case VHOST_USER_SET_OWNER:
 		ops->set_owner(ctx);
 		break;
diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.h b/lib/librte_vhost/vhost_user/vhost-net-user.h
index 2e72f3c..4490d23 100644
--- a/lib/librte_vhost/vhost_user/vhost-net-user.h
+++ b/lib/librte_vhost/vhost_user/vhost-net-user.h
@@ -63,6 +63,8 @@ typedef enum VhostUserRequest {
 	VHOST_USER_SET_VRING_KICK = 12,
 	VHOST_USER_SET_VRING_CALL = 13,
 	VHOST_USER_SET_VRING_ERR = 14,
+	VHOST_USER_GET_PROTOCOL_FEATURES = 15,
+	VHOST_USER_SET_PROTOCOL_FEATURES = 16,
 	VHOST_USER_MAX
 } VhostUserRequest;
diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c b/lib/librte_vhost/vhost_user/virtio-net-user.c
index 4689927..360254e 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.c
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
@@ -316,3 +316,16 @@ user_destroy_device(struct vhost_device_ctx ctx)
 		dev->mem = NULL;
 	}
 }
+
+void
+user_set_protocol_features(struct vhost_device_ctx ctx,
+			   uint64_t protocol_features)
+{
+	struct virtio_net *dev;
+
+	dev = get_device(ctx);
+	if (dev == NULL || protocol_features & ~VHOST_USER_PROTOCOL_FEATURES)
+		return;
+
+	dev->protocol_features = protocol_features;
+}
diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.h b/lib/librte_vhost/vhost_user/virtio-net-user.h
index df24860..e7a6ff4 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.h
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.h
@@ -37,12 +37,17 @@
 #include "vhost-net.h"
 #include "vhost-net-user.h"

+#define VHOST_USER_PROTOCOL_FEATURES	0ULL
+
 int user_set_mem_table(struct vhost_device_ctx, struct VhostUserMsg *);

 void user_set_vring
[dpdk-dev] [PATCH v6 02/13] vhost-user: add VHOST_USER_GET_QUEUE_NUM message
To tell the frontend (QEMU) how many queue pairs we support; it is
initialized to VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX.

Signed-off-by: Yuanhan Liu
---
 lib/librte_vhost/vhost_user/vhost-net-user.c | 7 +++
 lib/librte_vhost/vhost_user/vhost-net-user.h | 1 +
 2 files changed, 8 insertions(+)

diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.c b/lib/librte_vhost/vhost_user/vhost-net-user.c
index bc2ad24..8675cd4 100644
--- a/lib/librte_vhost/vhost_user/vhost-net-user.c
+++ b/lib/librte_vhost/vhost_user/vhost-net-user.c
@@ -98,6 +98,7 @@ static const char *vhost_message_str[VHOST_USER_MAX] = {
 	[VHOST_USER_SET_VRING_ERR]  = "VHOST_USER_SET_VRING_ERR",
 	[VHOST_USER_GET_PROTOCOL_FEATURES] = "VHOST_USER_GET_PROTOCOL_FEATURES",
 	[VHOST_USER_SET_PROTOCOL_FEATURES] = "VHOST_USER_SET_PROTOCOL_FEATURES",
+	[VHOST_USER_GET_QUEUE_NUM] = "VHOST_USER_GET_QUEUE_NUM",
 };

 /**
@@ -421,6 +422,12 @@ vserver_message_handler(int connfd, void *dat, int *remove)
 		RTE_LOG(INFO, VHOST_CONFIG, "not implemented\n");
 		break;

+	case VHOST_USER_GET_QUEUE_NUM:
+		msg.payload.u64 = VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX;
+		msg.size = sizeof(msg.payload.u64);
+		send_vhost_message(connfd, &msg);
+		break;
+
 	default:
 		break;
diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.h b/lib/librte_vhost/vhost_user/vhost-net-user.h
index 4490d23..389d21d 100644
--- a/lib/librte_vhost/vhost_user/vhost-net-user.h
+++ b/lib/librte_vhost/vhost_user/vhost-net-user.h
@@ -65,6 +65,7 @@ typedef enum VhostUserRequest {
 	VHOST_USER_SET_VRING_ERR = 14,
 	VHOST_USER_GET_PROTOCOL_FEATURES = 15,
 	VHOST_USER_SET_PROTOCOL_FEATURES = 16,
+	VHOST_USER_GET_QUEUE_NUM = 17,
 	VHOST_USER_MAX
 } VhostUserRequest;
--
1.9.0
[dpdk-dev] [PATCH v6 03/13] vhost: vring queue setup for multiple queue support
All queue pairs, including the default (the first) queue pair, are
allocated dynamically, when a vring_call message is received for the
first time for a specific queue pair.

This is refactoring work for enabling vhost-user multiple queues; it
should not break anything, as it makes no functional changes: we don't
support mq yet, so there is at most one queue pair.

This patch is based on Changchun's patch.

Signed-off-by: Yuanhan Liu
---
v6: set vq->vhost_hlen correctly.
---
 lib/librte_vhost/rte_virtio_net.h             |   3 +-
 lib/librte_vhost/vhost_user/virtio-net-user.c |  44
 lib/librte_vhost/virtio-net.c                 | 144 --
 3 files changed, 114 insertions(+), 77 deletions(-)

diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
index e3a21e5..5dd6493 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -96,7 +96,7 @@ struct vhost_virtqueue {
  * Device structure contains all configuration information relating to the device.
  */
 struct virtio_net {
-	struct vhost_virtqueue	*virtqueue[VIRTIO_QNUM];	/**< Contains all virtqueue information. */
+	struct vhost_virtqueue	*virtqueue[VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX];	/**< Contains all virtqueue information. */
 	struct virtio_memory	*mem;		/**< QEMU memory and memory region information. */
 	uint64_t		features;	/**< Negotiated feature set. */
 	uint64_t		protocol_features;	/**< Negotiated protocol feature set. */
@@ -104,6 +104,7 @@ struct virtio_net {
 	uint32_t		flags;		/**< Device flags. Only used to check if device is running on data core. */
 #define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
 	char			ifname[IF_NAME_SZ];	/**< Name of the tap device or socket path. */
+	uint32_t		virt_qp_nb;	/**< number of queue pairs we have allocated */
 	void			*priv;		/**< private context */
 } __rte_cache_aligned;

diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c b/lib/librte_vhost/vhost_user/virtio-net-user.c
index 360254e..e83d279 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.c
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
@@ -206,25 +206,33 @@ err_mmap:
 }

 static int
+vq_is_ready(struct vhost_virtqueue *vq)
+{
+	return vq && vq->desc &&
+	       vq->kickfd != -1 &&
+	       vq->callfd != -1;
+}
+
+static int
 virtio_is_ready(struct virtio_net *dev)
 {
 	struct vhost_virtqueue *rvq, *tvq;
+	uint32_t i;

-	/* mq support in future.*/
-	rvq = dev->virtqueue[VIRTIO_RXQ];
-	tvq = dev->virtqueue[VIRTIO_TXQ];
-	if (rvq && tvq && rvq->desc && tvq->desc &&
-		(rvq->kickfd != -1) &&
-		(rvq->callfd != -1) &&
-		(tvq->kickfd != -1) &&
-		(tvq->callfd != -1)) {
-		RTE_LOG(INFO, VHOST_CONFIG,
-			"virtio is now ready for processing.\n");
-		return 1;
+	for (i = 0; i < dev->virt_qp_nb; i++) {
+		rvq = dev->virtqueue[i * VIRTIO_QNUM + VIRTIO_RXQ];
+		tvq = dev->virtqueue[i * VIRTIO_QNUM + VIRTIO_TXQ];
+
+		if (!vq_is_ready(rvq) || !vq_is_ready(tvq)) {
+			RTE_LOG(INFO, VHOST_CONFIG,
+				"virtio is not ready for processing.\n");
+			return 0;
+		}
 	}
+
 	RTE_LOG(INFO, VHOST_CONFIG,
-		"virtio isn't ready for processing.\n");
-	return 0;
+		"virtio is now ready for processing.\n");
+	return 1;
 }

 void
@@ -290,13 +298,9 @@ user_get_vring_base(struct vhost_device_ctx ctx,
 	 * sent and only sent in vhost_vring_stop.
 	 * TODO: cleanup the vring, it isn't usable since here.
 	 */
-	if ((dev->virtqueue[VIRTIO_RXQ]->kickfd) >= 0) {
-		close(dev->virtqueue[VIRTIO_RXQ]->kickfd);
-		dev->virtqueue[VIRTIO_RXQ]->kickfd = -1;
-	}
-	if ((dev->virtqueue[VIRTIO_TXQ]->kickfd) >= 0) {
-		close(dev->virtqueue[VIRTIO_TXQ]->kickfd);
-		dev->virtqueue[VIRTIO_TXQ]->kickfd = -1;
+	if ((dev->virtqueue[state->index]->kickfd) >= 0) {
+		close(dev->virtqueue[state->index]->kickfd);
+		dev->virtqueue[state->index]->kickfd = -1;
 	}

 	return 0;
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index deac6b9..57fb7b1 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -36,6 +36,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #ifdef RTE_LIBRTE_VHOST_NUMA
@@ -178,6 +179,15 @@ add_config_ll_entry(struct virtio_net_config_ll *new_ll_dev)
 }

+static void
+cleanup_vq(struct vhost_virtqueue *vq)
+{
+	if (vq->callfd >= 0)
+
[dpdk-dev] [PATCH v6 04/13] vhost: rxtx: prepare work for multiple queue support
From: Changchun Ouyang

Do not use VIRTIO_RXQ or VIRTIO_TXQ anymore; use the queue_id instead,
which will be set to a proper value for a specific queue when we have
multiple queue support enabled.

For now, queue_id is still set to VIRTIO_RXQ or VIRTIO_TXQ, so it
should not break anything.

Signed-off-by: Changchun Ouyang
Signed-off-by: Yuanhan Liu
---
 lib/librte_vhost/vhost_rxtx.c | 46 ++-
 1 file changed, 32 insertions(+), 14 deletions(-)

diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index 7026bfa..14e00ef 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -42,6 +42,16 @@

 #define MAX_PKT_BURST 32

+static inline int __attribute__((always_inline))
+is_valid_virt_queue_idx(uint32_t virtq_idx, int is_tx, uint32_t max_qp_idx)
+{
+	if ((is_tx ^ (virtq_idx & 0x1)) ||
+	    (virtq_idx >= max_qp_idx * VIRTIO_QNUM))
+		return 0;
+
+	return 1;
+}
+
 /**
  * This function adds buffers to the virtio devices RX virtqueue. Buffers can
  * be received from the physical port or from another virtio device. A packet
@@ -68,12 +78,14 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
 	uint8_t success = 0;

 	LOG_DEBUG(VHOST_DATA, "(%"PRIu64") virtio_dev_rx()\n", dev->device_fh);
-	if (unlikely(queue_id != VIRTIO_RXQ)) {
-		LOG_DEBUG(VHOST_DATA, "mq isn't supported in this version.\n");
+	if (unlikely(!is_valid_virt_queue_idx(queue_id, 0, dev->virt_qp_nb))) {
+		RTE_LOG(ERR, VHOST_DATA,
+			"%s (%"PRIu64"): virtqueue idx:%d invalid.\n",
+			__func__, dev->device_fh, queue_id);
 		return 0;
 	}

-	vq = dev->virtqueue[VIRTIO_RXQ];
+	vq = dev->virtqueue[queue_id];
 	count = (count > MAX_PKT_BURST) ? MAX_PKT_BURST : count;

 	/*
@@ -235,8 +247,9 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
 }

 static inline uint32_t __attribute__((always_inline))
-copy_from_mbuf_to_vring(struct virtio_net *dev, uint16_t res_base_idx,
-	uint16_t res_end_idx, struct rte_mbuf *pkt)
+copy_from_mbuf_to_vring(struct virtio_net *dev, uint32_t queue_id,
+	uint16_t res_base_idx, uint16_t res_end_idx,
+	struct rte_mbuf *pkt)
 {
 	uint32_t vec_idx = 0;
 	uint32_t entry_success = 0;
@@ -264,7 +277,7 @@ copy_from_mbuf_to_vring(struct virtio_net *dev, uint16_t res_base_idx,
 	 * Convert from gpa to vva
 	 * (guest physical addr -> vhost virtual addr)
 	 */
-	vq = dev->virtqueue[VIRTIO_RXQ];
+	vq = dev->virtqueue[queue_id];
 	vb_addr = gpa_to_vva(dev, vq->buf_vec[vec_idx].buf_addr);
 	vb_hdr_addr = vb_addr;

@@ -464,11 +477,14 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
 	LOG_DEBUG(VHOST_DATA, "(%"PRIu64") virtio_dev_merge_rx()\n",
 		dev->device_fh);
-	if (unlikely(queue_id != VIRTIO_RXQ)) {
-		LOG_DEBUG(VHOST_DATA, "mq isn't supported in this version.\n");
+	if (unlikely(!is_valid_virt_queue_idx(queue_id, 0, dev->virt_qp_nb))) {
+		RTE_LOG(ERR, VHOST_DATA,
+			"%s (%"PRIu64"): virtqueue idx:%d invalid.\n",
+			__func__, dev->device_fh, queue_id);
+		return 0;
 	}

-	vq = dev->virtqueue[VIRTIO_RXQ];
+	vq = dev->virtqueue[queue_id];
 	count = RTE_MIN((uint32_t)MAX_PKT_BURST, count);

 	if (count == 0)
@@ -509,8 +525,8 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
 				res_cur_idx);
 		} while (success == 0);

-		entry_success = copy_from_mbuf_to_vring(dev, res_base_idx,
-			res_cur_idx, pkts[pkt_idx]);
+		entry_success = copy_from_mbuf_to_vring(dev, queue_id,
+			res_base_idx, res_cur_idx, pkts[pkt_idx]);

 		rte_compiler_barrier();

@@ -562,12 +578,14 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id,
 	uint16_t free_entries, entry_success = 0;
 	uint16_t avail_idx;

-	if (unlikely(queue_id != VIRTIO_TXQ)) {
-		LOG_DEBUG(VHOST_DATA, "mq isn't supported in this version.\n");
+	if (unlikely(!is_valid_virt_queue_idx(queue_id, 1, dev->virt_qp_nb))) {
+		RTE_LOG(ERR, VHOST_DATA,
+			"%s (%"PRIu64"): virtqueue idx:%d invalid.\n",
+			__func__, dev->device_fh, queue_id);
 		return 0;
 	}

-	vq = dev->virtqueue[VIRTIO_TXQ];
+	vq = dev->virtqueue[queue_id];
 	avail_idx = *((volatile uint16_t *)&vq->avail->idx);

 	/* If there are no available buffers then return. */
--
1.9.0
[dpdk-dev] [PATCH v6 05/13] vhost-user: handle VHOST_USER_RESET_OWNER correctly
Destroy the corresponding device when a VHOST_USER_RESET_OWNER message
is received; otherwise, vhost-switch would still try to access the vq
of that device, which results in a SIGSEGV fault and crashes
vhost-switch in the end.

Signed-off-by: Changchun Ouyang
Signed-off-by: Yuanhan Liu
---
 lib/librte_vhost/vhost_user/vhost-net-user.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.c b/lib/librte_vhost/vhost_user/vhost-net-user.c
index 8675cd4..f802b77 100644
--- a/lib/librte_vhost/vhost_user/vhost-net-user.c
+++ b/lib/librte_vhost/vhost_user/vhost-net-user.c
@@ -379,7 +379,7 @@ vserver_message_handler(int connfd, void *dat, int *remove)
 		ops->set_owner(ctx);
 		break;
 	case VHOST_USER_RESET_OWNER:
-		ops->reset_owner(ctx);
+		user_destroy_device(ctx);
 		break;

 	case VHOST_USER_SET_MEM_TABLE:
--
1.9.0
[dpdk-dev] [PATCH v6 06/13] virtio: read virtio_net_config correctly
From: Changchun Ouyang

The old code adjusts the config bytes we want to read depending on
what kind of features we have, but we later cast the entire buf we
read with "struct virtio_net_config", which is obviously wrong.

The right way to go is to read the related config bytes when the
corresponding feature is set, which is exactly what this patch does.

Signed-off-by: Changchun Ouyang
Signed-off-by: Yuanhan Liu
---
v6: read mac unconditionally.
---
 drivers/net/virtio/virtio_ethdev.c | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index 465d3cd..e6aa1f7 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -1162,7 +1162,6 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
 	struct virtio_hw *hw = eth_dev->data->dev_private;
 	struct virtio_net_config *config;
 	struct virtio_net_config local_config;
-	uint32_t offset_conf = sizeof(config->mac);
 	struct rte_pci_device *pci_dev;

 	RTE_BUILD_BUG_ON(RTE_PKTMBUF_HEADROOM < sizeof(struct virtio_net_hdr));
@@ -1221,8 +1220,14 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
 	if (vtpci_with_feature(hw, VIRTIO_NET_F_CTRL_VQ)) {
 		config = &local_config;

+		vtpci_read_dev_config(hw,
+			offsetof(struct virtio_net_config, mac),
+			&config->mac, sizeof(config->mac));
+
 		if (vtpci_with_feature(hw, VIRTIO_NET_F_STATUS)) {
-			offset_conf += sizeof(config->status);
+			vtpci_read_dev_config(hw,
+				offsetof(struct virtio_net_config, status),
+				&config->status, sizeof(config->status));
 		} else {
 			PMD_INIT_LOG(DEBUG,
 				     "VIRTIO_NET_F_STATUS is not supported");
@@ -1230,15 +1235,16 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev)
 		}

 		if (vtpci_with_feature(hw, VIRTIO_NET_F_MQ)) {
-			offset_conf += sizeof(config->max_virtqueue_pairs);
+			vtpci_read_dev_config(hw,
+				offsetof(struct virtio_net_config, max_virtqueue_pairs),
+				&config->max_virtqueue_pairs,
+				sizeof(config->max_virtqueue_pairs));
 		} else {
 			PMD_INIT_LOG(DEBUG,
 				     "VIRTIO_NET_F_MQ is not supported");
 			config->max_virtqueue_pairs = 1;
 		}

-		vtpci_read_dev_config(hw, 0, (uint8_t *)config, offset_conf);
-
 		hw->max_rx_queues =
 			(VIRTIO_MAX_RX_QUEUES < config->max_virtqueue_pairs) ?
 			VIRTIO_MAX_RX_QUEUES : config->max_virtqueue_pairs;
--
1.9.0
[dpdk-dev] [PATCH v6 07/13] vhost-user: enable vhost-user multiple queue
By setting the VHOST_USER_PROTOCOL_F_MQ protocol feature bit, and the
VIRTIO_NET_F_MQ feature bit.

Signed-off-by: Yuanhan Liu
---
 lib/librte_vhost/vhost_user/virtio-net-user.h | 4 +++-
 lib/librte_vhost/virtio-net.c                 | 1 +
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.h b/lib/librte_vhost/vhost_user/virtio-net-user.h
index e7a6ff4..5f6d667 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.h
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.h
@@ -37,7 +37,9 @@
 #include "vhost-net.h"
 #include "vhost-net-user.h"

-#define VHOST_USER_PROTOCOL_FEATURES	0ULL
+#define VHOST_USER_PROTOCOL_F_MQ	0
+
+#define VHOST_USER_PROTOCOL_FEATURES	(1ULL << VHOST_USER_PROTOCOL_F_MQ)

 int user_set_mem_table(struct vhost_device_ctx, struct VhostUserMsg *);

diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 57fb7b1..d644022 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -74,6 +74,7 @@ static struct virtio_net_config_ll *ll_root;
 #define VHOST_SUPPORTED_FEATURES ((1ULL << VIRTIO_NET_F_MRG_RXBUF) | \
 				(1ULL << VIRTIO_NET_F_CTRL_VQ) | \
 				(1ULL << VIRTIO_NET_F_CTRL_RX) | \
+				(1ULL << VIRTIO_NET_F_MQ) | \
 				(1ULL << VHOST_F_LOG_ALL) | \
 				(1ULL << VHOST_USER_F_PROTOCOL_FEATURES))
 static uint64_t VHOST_FEATURES = VHOST_SUPPORTED_FEATURES;
--
1.9.0
[dpdk-dev] [PATCH v6 08/13] vhost: add VHOST_USER_SET_VRING_ENABLE message
From: Changchun Ouyang

This message is used to enable/disable a specific vring queue pair.
The first queue pair is enabled by default.

Signed-off-by: Changchun Ouyang
Signed-off-by: Yuanhan Liu
---
v6: add a vring state changed callback, for informing the application
    that a specific vring is enabled/disabled. You could either flush
    packets that haven't been processed yet, or simply just drop them.
---
 lib/librte_vhost/rte_virtio_net.h             |  9 -
 lib/librte_vhost/vhost_rxtx.c                 | 10 ++
 lib/librte_vhost/vhost_user/vhost-net-user.c  |  5 +
 lib/librte_vhost/vhost_user/vhost-net-user.h  |  1 +
 lib/librte_vhost/vhost_user/virtio-net-user.c | 28 +++
 lib/librte_vhost/vhost_user/virtio-net-user.h |  3 +++
 lib/librte_vhost/virtio-net.c                 | 12 +---
 7 files changed, 64 insertions(+), 4 deletions(-)

diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
index 5dd6493..fd87f01 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -89,6 +89,7 @@ struct vhost_virtqueue {
 	volatile uint16_t	last_used_idx_res;	/**< Used for multiple devices reserving buffers. */
 	int			callfd;			/**< Used to notify the guest (trigger interrupt). */
 	int			kickfd;			/**< Currently unused as polling mode is enabled. */
+	int			enabled;
 	struct buf_vector	buf_vec[BUF_VECTOR_MAX];	/**< for scatter RX. */
 } __rte_cache_aligned;

@@ -132,7 +133,7 @@ struct virtio_memory {
 };

 /**
- * Device operations to add/remove device.
+ * Device and vring operations.
  *
  * Make sure to set VIRTIO_DEV_RUNNING to the device flags in new_device and
  * remove it in destroy_device.
@@ -141,12 +142,18 @@ struct virtio_memory {
 struct virtio_net_device_ops {
 	int (*new_device)(struct virtio_net *);	/**< Add device. */
 	void (*destroy_device)(volatile struct virtio_net *);	/**< Remove device. */
+
+	int (*vring_state_changed)(struct virtio_net *dev, uint16_t queue_id, int enable);	/**< triggered when a vring is enabled or disabled */
 };

 static inline uint16_t __attribute__((always_inline))
 rte_vring_available_entries(struct virtio_net *dev, uint16_t queue_id)
 {
 	struct vhost_virtqueue *vq = dev->virtqueue[queue_id];
+
+	if (!vq->enabled)
+		return 0;
+
 	return *(volatile uint16_t *)&vq->avail->idx - vq->last_used_idx_res;
 }

diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index 14e00ef..400f263 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -86,6 +86,9 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
 	}

 	vq = dev->virtqueue[queue_id];
+	if (unlikely(vq->enabled == 0))
+		return 0;
+
 	count = (count > MAX_PKT_BURST) ? MAX_PKT_BURST : count;

 	/*
@@ -278,6 +281,7 @@ copy_from_mbuf_to_vring(struct virtio_net *dev, uint32_t queue_id,
 	 * (guest physical addr -> vhost virtual addr)
 	 */
 	vq = dev->virtqueue[queue_id];
+
 	vb_addr = gpa_to_vva(dev, vq->buf_vec[vec_idx].buf_addr);
 	vb_hdr_addr = vb_addr;

@@ -485,6 +489,9 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
 	}

 	vq = dev->virtqueue[queue_id];
+	if (unlikely(vq->enabled == 0))
+		return 0;
+
 	count = RTE_MIN((uint32_t)MAX_PKT_BURST, count);

 	if (count == 0)
@@ -586,6 +593,9 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id,
 	}

 	vq = dev->virtqueue[queue_id];
+	if (unlikely(vq->enabled == 0))
+		return 0;
+
 	avail_idx = *((volatile uint16_t *)&vq->avail->idx);

 	/* If there are no available buffers then return. */
diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.c b/lib/librte_vhost/vhost_user/vhost-net-user.c
index f802b77..8fad385 100644
--- a/lib/librte_vhost/vhost_user/vhost-net-user.c
+++ b/lib/librte_vhost/vhost_user/vhost-net-user.c
@@ -99,6 +99,7 @@ static const char *vhost_message_str[VHOST_USER_MAX] = {
 	[VHOST_USER_GET_PROTOCOL_FEATURES] = "VHOST_USER_GET_PROTOCOL_FEATURES",
 	[VHOST_USER_SET_PROTOCOL_FEATURES] = "VHOST_USER_SET_PROTOCOL_FEATURES",
 	[VHOST_USER_GET_QUEUE_NUM] = "VHOST_USER_GET_QUEUE_NUM",
+	[VHOST_USER_SET_VRING_ENABLE] = "VHOST_USER_SET_VRING_ENABLE",
 };

 /**
@@ -428,6 +429,10 @@ vserver_message_handler(int connfd, void *dat, int *remove)
 		send_vhost_message(connfd, &msg);
 		break;

+	case VHOST_USER_SET_VRING_ENABLE:
+		user_set_vring_enable(ctx, &msg.payload.state);
+		break;
+
 	default:
 		break;
diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.h b/lib/librte_vhost/vhost_user/vhost-ne
[dpdk-dev] [PATCH v6 09/13] vhost: add API bind a virtq to a specific core
From: Changchun Ouyang

The new API rte_vhost_core_id_set() binds a virtq to a specific core,
while the other API rte_vhost_core_id_get() gets the core a virtq is
bound to. The usage, which will be introduced soon, can be found at
examples/vhost/main.c.

Signed-off-by: Changchun Ouyang
Signed-off-by: Yuanhan Liu
---
 lib/librte_vhost/rte_vhost_version.map |  7 +++
 lib/librte_vhost/rte_virtio_net.h      | 25 +
 lib/librte_vhost/virtio-net.c          | 25 +
 3 files changed, 57 insertions(+)

diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map
index 3d8709e..2ce141c 100644
--- a/lib/librte_vhost/rte_vhost_version.map
+++ b/lib/librte_vhost/rte_vhost_version.map
@@ -18,5 +18,12 @@ DPDK_2.1 {
 	global:

 	rte_vhost_driver_unregister;
+} DPDK_2.0;
+
+
+DPDK_2.2 {
+	global:
+
+	rte_vhost_core_id_get;
+	rte_vhost_core_id_set;
 } DPDK_2.0;

diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
index fd87f01..3b75d18 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -90,6 +90,7 @@ struct vhost_virtqueue {
 	int			callfd;		/**< Used to notify the guest (trigger interrupt). */
 	int			kickfd;		/**< Currently unused as polling mode is enabled. */
 	int			enabled;
+	uint32_t		core_id;	/**< Data core that the vq is attached to */
 	struct buf_vector	buf_vec[BUF_VECTOR_MAX];	/**< for scatter RX. */
 } __rte_cache_aligned;

@@ -244,4 +245,28 @@ uint16_t rte_vhost_enqueue_burst(struct virtio_net *dev, uint16_t queue_id,
 uint16_t rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id,
 	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count);

+/**
+ * This function gets the data core id for the queue pair in one vhost device.
+ * @param dev
+ *  virtio-net device
+ * @param queue_id
+ *  virtio queue index in mq case
+ * @return
+ *  core id of the queue pair of the specified virtio device.
+ */
+uint16_t rte_vhost_core_id_get(volatile struct virtio_net *dev,
+			       uint16_t queue_id);
+
+/**
+ * This function sets the data core id for the queue pair in one vhost device.
+ * @param dev
+ *  virtio-net device
+ * @param queue_id
+ *  virtio queue index in mq case
+ * @param core_id
+ *  data core id for the virtio queue pair in mq case
+ */
+void rte_vhost_core_id_set(struct virtio_net *dev, uint16_t queue_id,
+			   uint16_t core_id);
+
 #endif /* _VIRTIO_NET_H_ */
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index b11fd61..d304ee6 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -868,6 +868,31 @@ int rte_vhost_feature_enable(uint64_t feature_mask)
 	return -1;
 }

+uint16_t
+rte_vhost_core_id_get(volatile struct virtio_net *dev, uint16_t queue_id)
+{
+	if (dev == NULL)
+		return 0;
+
+	if (dev->virtqueue == NULL || dev->virtqueue[queue_id] == NULL)
+		return 0;
+
+	return dev->virtqueue[queue_id]->core_id;
+}
+
+void
+rte_vhost_core_id_set(struct virtio_net *dev, uint16_t queue_id,
+		      uint16_t core_id)
+{
+	if (dev == NULL)
+		return;
+
+	if (dev->virtqueue == NULL || dev->virtqueue[queue_id] == NULL)
+		return;
+
+	dev->virtqueue[queue_id]->core_id = core_id;
+}
+
 /*
  * Register ops so that we can add/remove device to data core.
  */
--
1.9.0
[dpdk-dev] [PATCH v6 10/13] ixgbe: support VMDq RSS in non-SRIOV environment
From: Changchun Ouyang

In a non-SRIOV environment, VMDq RSS can be enabled via the MRQC
register. In theory, the queue number per pool could be 2 or 4, but
only 2 queues are available due to a HW limitation; the same limit
also exists in the Linux ixgbe driver.

Signed-off-by: Changchun Ouyang
Signed-off-by: Yuanhan Liu
---
 drivers/net/ixgbe/ixgbe_rxtx.c | 86 +++---
 lib/librte_ether/rte_ethdev.c  | 11 ++
 2 files changed, 84 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index a598a72..e502fe8 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -3445,16 +3445,16 @@ void ixgbe_configure_dcb(struct rte_eth_dev *dev)
 	return;
 }

-/*
- * VMDq only support for 10 GbE NIC.
+/**
+ * Config pool for VMDq on 10 GbE NIC.
  */
 static void
-ixgbe_vmdq_rx_hw_configure(struct rte_eth_dev *dev)
+ixgbe_vmdq_pool_configure(struct rte_eth_dev *dev)
 {
 	struct rte_eth_vmdq_rx_conf *cfg;
 	struct ixgbe_hw *hw;
 	enum rte_eth_nb_pools num_pools;
-	uint32_t mrqc, vt_ctl, vlanctrl;
+	uint32_t vt_ctl, vlanctrl;
 	uint32_t vmolr = 0;
 	int i;

@@ -3463,12 +3463,6 @@ ixgbe_vmdq_rx_hw_configure(struct rte_eth_dev *dev)
 	cfg = &dev->data->dev_conf.rx_adv_conf.vmdq_rx_conf;
 	num_pools = cfg->nb_queue_pools;

-	ixgbe_rss_disable(dev);
-
-	/* MRQC: enable vmdq */
-	mrqc = IXGBE_MRQC_VMDQEN;
-	IXGBE_WRITE_REG(hw, IXGBE_MRQC, mrqc);
-
 	/* PFVTCTL: turn on virtualisation and set the default pool */
 	vt_ctl = IXGBE_VT_CTL_VT_ENABLE | IXGBE_VT_CTL_REPLEN;
 	if (cfg->enable_default_pool)
@@ -3534,7 +3528,29 @@ ixgbe_vmdq_rx_hw_configure(struct rte_eth_dev *dev)
 	IXGBE_WRITE_FLUSH(hw);
 }

-/*
+/**
+ * VMDq only support for 10 GbE NIC.
+ */
+static void
+ixgbe_vmdq_rx_hw_configure(struct rte_eth_dev *dev)
+{
+	struct ixgbe_hw *hw;
+	uint32_t mrqc;
+
+	PMD_INIT_FUNC_TRACE();
+	hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+
+	ixgbe_rss_disable(dev);
+
+	/* MRQC: enable vmdq */
+	mrqc = IXGBE_MRQC_VMDQEN;
+	IXGBE_WRITE_REG(hw, IXGBE_MRQC, mrqc);
+	IXGBE_WRITE_FLUSH(hw);
+
+	ixgbe_vmdq_pool_configure(dev);
+}
+
+/**
  * ixgbe_dcb_config_tx_hw_config - Configure general VMDq TX parameters
  * @hw: pointer to hardware structure
  */
@@ -3639,6 +3655,41 @@ ixgbe_config_vf_rss(struct rte_eth_dev *dev)
 }

 static int
+ixgbe_config_vmdq_rss(struct rte_eth_dev *dev)
+{
+	struct ixgbe_hw *hw;
+	uint32_t mrqc;
+
+	ixgbe_rss_configure(dev);
+
+	hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+
+	/* MRQC: enable VMDQ RSS */
+	mrqc = IXGBE_READ_REG(hw, IXGBE_MRQC);
+	mrqc &= ~IXGBE_MRQC_MRQE_MASK;
+
+	switch (RTE_ETH_DEV_SRIOV(dev).nb_q_per_pool) {
+	case 2:
+		mrqc |= IXGBE_MRQC_VMDQRSS64EN;
+		break;
+
+	case 4:
+		mrqc |= IXGBE_MRQC_VMDQRSS32EN;
+		break;
+
+	default:
+		PMD_INIT_LOG(ERR, "Invalid pool number in non-IOV mode with VMDQ RSS");
+		return -EINVAL;
+	}
+
+	IXGBE_WRITE_REG(hw, IXGBE_MRQC, mrqc);
+
+	ixgbe_vmdq_pool_configure(dev);
+
+	return 0;
+}
+
+static int
 ixgbe_config_vf_default(struct rte_eth_dev *dev)
 {
 	struct ixgbe_hw *hw =
@@ -3694,6 +3745,10 @@ ixgbe_dev_mq_rx_configure(struct rte_eth_dev *dev)
 			ixgbe_vmdq_rx_hw_configure(dev);
 			break;

+		case ETH_MQ_RX_VMDQ_RSS:
+			ixgbe_config_vmdq_rss(dev);
+			break;
+
 		case ETH_MQ_RX_NONE:
 			/* if mq_mode is none, disable rss mode.*/
 		default:
 			ixgbe_rss_disable(dev);
@@ -4186,6 +4241,8 @@ ixgbe_dev_rx_init(struct rte_eth_dev *dev)

 	/* Setup RX queues */
 	for (i = 0; i < dev->data->nb_rx_queues; i++) {
+		uint32_t psrtype = 0;
+
 		rxq = dev->data->rx_queues[i];

 		/*
@@ -4213,12 +4270,10 @@ ixgbe_dev_rx_init(struct rte_eth_dev *dev)
 		if (rx_conf->header_split) {
 			if (hw->mac.type == ixgbe_mac_82599EB) {
 				/* Must setup the PSRTYPE register */
-				uint32_t psrtype;
				psrtype
= IXGBE_PSRTYPE_TCPHDR | IXGBE_PSRTYPE_UDPHDR | IXGBE_PSRTYPE_IPV4HDR | IXGBE_PSRTYPE_IPV6HDR; - IXGBE_WRITE_REG(hw, IXGBE_PSRTYPE(rxq->reg_idx), psrtype); } srrctl = ((rx_conf->split_hdr_size << IXGBE_SRRCTL_BSIZEH
[dpdk-dev] [PATCH v6 11/13] examples/vhost: demonstrate the usage of vhost mq feature
From: Changchun Ouyang This patch demonstrates the usage of the vhost mq feature by leveraging the VMDq+RSS HW feature to receive packets and distribute them into different queues in the pool according to the 5-tuple. The queue number is specified by the --rxq option. The number of HW queues per pool is exactly the same as the queue number in the virtio device; e.g. with rxq = 4, there are 4 HW queues in each VMDq pool and 4 queues in each virtio device/port, mapped one to one:

    ===================    ===================
    |     vport0      |    |     vport1      |
    | q0 | q1 | q2 | q3 |  | q0 | q1 | q2 | q3 |
    ===================    ===================
      /\   /\   /\   /\      /\   /\   /\   /\
      ||   ||   ||   ||      ||   ||   ||   ||
    | q0 | q1 | q2 | q3 |  | q0 | q1 | q2 | q3 |
    |    VMDq pool0   |    |    VMDq pool1   |
    ===================    ===================

On the RX side, it polls each queue of the pool, gets the packets from it, and enqueues them into the corresponding queue of the virtio device/port. On the TX side, it dequeues packets from each queue of the virtio device/port and sends them to either a physical port or another virtio device according to their destination MAC address. We bind the virtq to a specific core with rte_vhost_core_id_set(), and later we can retrieve it with rte_vhost_core_id_get(). 
Signed-off-by: Changchun Ouyang Signed-off-by: Yuanhan Liu --- examples/vhost/main.c | 325 ++ examples/vhost/main.h | 3 +- 2 files changed, 225 insertions(+), 103 deletions(-) diff --git a/examples/vhost/main.c b/examples/vhost/main.c index 9eac2d0..23b7aa7 100644 --- a/examples/vhost/main.c +++ b/examples/vhost/main.c @@ -163,6 +163,9 @@ static int mergeable; /* Do vlan strip on host, enabled on default */ static uint32_t vlan_strip = 1; +/* Rx queue number per virtio device */ +static uint32_t rxq = 1; + /* number of descriptors to apply*/ static uint32_t num_rx_descriptor = RTE_TEST_RX_DESC_DEFAULT_ZCP; static uint32_t num_tx_descriptor = RTE_TEST_TX_DESC_DEFAULT_ZCP; @@ -365,6 +368,37 @@ validate_num_devices(uint32_t max_nb_devices) return 0; } +static int +get_dev_nb_for_82599(struct rte_eth_dev_info dev_info) +{ + int dev_nb = -1; + switch (rxq) { + case 1: + case 2: + /* +* for 82599, dev_info.max_vmdq_pools always 64 despite rx mode. +*/ + dev_nb = (int)dev_info.max_vmdq_pools; + break; + case 4: + dev_nb = (int)dev_info.max_vmdq_pools / 2; + break; + default: + RTE_LOG(ERR, VHOST_CONFIG, "invalid rxq for VMDq.\n"); + } + return dev_nb; +} + +static int +get_dev_nb_for_fvl(struct rte_eth_dev_info dev_info) +{ + /* +* for FVL, dev_info.max_vmdq_pools is calculated according to +* the configured value: CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_VM. 
+*/ + return (int)dev_info.max_vmdq_pools; +} + /* * Initialises a given port using global settings and with the rx buffers * coming from the mbuf_pool passed as parameter @@ -380,6 +414,7 @@ port_init(uint8_t port) uint16_t rx_ring_size, tx_ring_size; int retval; uint16_t q; + struct rte_eth_dev *eth_dev; /* The max pool number from dev_info will be used to validate the pool number specified in cmd line */ rte_eth_dev_info_get (port, &dev_info); @@ -408,8 +443,16 @@ port_init(uint8_t port) txconf->tx_deferred_start = 1; } - /*configure the number of supported virtio devices based on VMDQ limits */ - num_devices = dev_info.max_vmdq_pools; + /* Configure the virtio devices num based on VMDQ limits */ + if (dev_info.max_vmdq_pools == ETH_64_POOLS) { + num_devices = (uint32_t)get_dev_nb_for_82599(dev_info); + if (num_devices == (uint32_t)-1) + return -1; + } else { + num_devices = (uint32_t)get_dev_nb_for_fvl(dev_info); + if (num_devices == (uint32_t)-1) + return -1; + } if (zero_copy) { rx_ring_size = num_rx_descriptor; @@ -431,7 +474,7 @@ port_init(uint8_t port) return retval; /* NIC queues are divided into pf queues and vmdq queues. */ num_pf_queues = dev_info.max_rx_queues - dev_info.vmdq_queue_num; - queues_per_pool = dev_info.vmdq_queue_num / dev_info.max_vmdq_pools; + queues_per_pool = dev_info.vmdq_queue_num / num_devices; num_vmdq_queues = num_devices * queues_per_pool; num_queues = num_pf_queues + num_vmdq_queues; vmdq_queue_base = dev_info.vmdq_queue_base; @@ -447,6 +490,14 @@ port_init(uint8_t port) if (retval != 0)
[dpdk-dev] [PATCH v6 12/13] examples/vhost: add per queue stats
From: Changchun Ouyang Signed-off-by: Changchun Ouyang Signed-off-by: Yuanhan Liu --- examples/vhost/main.c | 97 +-- 1 file changed, 56 insertions(+), 41 deletions(-) diff --git a/examples/vhost/main.c b/examples/vhost/main.c index 23b7aa7..06a3ac7 100644 --- a/examples/vhost/main.c +++ b/examples/vhost/main.c @@ -314,7 +314,7 @@ struct ipv4_hdr { #define VLAN_ETH_HLEN 18 /* Per-device statistics struct */ -struct device_statistics { +struct qp_statistics { uint64_t tx_total; rte_atomic64_t rx_total_atomic; uint64_t rx_total; @@ -322,6 +322,10 @@ struct device_statistics { rte_atomic64_t rx_atomic; uint64_t rx; } __rte_cache_aligned; + +struct device_statistics { + struct qp_statistics *qp_stats; +}; struct device_statistics dev_statistics[MAX_DEVICES]; /* @@ -775,6 +779,17 @@ us_vhost_parse_args(int argc, char **argv) return -1; } else { enable_stats = ret; + if (enable_stats) + for (i = 0; i < MAX_DEVICES; i++) { + dev_statistics[i].qp_stats = + malloc(VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX * sizeof(struct qp_statistics)); + if (dev_statistics[i].qp_stats == NULL) { + RTE_LOG(ERR, VHOST_CONFIG, "Failed to allocate memory for qp stats.\n"); + return -1; + } + memset(dev_statistics[i].qp_stats, 0, + VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX * sizeof(struct qp_statistics)); + } } } @@ -1131,13 +1146,13 @@ virtio_tx_local(struct vhost_dev *vdev, struct rte_mbuf *m, uint32_t qp_idx) &m, 1); if (enable_stats) { rte_atomic64_add( - &dev_statistics[tdev->device_fh].rx_total_atomic, + &dev_statistics[tdev->device_fh].qp_stats[qp_idx].rx_total_atomic, 1); rte_atomic64_add( - &dev_statistics[tdev->device_fh].rx_atomic, + &dev_statistics[tdev->device_fh].qp_stats[qp_idx].rx_atomic, ret); - dev_statistics[tdev->device_fh].tx_total++; - dev_statistics[tdev->device_fh].tx += ret; + dev_statistics[dev->device_fh].qp_stats[qp_idx].tx_total++; + dev_statistics[dev->device_fh].qp_stats[qp_idx].tx += ret; } } @@ -1271,8 +1286,8 @@ virtio_tx_route(struct vhost_dev *vdev, struct rte_mbuf *m, 
tx_q->m_table[len] = m; len++; if (enable_stats) { - dev_statistics[dev->device_fh].tx_total++; - dev_statistics[dev->device_fh].tx++; + dev_statistics[dev->device_fh].qp_stats[qp_idx].tx_total++; + dev_statistics[dev->device_fh].qp_stats[qp_idx].tx++; } if (unlikely(len == MAX_PKT_BURST)) { @@ -1403,10 +1418,10 @@ switch_worker(__attribute__((unused)) void *arg) pkts_burst, rx_count); if (enable_stats) { rte_atomic64_add( - &dev_statistics[dev_ll->vdev->dev->device_fh].rx_total_atomic, + &dev_statistics[dev_ll->vdev->dev->device_fh].qp_stats[qp_idx].rx_total_atomic, rx_count); rte_atomic64_add( - &dev_statistics[dev_ll->vdev->dev->device_fh].rx_atomic, ret_count); + &dev_statistics[dev_ll->vdev->dev->device_fh].qp_stats[qp_idx].rx_atomic, ret_count); } while (likely(rx_count)) { rx_count--; @@ -1954,8 +1969,8 @@ virtio_tx_route_zcp(st
[dpdk-dev] [PATCH v6 13/13] doc: update release note for vhost-user mq support
Signed-off-by: Yuanhan Liu --- doc/guides/rel_notes/release_2_2.rst | 5 + 1 file changed, 5 insertions(+) diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst index 5687676..34c910f 100644 --- a/doc/guides/rel_notes/release_2_2.rst +++ b/doc/guides/rel_notes/release_2_2.rst @@ -4,6 +4,11 @@ DPDK Release 2.2 New Features +* **vhost: added vhost-user multiple queue support.** + + Added vhost-user multiple queue support, and it is demonstrated at + ``examples/vhost/vhost-switch``. + Resolved Issues --- -- 1.9.0
[dpdk-dev] [PATCH v5 resend 07/12] virtio: resolve for control queue
Hi, > I just recognized that this dead loop is the same one that I have > experienced (see > http://dpdk.org/ml/archives/dev/2015-October/024737.html for reference). > Just applying the changes in this patch (only 07/12) will not fix the > dead loop at least in my setup. Yes, exactly. I observe it same way even after applying the patch. -- Best regards, Nikita Kalyazin, n.kalyazin at samsung.com Software Engineer Virtualization Group Samsung R&D Institute Russia Tel: +7 (495) 797-25-00 #3816 Tel: +7 (495) 797-25-03 Office #1501, 12-1, Dvintsev str., Moscow, 127018, Russia On Thu, Oct 08, 2015 at 10:51:02PM +0200, Steffen Bauch wrote: > > > On 10/08/2015 05:32 PM, Nikita Kalyazin wrote: > > Hi Yuanhan, > > > > > > As I understand, the dead loop happened here (virtio_send_command): > > while (vq->vq_used_cons_idx == vq->vq_ring.used->idx) { > >rte_rmb(); > >usleep(100); > > } > > > > Could you explain why wrong config reading caused that and how correct > > reading helps to avoid? > > > Hi, > > I just recognized that this dead loop is the same one that I have > experienced (see > http://dpdk.org/ml/archives/dev/2015-October/024737.html for reference). > Just applying the changes in this patch (only 07/12) will not fix the > dead loop at least in my setup. > > Best regards, > > Steffen
[dpdk-dev] Compilation bug: dpdk master compilation fails with Fedora rawhide (kernel 4.3.0)
Hi all, I am running Fedora Rawhide with the latest Linux kernel (4.3.0 rc4) and the latest dpdk no longer compiles, with error message "struct pci_dev has no member msi_list". This is due to kernel commit 4a7cc831670550e6b48ef5760e7213f89935ff0d, which is in v4.3-rc1, v4.3-rc2 and v4.3-rc3. The fix seems to be (according to that commit) to compile the kernel with CONFIG_GENERIC_MSI_IRQ. Fedora Rawhide kernels are compiled with this option, so it seems that the dpdk compilation system is not detecting that flag. Any ideas where to add the line "CONFIG_GENERIC_MSI_IRQ=y"? -Tapio
[dpdk-dev] [PATCH] i40e: fix the write back issue in FVL VF
If DPDK is used on a VF while the host uses the Linux kernel driver as the PF driver on an FVL NIC, VF Rx is reported only in batches of 4 packets. This is because the kernel driver assumes the VF driver works in interrupt mode, while the DPDK VF works in polling mode. This patch fixes the issue by using the v1.1 virtual channel with the Linux i40e PF driver. Signed-off-by: Jingjing Wu --- drivers/net/i40e/i40e_ethdev.h| 5 +++ drivers/net/i40e/i40e_ethdev_vf.c | 66 +-- 2 files changed, 54 insertions(+), 17 deletions(-) diff --git a/drivers/net/i40e/i40e_ethdev.h b/drivers/net/i40e/i40e_ethdev.h index 6185657..d42487d 100644 --- a/drivers/net/i40e/i40e_ethdev.h +++ b/drivers/net/i40e/i40e_ethdev.h @@ -91,6 +91,11 @@ #define I40E_48_BIT_WIDTH (CHAR_BIT * 6) #define I40E_48_BIT_MASK RTE_LEN2MASK(I40E_48_BIT_WIDTH, uint64_t) +/* Linux PF host with virtchnl version 1.1 */ +#define PF_IS_V11(vf) \ + (((vf)->version_major == I40E_VIRTCHNL_VERSION_MAJOR) && \ + ((vf)->version_minor == 1)) + /* index flex payload per layer */ enum i40e_flxpld_layer_idx { I40E_FLXPLD_L2_IDX= 0, diff --git a/drivers/net/i40e/i40e_ethdev_vf.c b/drivers/net/i40e/i40e_ethdev_vf.c index b694400..176a2f6 100644 --- a/drivers/net/i40e/i40e_ethdev_vf.c +++ b/drivers/net/i40e/i40e_ethdev_vf.c @@ -67,7 +67,8 @@ #include "i40e_rxtx.h" #include "i40e_ethdev.h" #include "i40e_pf.h" -#define I40EVF_VSI_DEFAULT_MSIX_INTR 1 +#define I40EVF_VSI_DEFAULT_MSIX_INTR 1 +#define I40EVF_VSI_DEFAULT_MSIX_INTR_LNX 0 /* busy wait delay in msec */ #define I40EVF_BUSY_WAIT_DELAY 10 @@ -412,7 +413,7 @@ i40evf_check_api_version(struct rte_eth_dev *dev) if (vf->version_major == I40E_DPDK_VERSION_MAJOR) PMD_DRV_LOG(INFO, "Peer is DPDK PF host"); else if ((vf->version_major == I40E_VIRTCHNL_VERSION_MAJOR) && - (vf->version_minor == I40E_VIRTCHNL_VERSION_MINOR)) + (vf->version_minor <= I40E_VIRTCHNL_VERSION_MINOR)) PMD_DRV_LOG(INFO, "Peer is Linux PF host"); else { PMD_INIT_LOG(ERR, "PF/VF API version mismatch:(%u.%u)-(%u.%u)", @@ 
-432,14 +433,23 @@ i40evf_get_vf_resource(struct rte_eth_dev *dev) struct i40e_vf *vf = I40EVF_DEV_PRIVATE_TO_VF(dev->data->dev_private); int err; struct vf_cmd_info args; - uint32_t len; + uint32_t caps, len; args.ops = I40E_VIRTCHNL_OP_GET_VF_RESOURCES; - args.in_args = NULL; - args.in_args_size = 0; args.out_buffer = cmd_result_buffer; args.out_size = I40E_AQ_BUF_SZ; - + if (PF_IS_V11(vf)) { + caps = I40E_VIRTCHNL_VF_OFFLOAD_L2 | + I40E_VIRTCHNL_VF_OFFLOAD_RSS_AQ | + I40E_VIRTCHNL_VF_OFFLOAD_RSS_REG | + I40E_VIRTCHNL_VF_OFFLOAD_VLAN | + I40E_VIRTCHNL_VF_OFFLOAD_RX_POLLING; + args.in_args = (uint8_t *)&caps; + args.in_args_size = sizeof(caps); + } else { + args.in_args = NULL; + args.in_args_size = 0; + } err = i40evf_execute_vf_cmd(dev, &args); if (err) { @@ -692,6 +702,8 @@ i40evf_configure_queues(struct rte_eth_dev *dev) return i40evf_configure_vsi_queues(dev); } +#define I40E_QINT_RQCTL_MSIX_INDX_NOITR 3 + static int i40evf_config_irq_map(struct rte_eth_dev *dev) { @@ -703,11 +715,14 @@ i40evf_config_irq_map(struct rte_eth_dev *dev) int i, err; map_info = (struct i40e_virtchnl_irq_map_info *)cmd_buffer; map_info->num_vectors = 1; - map_info->vecmap[0].rxitr_idx = RTE_LIBRTE_I40E_ITR_INTERVAL / 2; - map_info->vecmap[0].txitr_idx = RTE_LIBRTE_I40E_ITR_INTERVAL / 2; + map_info->vecmap[0].rxitr_idx = I40E_QINT_RQCTL_MSIX_INDX_NOITR; map_info->vecmap[0].vsi_id = vf->vsi_res->vsi_id; /* Alway use default dynamic MSIX interrupt */ - map_info->vecmap[0].vector_id = I40EVF_VSI_DEFAULT_MSIX_INTR; + if (vf->version_major == I40E_DPDK_VERSION_MAJOR) + map_info->vecmap[0].vector_id = I40EVF_VSI_DEFAULT_MSIX_INTR; + else + map_info->vecmap[0].vector_id = I40EVF_VSI_DEFAULT_MSIX_INTR_LNX; + /* Don't map any tx queue */ map_info->vecmap[0].txq_map = 0; map_info->vecmap[0].rxq_map = 0; @@ -1546,18 +1561,37 @@ i40evf_tx_init(struct rte_eth_dev *dev) } static inline void -i40evf_enable_queues_intr(struct i40e_hw *hw) +i40evf_enable_queues_intr(struct rte_eth_dev *dev) { - 
I40E_WRITE_REG(hw, I40E_VFINT_DYN_CTLN1(I40EVF_VSI_DEFAULT_MSIX_INTR - 1), + struct i40e_vf *vf = I40EVF_DEV_PRIVATE_TO_VF(dev->data->dev_private); + struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private); + + if (vf->version_major == I40E_DPDK_VERSION_MAJOR) + /* To support DPDK PF host */ + I40E_WRITE_REG(hw, + I40
[dpdk-dev] rte_eal_init() alternative?
On 10/08/2015 05:58 PM, Montorsi, Francesco wrote: > Hi, > >> -Original Message- >> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com] >> Sent: mercoledì 2 settembre 2015 15:10 >> To: Montorsi, Francesco >> Cc: dev at dpdk.org; Bruce Richardson >> Subject: Re: [dpdk-dev] rte_eal_init() alternative? >> >> 2015-09-02 13:56, Bruce Richardson: >>> On Wed, Sep 02, 2015 at 12:49:40PM +, Montorsi, Francesco wrote: Hi all, Currently it seems that the only way to initialize EAL is using rte_eal_init() >> function, correct? I have the problem that rte_eal_init() will call rte_panic() whenever >> something fails to initialize or in other cases it will call exit(). In my application, I would rather like to attempt DPDK initialization. If it >> fails I don't want to exit. Unfortunately I cannot even copy&paste the rte_eal_init() code into my >> application (removing rte_panic and exit calls) since it uses a lot of DPDK >> internal private functions. I think that my requirements (avoid abort/exit calls when init fails) is a >> basic requirement... would you accept a patch that adds an alternative >> rte_eal_init() function that just returns an error code upon failure, >> instead of >> immediately exiting? Thanks for your hard work! Francesco Montorsi >>> I, for one, would welcome such a patch. I think the code is overly >>> quick in many places to panic or exit the app, when an error code would be >> more appropriate. >>> Feel free to also look at other libraries in DPDK too, if you like :-) >> >> Yes but please, do not create an alternative init function. >> We just need to replace panic/exit with error codes and be sure that apps >> and examples handle them correctly. > > To maintain compatibility with existing applications I think that > perhaps the best would be to have a core initialization function > rte_eal_init_raw() that never calls rte_panic() and returns an error > code. 
Then we can maintain compatibility having an rte_eal_init() > function that does call rte_panic() if rte_eal_init_raw() fails. Note that callers are already required to check rte_eal_init() return code for errors, and any app failing to do so would be buggy to begin with. So just turning the panics into error returns is not an incompatible change. I agree with Thomas here, lets just fix rte_eal_init() to do the right thing instead of adding alternatives just for the error return. Especially when _raw() in the name suggests that is not the thing you'd commonly want to use. > Something like the attached patch. It seems the patch missed the boat :) > Note that the attached patch exposes also a way to skip the > argv/argc configuration process by directly providing a populated > configuration structure... > Let me know what you think about it (the patch is just a draft and > needs more work). Can't comment on what I've not seen, but based on comments seen on this list, having an alternative way to initialize with structures would be welcomed by many. The downside is that those structures will need to be exposed in the API forever which means any changes there are subject to the ABI process. - Panu - > Thanks, > Francesco > > > > >
[dpdk-dev] rte_eal_init() alternative?
Hi Panu, > -Original Message- > From: Panu Matilainen [mailto:pmatilai at redhat.com] > Sent: venerdì 9 ottobre 2015 10:26 > To: Montorsi, Francesco ; Thomas Monjalon > > Cc: dev at dpdk.org > Subject: Re: [dpdk-dev] rte_eal_init() alternative? > > > Something like the attached patch. > > It seems the patch missed the boat :) Correct, sorry. I'm attaching it now. > > > Note that the attached patch exposes also a way to skip the argv/argc > > configuration process by directly providing a populated configuration > > structure... > > Let me know what you think about it (the patch is just a draft and > > needs more work). > > Can't comment on what I've not seen, but based on comments seen on this > list, having an alternative way to initialize with structures would be > welcomed > by many. The downside is that those structures will need to be exposed in > the API forever which means any changes there are subject to the ABI > process. > Perhaps the init function taking a structure could be an exception for ABI changes... i.e., the format of the configuration is not guaranteed to stay the same between different versions, and applications using a shared build of DPDK libraries must avoid using the configuration structure... would that be a possible solution? Thanks, Francesco
[dpdk-dev] rte_eal_init() alternative?
> > It seems the patch missed the boat :) > > Correct, sorry. I'm attaching it now. Ok, for some reason the email client is removing the attachment... I'm copying and pasting it: (the points marked as TODO are functions that still contain rte_panic() calls...) dpdk-2.1.0/lib/librte_eal/common/eal_common_log.c - dpdk-2.1.0/lib/librte_eal/common/eal_common_log.c dpdk-2.1.0/lib/librte_eal/common/include/rte_eal.h - dpdk-2.1.0/lib/librte_eal/common/include/rte_eal.h --- /tmp/tmp.6220.372015-10-08 16:15:22.402607404 +0200 +++ dpdk-2.1.0/lib/librte_eal/common/include/rte_eal.h 2015-10-08 15:57:21.442627152 +0200 @@ -141,6 +141,9 @@ * returning. See also the rte_eal_get_configuration() function. Note: * This behavior may change in the future. * + * This function will log and eventually abort the entire application if + * initialization fails. + * * @param argc * The argc argument that was given to the main() function. * @param argv @@ -153,6 +156,27 @@ * - On failure, a negative error value. */ int rte_eal_init(int argc, char **argv); + +/** + * Initialize the Environment Abstraction Layer (EAL). + * + * Please refer to rte_eal_init() for more information. + * The difference between rte_eal_init() and rte_eal_init_raw() + * is that the latter will never abort the entire process but rather + * will just log an error and return an error code. + * + * @param logid + * A string that identifies the whole process, used to prefix log messages; + * on Linux will be used as the 'ident' parameter of the syslog facility openlog(). + * @param cfg + * The internal configuration for RTE EAL. + * @return + * - On success, zero. + * - On failure, a negative error value. + */ +struct internal_config; +int rte_eal_init_raw(const char* logid, struct internal_config *cfg); + /** * Usage function typedef used by the application usage function. 
* dpdk-2.1.0/lib/librte_eal/linuxapp/eal/eal.c - dpdk-2.1.0/lib/librte_eal/linuxapp/eal/eal.c --- /tmp/tmp.6220.752015-10-08 16:15:22.406607404 +0200 +++ dpdk-2.1.0/lib/librte_eal/linuxapp/eal/eal.c2015-10-08 16:15:10.106607628 +0200 @@ -178,7 +178,7 @@ * on other parts, e.g. memzones, to detect if there are running secondary * processes. */ static void -rte_eal_config_create(void) +rte_eal_config_create(void)// TODO { void *rte_mem_cfg_addr; int retval; @@ -232,7 +232,7 @@ /* attach to an existing shared memory config */ static void -rte_eal_config_attach(void) +rte_eal_config_attach(void)// TODO { struct rte_mem_config *mem_config; @@ -258,7 +258,7 @@ /* reattach the shared config at exact memory location primary process has it */ static void -rte_eal_config_reattach(void) +rte_eal_config_reattach(void) // TODO { struct rte_mem_config *mem_config; void *rte_mem_cfg_addr; @@ -305,7 +305,7 @@ /* Sets up rte_config structure with the pointer to shared memory config.*/ static void -rte_config_init(void) +rte_config_init(void) // TODO { rte_config.process_type = internal_config.process_type; @@ -724,25 +724,17 @@ #endif } -/* Launch threads, called at application init(). */ + +/* Launch threads, called at application init(). Logs and aborts on critical errors. */ int rte_eal_init(int argc, char **argv) { - int i, fctret, ret; - pthread_t thread_id; - static rte_atomic32_t run_once = RTE_ATOMIC32_INIT(0); - struct shared_driver *solib = NULL; + int fctret; const char *logid; - char cpuset[RTE_CPU_AFFINITY_STR_LEN]; - - if (!rte_atomic32_test_and_set(&run_once)) - return -1; logid = strrchr(argv[0], '/'); logid = strdup(logid ? 
logid + 1: argv[0]); - thread_id = pthread_self(); - if (rte_eal_log_early_init() < 0) rte_panic("Cannot init early logs\n"); @@ -751,18 +743,54 @@ /* set log level as early as possible */ rte_set_log_level(internal_config.log_level); - if (rte_eal_cpu_init() < 0) - rte_panic("Cannot detect lcores\n"); - fctret = eal_parse_args(argc, argv); if (fctret < 0) exit(1); + if (rte_eal_init_raw(logid, NULL) < 0) + rte_panic("Errors encountered during initialization. Cannot proceed.\n"); + + return fctret; +} + +/* Library-style init(), will attempt initialization, log on errors and return; + * This function does not rte_panic() or exit() the whole process. */ +int +rte_eal_init_raw(const char* logid, struct internal_config *cfg) +{ + int i, ret; + pthread_t thread_id; + static rte_atomic32_t run_once = RTE_ATOMIC32_INIT(0); + struct shared_driver *solib = NULL; + char cpuset[RTE_CPU_AFFINITY_STR_LEN]; + + if (!rte_atomic32_test_and_set(&run_once)) + return -1; + + thread_id = pthread_self(); +
[dpdk-dev] rte_eal_init() alternative?
On 10/09/2015 01:03 PM, Montorsi, Francesco wrote: > Hi Panu, > > > >> -Original Message- >> From: Panu Matilainen [mailto:pmatilai at redhat.com] >> Sent: venerdì 9 ottobre 2015 10:26 >> To: Montorsi, Francesco ; Thomas Monjalon >> >> Cc: dev at dpdk.org >> Subject: Re: [dpdk-dev] rte_eal_init() alternative? >> >>> Something like the attached patch. >> >> It seems the patch missed the boat :) > > Correct, sorry. I'm attaching it now. > > >> >>> Note that the attached patch exposes also a way to skip the argv/argc >>> configuration process by directly providing a populated configuration >>> structure... >>> Let me know what you think about it (the patch is just a draft and >>> needs more work). >> >> Can't comment on what I've not seen, but based on comments seen on this >> list, having an alternative way to initialize with structures would be >> welcomed >> by many. The downside is that those structures will need to be exposed in >> the API forever which means any changes there are subject to the ABI >> process. >> > Perhaps the init function taking a structure could be an exception > for ABI changes... i.e., the format of the configuration is not > guaranteed to stay the same between different versions, and > applications using a shared build of DPDK libraries must avoid using > the configuration structure... would that be a possible solution? Sorry but no, down the path of exceptions lies madness. It'd also be giving the middle finger to people using DPDK as a shared library. Exported structs are always a PITA and even more so in something like configuration which is expected to keep expanding and/or otherwise changing. I'd much rather see an rte_eal_init() which takes struct *rte_cfgfile as the configuration argument. That, plus maybe enhance librte_cfgfile to allow constructing one entirely in memory + setting values in addition to getting. - Panu - > Thanks, > Francesco > > >
[dpdk-dev] [PATCH 0/4] librte_table: add name parameter to lpm table
2015-09-08 12:57, Dumitrescu, Cristian: > From: Singh, Jasvinder > > This patchset links to ABI change announced for librte_table. For lpm table, > > name parameter has been included in LPM table parameters structure. > > It will eventually allow applications to create more than one instances > > of lpm table, if required. > > > > Acked-by: Cristian Dumitrescu Applied, thanks
[dpdk-dev] rte_eal_init() alternative?
On 10/09/2015 01:13 PM, Montorsi, Francesco wrote: >>> It seems the patch missed the boat :) >> >> Correct, sorry. I'm attaching it now. > Ok, for some reason the email client is removing the attachment... I'm > copying and pasting it: > (the points marked as TODO are functions that still contain rte_panic() > calls...) I actually did receive the attachment from the previous mail, but inlined patches are far better for commenting purposes. > + */ > +struct internal_config; > +int rte_eal_init_raw(const char* logid, struct internal_config *cfg); Like the name indicates, struct internal_config is internal to librte_eal, you'd need to "export" the eal_internal_cfg.h header for this to be useful to users outside librte_eal itself. But I'd say there's a reason why it's internal... > - if (rte_eal_pci_init() < 0) > - rte_panic("Cannot init PCI\n"); > + if (rte_eal_pci_init() < 0) { > + RTE_LOG (ERR, EAL, "Cannot init PCI\n"); > + return -1; > + } > > #ifdef RTE_LIBRTE_IVSHMEM > - if (rte_eal_ivshmem_init() < 0) > - rte_panic("Cannot init IVSHMEM\n"); > + if (rte_eal_ivshmem_init() < 0) { > + RTE_LOG (ERR, EAL, "Cannot init IVSHMEM\n"); > + return -1; > + } > #endif > > - if (rte_eal_memory_init() < 0) > - rte_panic("Cannot init memory\n"); > + if (rte_eal_memory_init() < 0) { > + RTE_LOG (ERR, EAL, "Cannot init memory\n"); > + return -1; > + } [...] Something like that, sure. The big question with this conversion is what to do with already allocated/initialized resources in case of failure, which I'd guess is the reason rte_panic() is there - to avoid having to deal with all that. Getting to a point where all or even most initialization can be undone in case of failure is likely going to be a long road; I think many subsystems don't even have a shutdown function. To begin with, EAL itself doesn't have one :) Anyway, one has to start someplace. 
But in order to make the cleanup eventually possible, I'd suggest using a common point of exit instead of a dozen returns, i.e. something in spirit of { [...] if (rte_eal_pci_init() < 0) { RTE_LOG (ERR, EAL, "Cannot init PCI\n"); goto err; } if (rte_eal_memory_init() < 0) { RTE_LOG (ERR, EAL, "Cannot init memory\n"); goto err; } [...] return 0; err: /* TODO: undo all initialization work */ return -1; } - Panu -
[dpdk-dev] [PATCH v6 00/13] vhost-user multiple queues enabling
On 10/09/2015 08:45 AM, Yuanhan Liu wrote: > This patch set enables vhost-user multiple queues. > > Overview > > > It depends on some QEMU patches that has already been merged to upstream. > Those qemu patches introduce some new vhost-user messages, for vhost-user > mq enabling negotiation. Here is the main negotiation steps (Qemu > as master, and DPDK vhost-user as slave): > > - Master queries features by VHOST_USER_GET_FEATURES from slave > > - Check if VHOST_USER_F_PROTOCOL_FEATURES exist. If not, mq is not >supported. (check patch 1 for why VHOST_USER_F_PROTOCOL_FEATURES >is introduced) > > - Master then sends another command, VHOST_USER_GET_QUEUE_NUM, for >querying how many queues the slave supports. > >Master will compare the result with the requested queue number. >Qemu exits if the former is smaller. > > - Master then tries to initiate all queue pairs by sending some vhost >user commands, including VHOST_USER_SET_VRING_CALL, which will >trigger the slave to do related vring setup, such as vring allocation. > > > Till now, all necessary initiation and negotiation are done. And master > could send another message, VHOST_USER_SET_VRING_ENABLE, to enable/disable > a specific queue dynamically later. > > > Patchset > > > Patch 1-6 are all prepare works for enabling mq; they are all atomic > changes, with "do not breaking anything" beared in mind while making > them. > > Patch 7 acutally enables mq feature, by setting two key feature flags. > > Patch 8 handles VHOST_USER_SET_VRING_ENABLE message, which is for enabling > disabling a specific virt queue pair, and there is only one queue pair is > enabled by default. > > Patch 9-12 is for demostrating the mq feature. > > Patch 13 udpates the doc release note. 
>
> Testing
> =======
>
> Host side
> ---------
>
> - # Start vhost-switch
>
>   sudo mount -t hugetlbfs nodev /mnt/huge
>   sudo modprobe uio
>   sudo insmod $RTE_SDK/$RTE_TARGET/kmod/igb_uio.ko
>
>   sudo $RTE_SDK/tools/dpdk_nic_bind.py --bind igb_uio :08:00.0
>
>   sudo $RTE_SDK/examples/vhost/build/vhost-switch -c 0xf0 -n 4 \
>        --huge-dir /mnt/huge --socket-mem 2048,0 -- -p 1 --vm2vm 0 \
>        --dev-basename usvhost --rxq 2
>
>   # The above command generates a usvhost socket file at $PWD. You
>   # could also specify the "--stats 1" option to enable stats dumping.
>
> - # Start QEMU
>
>   sudo mount -t hugetlbfs nodev $HOME/hugetlbfs
>   $QEMU_DIR/x86_64-softmmu/qemu-system-x86_64 -machine accel=kvm -m 4G \
>     -object memory-backend-file,id=mem,size=4G,mem-path=$HOME/hugetlbfs,share=on \
>     -numa node,memdev=mem -chardev socket,id=chr0,path=/path/to/usvhost \
>     -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=2 \
>     -device virtio-net-pci,netdev=net0,mq=on,vectors=6,mac=52:54:00:12:34:58,csum=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off \
>     -hda $HOME/iso/fc-22-x86_64.img -smp 10 -cpu core2duo,+sse3,+sse4.1,+sse4.2
>
>
> Guest side
> ----------
>
>   modprobe uio
>   insmod $RTE_SDK/$RTE_TARGET/kmod/igb_uio.ko
>   echo 1024 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
>   ./tools/dpdk_nic_bind.py --bind igb_uio 00:03.0
>
>   $RTE_SDK/$RTE_TARGET/app/testpmd -c 1f -n 4 -- --rxq=2 --txq=2 \
>        --nb-cores=4 -i --disable-hw-vlan --txqflags 0xf00
>
>   > set fwd mac
>   > start tx_first
>
>
> After those setups, you could then use a packet generator for packet
> tx/rx testing.
>
>
> Test with OVS
> =============
>
> Marcel also created a simple yet quite clear test guide with OVS at:
>
>     http://wiki.qemu.org/Features/vhost-user-ovs-dpdk
>
> BTW, Marcel, would you please complete the page on mq testing?

Hi,

No problem, I'll be happy to add it. I was waiting for this submission
and hoping for an OSV re-submission of the counter-patch.
I need at least a public link to all working series, otherwise the wiki
would not be as simple and people will not use it :)

Thank you for the fantastic work!
Marcel

> ---
> Changchun Ouyang (7):
>   vhost: rxtx: prepare work for multiple queue support
>   virtio: read virtio_net_config correctly
>   vhost: add VHOST_USER_SET_VRING_ENABLE message
>   vhost: add API bind a virtq to a specific core
>   ixgbe: support VMDq RSS in non-SRIOV environment
>   examples/vhost: demonstrate the usage of vhost mq feature
>   examples/vhost: add per queue stats
>
> Yuanhan Liu (6):
>   vhost-user: add protocol features support
>   vhost-user: add VHOST_USER_GET_QUEUE_NUM message
>   vhost: vring queue setup for multiple queue support
>   vhost-user: handle VHOST_USER_RESET_OWNER correctly
>   vhost-user: enable vhost-user multiple queue
>   doc: update release note for vhost-user mq support
>
> doc/guides/rel_notes/release_2_2.rst  |  5 +
> drivers/net/ixgbe/ixgbe_rxtx.c        | 86 +-
> drivers/net/virtio/virtio_ethdev.c    | 1
[dpdk-dev] Accurate timestamps in received packets
Hi all,

I'm using rte_eth_rx_burst() to successfully retrieve packets from a
DPDK-enabled port. I can process the packets and everything works fine.
My only issue is that I cannot find any means to retrieve a timestamp
for every single packet.

As a dirty workaround I'm using gettimeofday() to timestamp incoming
packets, but I would rather retrieve a more accurate and realistic
timestamp from the Ethernet PHY layer instead. For example, if I receive
32 packets in a single burst, I just assign the packets timestamps with
1 ns of difference (using gettimeofday() for the initial time offset).

Is there a way to retrieve a realistic timestamp from the Ethernet PHY
layer? I found this patch searching on the web:

http://www.wand.net.nz/trac/libtrace/browser/Intel%20DPDK%20Patches/hardware_timestamp.patch

which is however related to an older DPDK version and works only for
Intel 82580 controllers... Do you know if that simple patch linked above
could be similarly ported to the Intel 82599 and 82571 controllers? Is
there any better/easier way to do that?

Thanks a lot,
Francesco Montorsi
[dpdk-dev] rte_eal_init() alternative?
On 10/9/15 11:40 AM, Panu Matilainen wrote:
> On 10/09/2015 01:03 PM, Montorsi, Francesco wrote:
>> Hi Panu,
>>
>>> -----Original Message-----
>>> From: Panu Matilainen [mailto:pmatilai at redhat.com]
>>> Sent: venerdì 9 ottobre 2015 10:26
>>> To: Montorsi, Francesco ; Thomas Monjalon
>>> Cc: dev at dpdk.org
>>> Subject: Re: [dpdk-dev] rte_eal_init() alternative?
>>>
>>> Something like the attached patch.
>>>
>>> It seems the patch missed the boat :)
>>
>> Correct, sorry. I'm attaching it now.
>>
>>> Note that the attached patch also exposes a way to skip the argv/argc
>>> configuration process by directly providing a populated configuration
>>> structure... Let me know what you think about it (the patch is just a
>>> draft and needs more work).
>>>
>>> Can't comment on what I've not seen, but based on comments seen on
>>> this list, having an alternative way to initialize with structures
>>> would be welcomed by many. The downside is that those structures will
>>> need to be exposed in the API forever, which means any changes there
>>> are subject to the ABI process.
>>
>> Perhaps the init function taking a structure could be an exception for
>> ABI changes... i.e., the format of the configuration is not guaranteed
>> to stay the same between different versions, and applications using a
>> shared build of DPDK libraries must avoid using the configuration
>> structure... would that be a possible solution?
>
> Sorry but no, down the path of exceptions lies madness. It'd also be
> giving the middle finger to people using DPDK as a shared library.
>
> Exported structs are always a PITA, and even more so in something like
> configuration, which is expected to keep expanding and/or otherwise
> changing.
>
> I'd much rather see an rte_eal_init() which takes struct *rte_cfgfile
> as the configuration argument. That, plus maybe enhance librte_cfgfile
> to allow constructing one entirely in memory + setting values in
> addition to getting.
>
> - Panu -

It is very difficult for application writers to write their own command
parsers, including an implementation of the -h option. How about a
function that would verify the init parameters and return with a benign
error if the options are not correct?

>> Thanks,
>> Francesco

--
Thomas F Herbert
Red Hat
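The "verify the parameters, fail benignly" idea could look roughly like the sketch below. Both struct eal_cfg and eal_cfg_verify() are hypothetical names invented for illustration; no such API exists in DPDK:

```c
#include <stddef.h>

/* Hypothetical configuration struct: a few representative EAL-style
 * parameters, purely for illustration. */
struct eal_cfg {
	unsigned int nb_cores;  /* number of lcores, must be > 0 */
	unsigned int mem_mb;    /* socket memory in MB, must be > 0 */
	const char *huge_dir;   /* hugepage mount point, non-empty */
};

/* Validate without side effects: return 0 if the configuration looks
 * sane, -1 otherwise, so the caller can report errors and keep running
 * instead of having init abort the process. */
static int eal_cfg_verify(const struct eal_cfg *cfg)
{
	if (cfg == NULL || cfg->nb_cores == 0 || cfg->mem_mb == 0)
		return -1;
	if (cfg->huge_dir == NULL || cfg->huge_dir[0] == '\0')
		return -1;
	return 0;
}
```

Separating validation from initialization is what makes the error benign: the application can call the verifier first, print its own diagnostics (or -h output), and only then commit to the real init.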
[dpdk-dev] Mellanox PMD failure w/DPDK-2.1.0 and MLNX_OFED-3.1-1.0.3
Hi Olga,

Thanks for the pointer towards the use of "accelerated verbs". Yes,
SR-IOV is enabled; DPDK runs on the hypervisor on the probed VFs. That
said, it also fails on the underlying PF as far as I can see (e.g. below,
the log shows (VF: false) for device mlx4_0, and the code fails in RD
creation on this as well as on one of the VFs). I don't see any messages
generated in dmesg that seem to indicate errors at any point, but an
extract is included below.

But here's perhaps the crux! Switching off SR-IOV and running with the
new combination of DPDK and OFED against just a single PF also fails in
exactly the same way (RD creation failure). The old code continues to
work. I will audit our code to make sure we're not missing something when
using dpdk-2.1. In the meantime, do you have a minimal test that involves
RD creation?

thanks
bill

// DPDK output for application run using dpdk-2.1 and ofed 3.1
EAL: Detected lcore 0 as core 0 on socket 0
EAL: Detected lcore 1 as core 1 on socket 0
EAL: Detected lcore 2 as core 2 on socket 0
EAL: Detected lcore 3 as core 3 on socket 0
EAL: Detected lcore 4 as core 4 on socket 0
EAL: Detected lcore 5 as core 5 on socket 0
EAL: Detected lcore 6 as core 0 on socket 0
EAL: Detected lcore 7 as core 1 on socket 0
EAL: Detected lcore 8 as core 2 on socket 0
EAL: Detected lcore 9 as core 3 on socket 0
EAL: Detected lcore 10 as core 4 on socket 0
EAL: Detected lcore 11 as core 5 on socket 0
EAL: Support maximum 128 logical core(s) by configuration.
EAL: Detected 12 lcore(s)
EAL: VFIO modules not all loaded, skip VFIO support...
EAL: Setting up physically contiguous memory...
EAL: Ask a virtual area of 0xe40 bytes
EAL: Virtual area found at 0x7fffe600 (size = 0xe40)
EAL: Ask a virtual area of 0x20 bytes
EAL: Virtual area found at 0x7fffe5c0 (size = 0x20)
EAL: Ask a virtual area of 0x7180 bytes
EAL: Virtual area found at 0x7fff7420 (size = 0x7180)
EAL: Ask a virtual area of 0x20 bytes
EAL: Virtual area found at 0x7fff73e0 (size = 0x20)
EAL: Requesting 512 pages of size 2MB from socket 0
EAL: TSC frequency is ~2394453 KHz
EAL: Master lcore 0 is ready (tid=f7fe7940;cpuset=[0])
EAL: lcore 1 is ready (tid=e53fe700;cpuset=[1])
EAL: lcore 2 is ready (tid=e4bfd700;cpuset=[2])
EAL: lcore 3 is ready (tid=e43fc700;cpuset=[3])
EAL: PCI device :01:00.0 on NUMA socket 0
EAL: probe driver: 15b3:1003 librte_pmd_mlx4
PMD: librte_pmd_mlx4: PCI information matches, using device "mlx4_0" (VF: false)
PMD: librte_pmd_mlx4: 1 port(s) detected
PMD: librte_pmd_mlx4: port 1 MAC address is f4:52:14:8f:16:80
EAL: PCI device :01:00.1 on NUMA socket 0
EAL: probe driver: 15b3:1004 librte_pmd_mlx4
PMD: librte_pmd_mlx4: PCI information matches, using device "mlx4_1" (VF: true)
PMD: librte_pmd_mlx4: 1 port(s) detected
PMD: librte_pmd_mlx4: port 1 MAC address is b2:00:7c:2b:3f:47
EAL: PCI device :01:00.2 on NUMA socket 0
EAL: probe driver: 15b3:1004 librte_pmd_mlx4
PMD: librte_pmd_mlx4: PCI information matches, using device "mlx4_2" (VF: true)
PMD: librte_pmd_mlx4: 1 port(s) detected
PMD: librte_pmd_mlx4: port 1 MAC address is 3a:3d:c7:e0:ed:5a
EAL: PCI device :01:00.3 on NUMA socket 0
EAL: probe driver: 15b3:1004 librte_pmd_mlx4
PMD: librte_pmd_mlx4: PCI information matches, using device "mlx4_3" (VF: true)
PMD: librte_pmd_mlx4: 1 port(s) detected
PMD: librte_pmd_mlx4: port 1 MAC address is ee:6a:a6:79:24:4c
EAL: PCI device :01:00.4 on NUMA socket 0
EAL: probe driver: 15b3:1004 librte_pmd_mlx4
PMD: librte_pmd_mlx4: PCI information matches, using device "mlx4_4" (VF: true)
PMD: librte_pmd_mlx4: 1 port(s) detected
PMD: librte_pmd_mlx4: port 1 MAC address is 8a:7a:30:00:46:33
EAL: PCI device :01:00.5 on NUMA socket 0
EAL: probe driver: 15b3:1004 librte_pmd_mlx4
PMD: librte_pmd_mlx4: cannot access device, is mlx4_ib loaded?
EAL: PCI device :01:00.6 on NUMA socket 0
EAL: probe driver: 15b3:1004 librte_pmd_mlx4
PMD: librte_pmd_mlx4: cannot access device, is mlx4_ib loaded?
EAL: PCI device :01:00.7 on NUMA socket 0
EAL: probe driver: 15b3:1004 librte_pmd_mlx4
PMD: librte_pmd_mlx4: cannot access device, is mlx4_ib loaded?
EAL: PCI device :01:01.0 on NUMA socket 0
EAL: probe driver: 15b3:1004 librte_pmd_mlx4
PMD: librte_pmd_mlx4: cannot access device, is mlx4_ib loaded?
EAL: PCI device :01:01.1 on NUMA socket 0
EAL: probe driver: 15b3:1004 librte_pmd_mlx4
PMD: librte_pmd_mlx4: cannot access device, is mlx4_ib loaded?
EAL: PCI device :01:01.2 on NUMA socket 0
EAL: probe driver: 15b3:1004 librte_pmd_mlx4
PMD: librte_pmd_mlx4: cannot access device, is mlx4_ib loaded?
EAL: PCI device :01:01.3 on NUMA socket 0
EAL: probe driver: 15b3:1004 librte_pmd_mlx4
PMD: librte_pmd_mlx4: cannot access device, is mlx4_ib loaded?
EAL: PCI device :01:01.4 on NUMA socket 0
EAL: probe driver: 15b3:1004 librte_pmd_mlx4
PMD: librte_pmd_mlx4: cannot access device, is mlx4_ib loaded?
EAL: PCI device :01:01.5 on NUMA socket 0
EAL: p
[dpdk-dev] Network Stack discussion notes from 2015 DPDK Userspace
Here are some notes from the DPDK Network Stack discussion, as best I can
remember; please help me fill in anything I missed.

Items I remember we talked about:

* The only reason for a DPDK TCP/IP stack is performance and possibly
  lower latency.
  * Meaning the developer is willing to re-write or write his application
    to get the best performance.
* A TCP/IPv4/v6 stack is the minimum stack we need to support
  applications linked with DPDK.
  * SCTP is another protocol that may be required.
  * TCP is the primary protocol, the usage model for most use cases.
* The stack must be able to terminate TCP traffic to an application
  linked to DPDK.
* For DPDK the customer is looking for fast applications and is willing
  to write the application just for the DPDK network stack.
  * Converting an existing application could be done, but the design is
    for performance and may require a lot of changes to an application.
  * Using an application API that is not sockets is fine for high
    performance and may be the only way we get the best performance.
* Need to supply a socket layer interface as an option if the customer is
  willing to take a performance hit instead of rewriting the application.
* Native application acceleration is desired, but not required, when
  using the DPDK network stack.
* We have two projects related to a network stack in DPDK:
  * The first one is porting some TCP/IP stack to DPDK; it needs to give
    a reasonable performance increase over native Linux applications.
    * The stack code needs to be BSD/MIT-like licensed (open sourced).
    * The stack should be up to date with the latest RFCs, or at least
      close.
    * A stack could be written for DPDK (not using an existing code base)
      and its environment for best performance.
    * Need to be able to configure the DPDK stack(s) from the Linux
      command line tools if possible.
    * Need a DPDK-specific application layer API for applications to
      interface with the network stack.
    * Could have a socket layer API on top of the specific API for
      applications needing to use sockets (not expected to give the best
      performance).
  * The second item is figuring out a new IPC for East/West traffic
    within the same system.
    * The design needs to improve performance between applications and be
      transparent to the application when the remote end is not on the
      same system.
    * The new IPC path should be agnostic to local or remote end points.
    * It needs to be very fast compared to current Linux IPC designs.
      (Will OVS work here?)

Did I miss any details or comments? Please reply and help me correct the
comments or my understanding. Thanks to everyone for attending and
packing into a small space.

Regards,
++Keith Wiles
Intel Corporation