[dpdk-dev] [PATCH] eventdev: change port_id to uint16_t
From: yao >From 17.11, port_id is changed from uint8_t to uint16_t.But in eventdev, it still use the old fashion. This patch fix this issue. Signed-off-by: Lei Yao --- lib/librte_eventdev/rte_event_eth_rx_adapter.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/lib/librte_eventdev/rte_event_eth_rx_adapter.c b/lib/librte_eventdev/rte_event_eth_rx_adapter.c index aec2703..c059841 100644 --- a/lib/librte_eventdev/rte_event_eth_rx_adapter.c +++ b/lib/librte_eventdev/rte_event_eth_rx_adapter.c @@ -186,7 +186,7 @@ wrr_next(struct rte_event_eth_rx_adapter *rx_adapter, static int eth_poll_wrr_calc(struct rte_event_eth_rx_adapter *rx_adapter) { - uint8_t d; + uint16_t d; uint16_t q; unsigned int i; @@ -855,7 +855,7 @@ rte_event_eth_rx_adapter_create_ext(uint8_t id, uint8_t dev_id, struct rte_event_eth_rx_adapter *rx_adapter; int ret; int socket_id; - uint8_t i; + uint16_t i; char mem_name[ETH_RX_ADAPTER_SERVICE_NAME_LEN]; const uint8_t default_rss_key[] = { 0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2, -- 2.7.4
net_af_xdp pmd memory leak
My test reports memory leaks for receive path, looking at the code, it seems the following allocations were never released, should they? Thanks. https://github.com/DPDK/dpdk/blob/903ec2b1b49e496815c016b0104fd655cd972661/drivers/net/af_xdp/rte_eth_af_xdp.c#L312
[dpdk-dev] [PATCH] net: support PPPOE in software packet type parser
Add a new RTE_PTYPE_L2_ETHER_PPPOE and its support in rte_net_get_ptype() Signed-off-by: Ray Zhang --- lib/librte_mbuf/rte_mbuf_ptype.h | 7 +++ lib/librte_net/rte_ether.h | 12 lib/librte_net/rte_net.c | 19 +++ 3 files changed, 38 insertions(+) diff --git a/lib/librte_mbuf/rte_mbuf_ptype.h b/lib/librte_mbuf/rte_mbuf_ptype.h index ff6de9d17..7dd03de9e 100644 --- a/lib/librte_mbuf/rte_mbuf_ptype.h +++ b/lib/librte_mbuf/rte_mbuf_ptype.h @@ -150,6 +150,13 @@ extern "C" { */ #define RTE_PTYPE_L2_ETHER_QINQ 0x0007 /** + * PPPOE packet type. + * + * Packet format: + * <'ether type'=[0x8864]> + */ +#define RTE_PTYPE_L2_ETHER_PPPOE0x0008 +/** * Mask of layer 2 packet types. * It is used for outer packet for tunneling cases. */ diff --git a/lib/librte_net/rte_ether.h b/lib/librte_net/rte_ether.h index ff3d06540..d76edb368 100644 --- a/lib/librte_net/rte_ether.h +++ b/lib/librte_net/rte_ether.h @@ -323,12 +323,24 @@ struct vxlan_hdr { uint32_t vx_vni; /**< VNI (24) + Reserved (8). */ } __attribute__((__packed__)); +/** + * PPPOE protocol header + */ +struct pppoe_hdr { + uint8_t type_ver; + uint8_t code; + uint16_t sid; + uint16_t length; + uint16_t proto; +} __attribute__((packed)); + /* Ethernet frame types */ #define ETHER_TYPE_IPv4 0x0800 /**< IPv4 Protocol. */ #define ETHER_TYPE_IPv6 0x86DD /**< IPv6 Protocol. */ #define ETHER_TYPE_ARP 0x0806 /**< Arp Protocol. */ #define ETHER_TYPE_RARP 0x8035 /**< Reverse Arp Protocol. */ #define ETHER_TYPE_VLAN 0x8100 /**< IEEE 802.1Q VLAN tagging. */ +#define ETHER_TYPE_PPPOE 0x8864 /**< PPPoE Protocol */ #define ETHER_TYPE_QINQ 0x88A8 /**< IEEE 802.1ad QinQ tagging. */ #define ETHER_TYPE_1588 0x88F7 /**< IEEE 802.1AS 1588 Precise Time Protocol. */ #define ETHER_TYPE_SLOW 0x8809 /**< Slow protocols (LACP and Marker). */ diff --git a/lib/librte_net/rte_net.c b/lib/librte_net/rte_net.c index a8c7aff9c..439c2f6e6 100644 --- a/lib/librte_net/rte_net.c +++ b/lib/librte_net/rte_net.c @@ -302,6 +302,25 @@ uint32_t rte_net_get_ptype(const struct rte_mbuf *m, off += 2 * sizeof(*vh); hdr_lens->l2_len += 2 * sizeof(*vh); proto = vh->eth_proto; + } else if (proto == rte_cpu_to_be_16(ETHER_TYPE_PPPOE)) { + const struct pppoe_hdr *ph; + struct pppoe_hdr ph_copy; + + pkt_type = RTE_PTYPE_L2_ETHER_PPPOE; + ph = rte_pktmbuf_read(m, off, sizeof(*ph), &ph_copy); + if (unlikely(ph == NULL)) + return pkt_type; + + off += sizeof(*ph); + hdr_lens->l2_len += sizeof(*ph); + if (ph->code != 0) /* Not Seesion Data */ + return pkt_type; + if (ph->proto == rte_cpu_to_be_16(0x21)) + proto = rte_cpu_to_be_16(ETHER_TYPE_IPv4); + else if (ph->proto == rte_cpu_to_be_16(0x57)) + proto = rte_cpu_to_be_16(ETHER_TYPE_IPv6); + else + return pkt_type; } l3: -- 2.12.0.189.g3bc53220c
[dpdk-dev] [PATCH] net: support PPPOE in software packet type parser
Add a new RTE_PTYPE_L2_ETHER_PPPOE and its support in rte_net_get_ptype() Signed-off-by: Ray Zhang --- lib/librte_mbuf/rte_mbuf_ptype.h | 7 +++ lib/librte_net/rte_ether.h | 12 lib/librte_net/rte_net.c | 19 +++ 3 files changed, 38 insertions(+) diff --git a/lib/librte_mbuf/rte_mbuf_ptype.h b/lib/librte_mbuf/rte_mbuf_ptype.h index ff6de9d..7dd03de 100644 --- a/lib/librte_mbuf/rte_mbuf_ptype.h +++ b/lib/librte_mbuf/rte_mbuf_ptype.h @@ -150,6 +150,13 @@ */ #define RTE_PTYPE_L2_ETHER_QINQ 0x0007 /** + * PPPOE packet type. + * + * Packet format: + * <'ether type'=[0x8864]> + */ +#define RTE_PTYPE_L2_ETHER_PPPOE0x0008 +/** * Mask of layer 2 packet types. * It is used for outer packet for tunneling cases. */ diff --git a/lib/librte_net/rte_ether.h b/lib/librte_net/rte_ether.h index ff3d065..d76edb3 100644 --- a/lib/librte_net/rte_ether.h +++ b/lib/librte_net/rte_ether.h @@ -323,12 +323,24 @@ struct vxlan_hdr { uint32_t vx_vni; /**< VNI (24) + Reserved (8). */ } __attribute__((__packed__)); +/** + * PPPOE protocol header + */ +struct pppoe_hdr { + uint8_t type_ver; + uint8_t code; + uint16_t sid; + uint16_t length; + uint16_t proto; +} __attribute__((packed)); + /* Ethernet frame types */ #define ETHER_TYPE_IPv4 0x0800 /**< IPv4 Protocol. */ #define ETHER_TYPE_IPv6 0x86DD /**< IPv6 Protocol. */ #define ETHER_TYPE_ARP 0x0806 /**< Arp Protocol. */ #define ETHER_TYPE_RARP 0x8035 /**< Reverse Arp Protocol. */ #define ETHER_TYPE_VLAN 0x8100 /**< IEEE 802.1Q VLAN tagging. */ +#define ETHER_TYPE_PPPOE 0x8864 /**< PPPoE Protocol */ #define ETHER_TYPE_QINQ 0x88A8 /**< IEEE 802.1ad QinQ tagging. */ #define ETHER_TYPE_1588 0x88F7 /**< IEEE 802.1AS 1588 Precise Time Protocol. */ #define ETHER_TYPE_SLOW 0x8809 /**< Slow protocols (LACP and Marker). */ diff --git a/lib/librte_net/rte_net.c b/lib/librte_net/rte_net.c index a8c7aff..439c2f6 100644 --- a/lib/librte_net/rte_net.c +++ b/lib/librte_net/rte_net.c @@ -302,6 +302,25 @@ uint32_t rte_net_get_ptype(const struct rte_mbuf *m, off += 2 * sizeof(*vh); hdr_lens->l2_len += 2 * sizeof(*vh); proto = vh->eth_proto; + } else if (proto == rte_cpu_to_be_16(ETHER_TYPE_PPPOE)) { + const struct pppoe_hdr *ph; + struct pppoe_hdr ph_copy; + + pkt_type = RTE_PTYPE_L2_ETHER_PPPOE; + ph = rte_pktmbuf_read(m, off, sizeof(*ph), &ph_copy); + if (unlikely(ph == NULL)) + return pkt_type; + + off += sizeof(*ph); + hdr_lens->l2_len += sizeof(*ph); + if (ph->code != 0) /* Not Seesion Data */ + return pkt_type; + if (ph->proto == rte_cpu_to_be_16(0x21)) + proto = rte_cpu_to_be_16(ETHER_TYPE_IPv4); + else if (ph->proto == rte_cpu_to_be_16(0x57)) + proto = rte_cpu_to_be_16(ETHER_TYPE_IPv6); + else + return pkt_type; } l3: -- 1.9.1
Re: [dpdk-dev] [PATCH v3 00/10] rxq interrupt mode for virtio PMD
Tested-by: Lei Yao Apply patch to dpdk_next_virtio branch. Qemu version: 2.5.0 Kernel version in VM: 4.8.1 Following TCs are tested and passed: Test Case1: Basic Virtio Interrupt test Test Case2: Interrupted received in VM with different Virtio version(0.95 and 1.0) Test Case3: Interrupted by packet data on all queue Test Case4: Interrupted by packet data on unique queue Test Case5: Stop packet transmit, related cores will be back to sleep > -Original Message- > From: Tan, Jianfeng > Sent: Monday, January 16, 2017 10:47 PM > To: dev@dpdk.org > Cc: yuanhan@linux.intel.com; step...@networkplumber.org; Yao, Lei A > ; Tan, Jianfeng > Subject: [PATCH v3 00/10] rxq interrupt mode for virtio PMD > > v3: > - Update documents: > * doc/guides/nics/features/virtio.ini > * doc/guides/nics/features/virtio_vec.ini > * doc/guides/nics/virtio.rst > - Use hw->max_queue_pairs instead of dev->data->nb_rx_queues to > allocate intr_vec array. > - Fix v2 not working on legacy virtio devices by moving msix enabling > before queue/irq binding. > - Reword cover letter to give an overview of this series. > - Remove wrapper to call vtpci->set_config_irq and vtpci->set_queue_irq. > - Rebase on the new code, and fix a bug after changes by the commit > bb30369dc10("eal: allow passing const interrupt handle"). Basically, > it changes the way to get max interrupts. And we need to re-register > callback to update intr_handle->max_intr. > - In l3fwd-power ptype fix, use rte_eth_dev_get_supported_ptypes() to > query if PMD provides needed ptypes. > > v2: > - Add PCI queue/irq config ops. > - Move rxq interrupt settings before sending DRIVER OK. > > Historically, virtio PMD can only be binded to igb_uio or > uio_pci_generic, and not for vfio-pci (iommu group cannot be created as > vIOMMU is not enabled in QEMU yet). Besides, quote from > http://dpdk.org/doc/guides-16.11/rel_notes/release_2_1.html: > "Per queue RX interrupt events are only allowed in VFIO >which supports multiple MSI-X vectors." > > Linux starts to support VFIO NO-IOMMU mode since 4.8.0. It cannot put > devices into groups for separation as normal VFIO does. So it does not > require QEMU to support vIOMMU. But it does inherit other benefits from > VFIO framework, like better interrupts supports (than UIO). It gives a > good chance to enable rxq interrupt for virtio PMD. > > To implement it, > a. Firstly, we need to enable msix. This should be done before DRIVER_OK > setting and also before queue/irq binding in step b. > b. Bind queue/irq through portio (legacy devices) or mmio (modern devices). >So far, we hard-code 1:1 queue/irq mapping (each rx queue has one >exclusive interrupt), like this: > vec 0 -> config irq > vec 1 -> rxq0 > vec 2 -> rxq1 > ... > > which means, the "vectors" option of QEMU should be configured with > a value >= N+1 (N is the number of the queue pairs). > c. To enable/disable interrupt notification, flags on virtqueues are used >to control devices either sending interrupt or not. > d. Encap above behaviors into callbacks in ether_dev_ops, like >rx_queue_intr_enable/rx_queue_intr_disable/rx_descriptor_done etc. > > > How to test: > > Step 1, prepare a VM image with kernel version >= 4.8.0, and make sure > the kernel is compiled with CONFIG_VFIO_NOIOMMU=y. > > Step 2, on the host, start a testpmd with a vhost port: > $ testpmd -c 0x7 -m 1024 --vdev 'eth_vhost0,iface=/tmp/sock0,queues=2' \ > --no-pci -- -i --rxq=2 --txq=2 --nb-cores=2 > > Step 3, boot the VM: > $ qemu ... 
-chardev socket,id=chr1,path=/tmp/sock0 \ > -netdev vhost-user,id=net1,chardev=chr1,vhostforce,queues=2 \ > -device virtio-net-pci,netdev=net1,mq=on,vectors=5 ... > > Step 4, insert kernel modules > $ modprobe vfio enable_unsafe_noiommu_mode=1 > $ modprobe vfio-pci > > Step 5, start l3fwd-power in VM: > $ l3fwd-power -c 0x3 -n 4 -- -p 1 -P --config="(0,0,1),(0,1,1)" \ >--no-numa --parse-ptype > > Step 6, send packets from testpmd on the host: > $ start tx_first > > Then l3fwd-power outputs: > L3FWD_POWER: lcore 1 is waked up from rx interrupt on port 0 queue 0 > L3FWD_POWER: lcore 1 is waked up from rx interrupt on port 0 queue 1 > > Signed-off-by: Jianfeng Tan > > Jianfeng Tan (10): > net/virtio: fix rewriting LSC flag > net/virtio: clean up wrapper of set_config_irq > net/virtio: add Rx descriptor check > net/virtio: add PCI ops for queue/irq binding > net/virtio: add Rx queue
Re: [dpdk-dev] [PATCH v2] eal: optimize aligned rte_memcpy
Tested-by: Lei Yao - Apply patch to v16.11 I have tested the loopback performance for this patch on 3 following settings: CPU: IVB Ubutnu16.04 Kernal: 4.4.0 gcc : 5.4.0 CPU: HSW Fedora 21 Kernal: 4.1.13 gcc: 4.9.2 CPU:BDW Ubutnu16.04 Kernal: 4.4.0 gcc : 5.4.0 I can see 10%~20% performance gain for different packet size on mergeable path. Only on IVB + gcc5.4.0, slight performance drop(~4%) on vector path for packet size 128 ,260. It's may related to gcc version as this performance drop not see with gcc 6+. -Original Message- From: Wang, Zhihong Sent: Wednesday, December 7, 2016 9:31 AM To: dev@dpdk.org Cc: yuanhan@linux.intel.com; thomas.monja...@6wind.com; Yao, Lei A ; Wang, Zhihong Subject: [PATCH v2] eal: optimize aligned rte_memcpy This patch optimizes rte_memcpy for well aligned cases, where both dst and src addr are aligned to maximum MOV width. It introduces a dedicated function called rte_memcpy_aligned to handle the aligned cases with simplified instruction stream. The existing rte_memcpy is renamed as rte_memcpy_generic. The selection between them 2 is done at the entry of rte_memcpy. The existing rte_memcpy is for generic cases, it handles unaligned copies and make store aligned, it even makes load aligned for micro architectures like Ivy Bridge. However alignment handling comes at a price: It adds extra load/store instructions, which can cause complications sometime. DPDK Vhost memcpy with Mergeable Rx Buffer feature as an example: The copy is aligned, and remote, and there is header write along which is also remote. In this case the memcpy instruction stream should be simplified, to reduce extra load/store, therefore reduce the probability of load/store buffer full caused pipeline stall, to let the actual memcpy instructions be issued and let H/W prefetcher goes to work as early as possible. This patch is tested on Ivy Bridge, Haswell and Skylake, it provides up to 20% gain for Virtio Vhost PVP traffic, with packet size ranging from 64 to 1500 bytes. The test can also be conducted without NIC, by setting loopback traffic between Virtio and Vhost. For example, modify the macro TXONLY_DEF_PACKET_LEN to the requested packet size in testpmd.h, rebuild and start testpmd in both host and guest, then "start" on one side and "start tx_first 32" on the other. 
Signed-off-by: Zhihong Wang --- .../common/include/arch/x86/rte_memcpy.h | 81 +- 1 file changed, 78 insertions(+), 3 deletions(-) diff --git a/lib/librte_eal/common/include/arch/x86/rte_memcpy.h b/lib/librte_eal/common/include/arch/x86/rte_memcpy.h index b3bfc23..b9785e8 100644 --- a/lib/librte_eal/common/include/arch/x86/rte_memcpy.h +++ b/lib/librte_eal/common/include/arch/x86/rte_memcpy.h @@ -69,6 +69,8 @@ rte_memcpy(void *dst, const void *src, size_t n) __attribute__((always_inline)); #ifdef RTE_MACHINE_CPUFLAG_AVX512F +#define ALIGNMENT_MASK 0x3F + /** * AVX512 implementation below */ @@ -189,7 +191,7 @@ rte_mov512blocks(uint8_t *dst, const uint8_t *src, size_t n) } static inline void * -rte_memcpy(void *dst, const void *src, size_t n) +rte_memcpy_generic(void *dst, const void *src, size_t n) { uintptr_t dstu = (uintptr_t)dst; uintptr_t srcu = (uintptr_t)src; @@ -308,6 +310,8 @@ COPY_BLOCK_128_BACK63: #elif defined RTE_MACHINE_CPUFLAG_AVX2 +#define ALIGNMENT_MASK 0x1F + /** * AVX2 implementation below */ @@ -387,7 +391,7 @@ rte_mov128blocks(uint8_t *dst, const uint8_t *src, size_t n) } static inline void * -rte_memcpy(void *dst, const void *src, size_t n) +rte_memcpy_generic(void *dst, const void *src, size_t n) { uintptr_t dstu = (uintptr_t)dst; uintptr_t srcu = (uintptr_t)src; @@ -499,6 +503,8 @@ COPY_BLOCK_128_BACK31: #else /* RTE_MACHINE_CPUFLAG */ +#define ALIGNMENT_MASK 0x0F + /** * SSE & AVX implementation below */ @@ -677,7 +683,7 @@ __extension__ ({ \ }) static inline void * -rte_memcpy(void *dst, const void *src, size_t n) +rte_memcpy_generic(void *dst, const void *src, size_t n) { __m128i xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7, xmm8; uintptr_t dstu = (uintptr_t)dst; @@ -821,6 +827,75 @@ COPY_BLOCK_64_BACK15: #endif /* RTE_MACHINE_CPUFLAG */ +static inline void * +rte_memcpy_aligned(void *dst, const void *src, size_t n) +{ + void *ret = dst; + + /* Copy size <= 16 bytes */ + if (n < 16) { + if (n & 0x01) { + *(uint8_t *)dst = *(const uint8_t *)src; + src = (const uint8_t *)src + 1; + dst = (uint8_t *)dst + 1; + } + if (n & 0x02) { + *(uint16_t *)dst = *(const uint16_t *)src; + src = (const uint16_t *)src + 1; + dst = (uint16_t *)dst + 1; +
Re: [dpdk-dev] [PATCH 1/4] eal/common: introduce rte_memset on IA platform
> On Fri, Dec 16, 2016 at 10:19:43AM +, Yang, Zhiyong wrote: > > > > I run the same virtio/vhost loopback tests without NIC. > > > > I can see the throughput drop when running choosing functions at run > > > > time compared to original code as following on the same platform(my > > > machine is haswell) > > > > Packet size perf drop > > > > 64 -4% > > > > 256 -5.4% > > > > 1024-5% > > > > 1500-2.5% > > > > Another thing, I run the memcpy_perf_autotest, when N= <128, the > > > > rte_memcpy perf gains almost disappears When choosing functions at > run > > > > time. For N=other numbers, the perf gains will become narrow. > > > > > > > How narrow. How significant is the improvement that we gain from > having to > > > maintain our own copy of memcpy. If the libc version is nearly as good we > > > should just use that. > > > > > > /Bruce > > > > Zhihong sent a patch about rte_memcpy, From the patch, > > we can see the optimization job for memcpy will bring obvious perf > improvements > > than glibc for DPDK. > > Just a clarification: it's better than the __original DPDK__ rte_memcpy > but not the glibc one. That makes me think have any one tested the memcpy > with big packets? Does the one from DPDK outweigh the one from glibc, > even for big packets? > > --yliu > I have test the loopback performanc rte_memcpy and glibc memcpy. For both small packer and Big packet, rte_memcpy has better performance. My test enviromen is following CPU: BDW Ubutnu16.04 Kernal: 4.4.0 gcc : 5.4.0 Path: mergeable Size rte_memcpy performance gain 64 31% 128 35% 260 27% 520 33% 1024 18% 1500 12% --Lei > > http://www.dpdk.org/dev/patchwork/patch/17753/ > > git log as following: > > This patch is tested on Ivy Bridge, Haswell and Skylake, it provides > > up to 20% gain for Virtio Vhost PVP traffic, with packet size ranging > > from 64 to 1500 bytes. > > > > thanks > > Zhiyong
Re: [dpdk-dev] [PATCH v14 3/3] app/testpmd: enable TCP/IPv4 GRO
> -Original Message- > From: Hu, Jiayu > Sent: Sunday, July 9, 2017 1:47 PM > To: dev@dpdk.org > Cc: Tan, Jianfeng ; Ananyev, Konstantin > ; y...@fridaylinux.org; > step...@networkplumber.org; Wu, Jingjing ; Yao, > Lei A ; Hu, Jiayu > Subject: [PATCH v14 3/3] app/testpmd: enable TCP/IPv4 GRO > > This patch enables TCP/IPv4 GRO library in csum forwarding engine. > By default, GRO is turned off. Users can use command "gro (on|off) > (port_id)" to enable or disable GRO for a given port. If a port is > enabled GRO, all TCP/IPv4 packets received from the port are performed > GRO. Besides, users can set max flow number and packets number per-flow > by command "gro set (max_flow_num) (max_item_num_per_flow) > (port_id)". > > Signed-off-by: Jiayu Hu Tested-by: Lei Yao This patch has been verified on Haswell bench for basic functions and Performance. > --- > app/test-pmd/cmdline.c | 125 > > app/test-pmd/config.c | 36 > app/test-pmd/csumonly.c | 5 ++ > app/test-pmd/testpmd.c | 3 + > app/test-pmd/testpmd.h | 10 +++ > doc/guides/testpmd_app_ug/testpmd_funcs.rst | 34 > 6 files changed, 213 insertions(+) > > diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c > index d66e9c8..d4ff608 100644 > --- a/app/test-pmd/cmdline.c > +++ b/app/test-pmd/cmdline.c > @@ -76,6 +76,7 @@ > #include > #include > #include > +#include > > #include > #include > @@ -423,6 +424,14 @@ static void cmd_help_long_parsed(void > *parsed_result, > "tso show (portid)" > "Display the status of TCP Segmentation > Offload.\n\n" > > + "gro (on|off) (port_id)" > + "Enable or disable Generic Receive Offload in io" > + " forward engine.\n\n" > + > + "gro set (max_flow_num) > (max_item_num_per_flow) (port_id)\n" > + "Set max flow number and max packet number > per-flow" > + " for GRO.\n\n" > + > "set fwd (%s)\n" > "Set packet forwarding mode.\n\n" > > @@ -3838,6 +3847,120 @@ cmdline_parse_inst_t cmd_tunnel_tso_show = { > }, > }; > > +/* *** SET GRO FOR A PORT *** */ > +struct cmd_gro_result { > + cmdline_fixed_string_t cmd_keyword; > + cmdline_fixed_string_t mode; > + uint8_t port_id; > +}; > + > +static void > +cmd_enable_gro_parsed(void *parsed_result, > + __attribute__((unused)) struct cmdline *cl, > + __attribute__((unused)) void *data) > +{ > + struct cmd_gro_result *res; > + > + res = parsed_result; > + setup_gro(res->mode, res->port_id); > +} > + > +cmdline_parse_token_string_t cmd_gro_keyword = > + TOKEN_STRING_INITIALIZER(struct cmd_gro_result, > + cmd_keyword, "gro"); > +cmdline_parse_token_string_t cmd_gro_mode = > + TOKEN_STRING_INITIALIZER(struct cmd_gro_result, > + mode, "on#off"); > +cmdline_parse_token_num_t cmd_gro_pid = > + TOKEN_NUM_INITIALIZER(struct cmd_gro_result, > + port_id, UINT8); > + > +cmdline_parse_inst_t cmd_enable_gro = { > + .f = cmd_enable_gro_parsed, > + .data = NULL, > + .help_str = "gro (on|off) (port_id)", > + .tokens = { > + (void *)&cmd_gro_keyword, > + (void *)&cmd_gro_mode, > + (void *)&cmd_gro_pid, > + NULL, > + }, > +}; > + > +/* *** SET MAX FLOW NUMBER AND ITEM NUM PER FLOW FOR GRO *** > */ > +struct cmd_gro_set_result { > + cmdline_fixed_string_t gro; > + cmdline_fixed_string_t mode; > + uint16_t flow_num; > + uint16_t item_num_per_flow; > + uint8_t port_id; > +}; > + > +static void > +cmd_gro_set_parsed(void *parsed_result, > +__attribute__((unused)) struct cmdline *cl, > +__attribute__((unused)) void *data) > +{ > + struct cmd_gro_set_result *res = parsed_result; > + > + if (port_id_is_invalid(res->port_id, ENABLED_WARN)) > + return; > + if (test_done == 0) { > + printf("Before set GRO 
flow_num and item_num_per_flow," > + " please stop forwarding first\n"); > + return; &
Re: [dpdk-dev] [PATCH] vhost: fix vhost-user init failed
> -Original Message- > From: Yang, Zhiyong > Sent: Monday, July 10, 2017 4:07 PM > To: dev@dpdk.org > Cc: y...@fridaylinux.org; maxime.coque...@redhat.com; Yao, Lei A > ; Yang, Zhiyong > Subject: [PATCH] vhost: fix vhost-user init failed > > Exception handling is executed in the normal path and it will cause > vhost-user init failure. > Fixes: d6983a70e259("vhost: check return of pthread calls") > > Reported-by: Lei Yao > Signed-off-by: Zhiyong Yang Tested-by: Lei Yao This patch can fix the vhost-init issue on my server. > --- > lib/librte_vhost/socket.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c > index 57b86c0..9d2049c 100644 > --- a/lib/librte_vhost/socket.c > +++ b/lib/librte_vhost/socket.c > @@ -668,7 +668,7 @@ rte_vhost_driver_register(const char *path, uint64_t > flags) > } > > vhost_user.vsockets[vhost_user.vsocket_cnt++] = vsocket; > - > + goto out; > out_mutex: > if (pthread_mutex_destroy(&vsocket->conn_mutex)) { > RTE_LOG(ERR, VHOST_CONFIG, > -- > 2.9.3
Re: [dpdk-dev] [PATCH v3 0/9] virtio/vhost: Add MTU feature support
> -Original Message- > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Maxime Coquelin > Sent: Monday, March 13, 2017 12:34 AM > To: acon...@redhat.com; so...@sonusnet.com; > yuanhan@linux.intel.com; Tan, Jianfeng ; > thomas.monja...@6wind.com; dev@dpdk.org > Cc: Maxime Coquelin > Subject: [dpdk-dev] [PATCH v3 0/9] virtio/vhost: Add MTU feature support > > This series adds support to new Virtio's MTU feature[1]. The MTU > value is set via QEMU parameters. > > If the feature is negotiated (i.e supported by both host and guest, > and valid MTU value is set in QEMU via its host_mtu parameter), QEMU > shares the configured MTU value throught dedicated Vhost protocol > feature. > > On vhost side, the value is stored in the virtio_net structure, and > made available to the application thanks to new vhost lib's > rte_vhost_get_mtu() function. > > To be able to set eth_dev's MTU value at the right time, i.e. to call > rte_vhost_get_mtu() just after Virtio features have been negotiated > and before the device is really started, a new vhost flag has been > introduced (VIRTIO_DEV_READY), because the VIRTIO_DEV_RUNNING flag is > set too late (after .new_device() ops is called). > > Regarding valid MTU values, the maximum MTU value accepted on vhost > side is 65535 bytes, as defined in Virtio Spec and supported in > Virtio-net Kernel driver. But in Virtio PMD, current maximum frame > size is 9728 bytes (~9700 bytes MTU). So maximum MTU size accepted in > Virtio PMD is the minimum between ~9700 bytes and host's MTU. > > Finally, this series also adds MTU value printing in testpmd's > "show port info" command when non-zero. > > This series target v17.05 release. > > Cheers, > Maxime > > [1]: https://lists.oasis-open.org/archives/virtio-dev/201609/msg00128.html > > Changes since v1: > - > * Rebased on top of v17.02 > * Virtio PMD: ensure MTU value is valid before ack'ing the feature (Aaron) > * Vhost lib/PMD: Remove MTU setting API/op (Yuanhan) > > Changes since v2: > - > * Update release notes (Thomas) > * s/rte_vhost_mtu_get/rte_vhost_get_mtu/ (Yuanhan) > * Use %"PRIu64" instead of %lu (Yuanhan) > * Add rte_vhost_get_mtu in rte_vhost_version.map > > Maxime Coquelin (9): > vhost: Enable VIRTIO_NET_F_MTU feature > vhost: vhost-user: Add MTU protocol feature support > vhost: Add new ready status flag > vhost: Add API to get MTU value > vhost: export MTU value > net/vhost: Fill rte_eth_dev's MTU property > net/virtio: Add MTU feature support > doc: announce Virtio and Vhost MTU support > app/testpmd: print MTU value in show port info > > app/test-pmd/config.c | 5 > doc/guides/nics/features/virtio.ini| 1 + > doc/guides/rel_notes/release_17_05.rst | 8 ++ > drivers/net/vhost/rte_eth_vhost.c | 2 ++ > drivers/net/virtio/virtio_ethdev.c | 45 > -- > drivers/net/virtio/virtio_ethdev.h | 3 ++- > drivers/net/virtio/virtio_pci.h| 3 +++ > lib/librte_vhost/rte_vhost_version.map | 7 ++ > lib/librte_vhost/rte_virtio_net.h | 15 > lib/librte_vhost/vhost.c | 22 - > lib/librte_vhost/vhost.h | 9 ++- > lib/librte_vhost/vhost_user.c | 44 +++--- > --- > lib/librte_vhost/vhost_user.h | 5 +++- > 13 files changed, 156 insertions(+), 13 deletions(-) > > -- > 2.9.3 Hi, Maxime If I want have a try for this MTU function, is there any specific requirement for the settings? Such as the qemu version, kernel version or any others? Looks like this feature are very new in Qemu and linux side. Thanks a lot! BRs Lei
Re: [dpdk-dev] [PATCH v3 0/5] consistent PMD batching behaviour
> -Original Message- > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Ferruh Yigit > Sent: Thursday, March 30, 2017 8:55 PM > To: Yang, Zhiyong ; dev@dpdk.org > Cc: Ananyev, Konstantin ; Richardson, > Bruce > Subject: Re: [dpdk-dev] [PATCH v3 0/5] consistent PMD batching behaviour > > On 3/29/2017 8:16 AM, Zhiyong Yang wrote: > > The rte_eth_tx_burst() function in the file Rte_ethdev.h is invoked to > > transmit output packets on the output queue for DPDK applications as > > follows. > > > > static inline uint16_t > > rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id, > > struct rte_mbuf **tx_pkts, uint16_t nb_pkts); > > > > Note: The fourth parameter nb_pkts: The number of packets to transmit. > > > > The rte_eth_tx_burst() function returns the number of packets it actually > > sent. Most of PMD drivers can support the policy "send as many packets to > > transmit as possible" at the PMD level. but the few of PMDs have some > sort > > of artificial limits for the pkts sent successfully. For example, VHOST tx > > burst size is limited to 32 packets. Some rx_burst functions have the > > similar problem. The main benefit is consistent batching behavior for user > > to simplify their logic and avoid misusage at the application level, there > > is unified rte_eth_tx/rx_burst interface already, there is no reason for > > inconsistent behaviors. > > This patchset fixes it via adding wrapper function at the PMD level. > > > > Changes in V3: > > > > 1. Updated release_17_05 in patch 5/5 > > 2. Rebase on top of next net tree. i40e_rxtx_vec_altivec.c is updated in > > patch 2/5. > > 3. fix one checkpatch issue in 2/5. > > > > Changes in V2: > > 1. rename ixgbe, i40e and fm10k vec function XXX_xmit_pkts_vec to new > name > > XXX_xmit_fixed_burst_vec, new wrapper functions use original name > > XXX_xmit_pkts_vec according to Bruce's suggestion. > > 2. simplify the code to avoid the if or if/else. > > > > Zhiyong Yang (5): > > net/fm10k: remove limit of fm10k_xmit_pkts_vec burst size > > net/i40e: remove limit of i40e_xmit_pkts_vec burst size > > net/ixgbe: remove limit of ixgbe_xmit_pkts_vec burst size > > net/vhost: remove limit of vhost TX burst size > > net/vhost: remove limit of vhost RX burst size > > Series applied to dpdk-next-net/master, thanks. > > (doc patch exported into separate patch) > > This is the PMD update on fast path, effected PMDs, can you please > confirm the performance after test? Hi, I have compare the vhost PVP performance with and without Zhiyong's Patch. Almost no performance drop Mergeable path: -0.2% Normal Path: -0.73% Vector Path : -0.55% Test bench: Ubutnu16.04 Kernal: 4.4.0 gcc : 5.4.0 BRs Lei
Re: [dpdk-dev] [RFC PATCH] net/virtio: Align Virtio-net header on cache line in receive path
> -Original Message- > From: Maxime Coquelin [mailto:maxime.coque...@redhat.com] > Sent: Monday, March 6, 2017 10:11 PM > To: Yuanhan Liu > Cc: Liang, Cunming ; Tan, Jianfeng > ; dev@dpdk.org; Wang, Zhihong > ; Yao, Lei A > Subject: Re: [RFC PATCH] net/virtio: Align Virtio-net header on cache line in > receive path > > > > On 03/06/2017 09:46 AM, Yuanhan Liu wrote: > > On Wed, Mar 01, 2017 at 08:36:24AM +0100, Maxime Coquelin wrote: > >> > >> > >> On 02/23/2017 06:49 AM, Yuanhan Liu wrote: > >>> On Wed, Feb 22, 2017 at 10:36:36AM +0100, Maxime Coquelin wrote: > >>>> > >>>> > >>>> On 02/22/2017 02:37 AM, Yuanhan Liu wrote: > >>>>> On Tue, Feb 21, 2017 at 06:32:43PM +0100, Maxime Coquelin wrote: > >>>>>> This patch aligns the Virtio-net header on a cache-line boundary to > >>>>>> optimize cache utilization, as it puts the Virtio-net header (which > >>>>>> is always accessed) on the same cache line as the packet header. > >>>>>> > >>>>>> For example with an application that forwards packets at L2 level, > >>>>>> a single cache-line will be accessed with this patch, instead of > >>>>>> two before. > >>>>> > >>>>> I'm assuming you were testing pkt size <= (64 - hdr_size)? > >>>> > >>>> No, I tested with 64 bytes packets only. > >>> > >>> Oh, my bad, I overlooked it. While you were saying "a single cache > >>> line", I was thinking putting the virtio net hdr and the "whole" > >>> packet data in single cache line, which is not possible for pkt > >>> size 64B. > >>> > >>>> I run some more tests this morning with different packet sizes, > >>>> and also with changing the mbuf size on guest side to have multi- > >>>> buffers packets: > >>>> > >>>> +---+++-+ > >>>> | Txpkt | Rxmbuf | v17.02 | v17.02 + vnet hdr align | > >>>> +---+++-+ > >>>> |64 | 2048 | 11.05 | 11.78 | > >>>> | 128 | 2048 | 10.66 | 11.48 | > >>>> | 256 | 2048 | 10.47 | 11.21 | > >>>> | 512 | 2048 | 10.22 | 10.88 | > >>>> | 1024 | 2048 | 7.65 |7.84 | > >>>> | 1500 | 2048 | 6.25 |6.45 | > >>>> | 2000 | 2048 | 5.31 |5.43 | > >>>> | 2048 | 2048 | 5.32 |4.25 | > >>>> | 1500 |512 | 3.89 |3.98 | > >>>> | 2048 |512 | 1.96 |2.02 | > >>>> +---+++-+ > >>> > >>> Could you share more info, say is it a PVP test? Is mergeable on? > >>> What's the fwd mode? > >> > >> No, this is not PVP benchmark, I have neither another server nor a packet > >> generator connected to my Haswell machine back-to-back. > >> > >> This is simple micro-benchmark, vhost PMD in txonly, Virtio PMD in > >> rxonly. In this configuration, mergeable is ON and no offload disabled > >> in QEMU cmdline. > > > > Okay, I see. So the boost, as you have stated, comes from saving two > > cache line access to one. Before that, vhost write 2 cache lines, > > while the virtio pmd reads 2 cache lines: one for reading the header, > > another one for reading the ether header, for updating xstats (there > > is no ether access in the fwd mode you tested). > > > >> That's why I would be interested in more testing on recent hardware > >> with PVP benchmark. Is it something that could be run in Intel lab? > > > > I think Yao Lei could help on that? But as stated, I think it may > > break the performance for bit packets. And I also won't expect big > > boost even for 64B in PVP test, judging that it's only 6% boost in > > micro bechmarking. > That would be great. > Note that on SandyBridge, on which I see a drop in perf with > microbenchmark, I get a 4% gain on PVP benchmark. So on recent hardware > that show a gain on microbenchmark, I'm curious of the gain with PVP > bench. 
> Hi, Maxime, Yuanhan I have execute the PVP and loopback performance test on my Ivy bridge server. OS:
Re: [dpdk-dev] [PATCH v5 3/3] app/testpmd: enable TCP/IPv4 GRO
> -Original Message- > From: Hu, Jiayu > Sent: Sunday, June 18, 2017 3:21 PM > To: dev@dpdk.org > Cc: Ananyev, Konstantin ; > y...@fridaylinux.org; Wiles, Keith ; Tan, Jianfeng > ; Bie, Tiwei ; Yao, Lei A > ; Hu, Jiayu > Subject: [PATCH v5 3/3] app/testpmd: enable TCP/IPv4 GRO > > This patch demonstrates the usage of GRO library in testpmd. By default, > GRO is turned off. Command, "gro on (port_id)", turns on GRO for the > given port; command, "gro off (port_id)", turns off GRO for the given > port. Currently, GRO only supports to process TCP/IPv4 packets and works > in IO forward mode. Besides, only GRO lightweight mode is enabled. > > Signed-off-by: Jiayu Hu Tested-by: Lei Yao This patch set has been tested on my bench using iperf. In some scenario, the performance gain can reach about 50%. The performance gain will depend on the rx burst size in real usage. OS: Ubuntu 16.04 CPU: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz > --- > app/test-pmd/cmdline.c | 45 > + > app/test-pmd/config.c | 29 + > app/test-pmd/iofwd.c | 6 ++ > app/test-pmd/testpmd.c | 3 +++ > app/test-pmd/testpmd.h | 11 +++ > 5 files changed, 94 insertions(+) > > diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c > index 105c71f..d1ca8df 100644 > --- a/app/test-pmd/cmdline.c > +++ b/app/test-pmd/cmdline.c > @@ -76,6 +76,7 @@ > #include > #include > #include > +#include > > #include > #include > @@ -423,6 +424,9 @@ static void cmd_help_long_parsed(void > *parsed_result, > "tso show (portid)" > "Display the status of TCP Segmentation > Offload.\n\n" > > + "gro (on|off) (port_id)" > + "Enable or disable Generic Receive Offload.\n\n" > + > "set fwd (%s)\n" > "Set packet forwarding mode.\n\n" > > @@ -3831,6 +3835,46 @@ cmdline_parse_inst_t cmd_tunnel_tso_show = { > }, > }; > > +/* *** SET GRO FOR A PORT *** */ > +struct cmd_gro_result { > + cmdline_fixed_string_t cmd_keyword; > + cmdline_fixed_string_t mode; > + uint8_t port_id; > +}; > + > +static void > +cmd_set_gro_parsed(void *parsed_result, > + __attribute__((unused)) struct cmdline *cl, > + __attribute__((unused)) void *data) > +{ > + struct cmd_gro_result *res; > + > + res = parsed_result; > + setup_gro(res->mode, res->port_id); > +} > + > +cmdline_parse_token_string_t cmd_gro_keyword = > + TOKEN_STRING_INITIALIZER(struct cmd_gro_result, > + cmd_keyword, "gro"); > +cmdline_parse_token_string_t cmd_gro_mode = > + TOKEN_STRING_INITIALIZER(struct cmd_gro_result, > + mode, "on#off"); > +cmdline_parse_token_num_t cmd_gro_pid = > + TOKEN_NUM_INITIALIZER(struct cmd_gro_result, > + port_id, UINT8); > + > +cmdline_parse_inst_t cmd_set_gro = { > + .f = cmd_set_gro_parsed, > + .data = NULL, > + .help_str = "gro (on|off) (port_id)", > + .tokens = { > + (void *)&cmd_gro_keyword, > + (void *)&cmd_gro_mode, > + (void *)&cmd_gro_pid, > + NULL, > + }, > +}; > + > /* *** ENABLE/DISABLE FLUSH ON RX STREAMS *** */ > struct cmd_set_flush_rx { > cmdline_fixed_string_t set; > @@ -13710,6 +13754,7 @@ cmdline_parse_ctx_t main_ctx[] = { > (cmdline_parse_inst_t *)&cmd_tso_show, > (cmdline_parse_inst_t *)&cmd_tunnel_tso_set, > (cmdline_parse_inst_t *)&cmd_tunnel_tso_show, > + (cmdline_parse_inst_t *)&cmd_set_gro, > (cmdline_parse_inst_t *)&cmd_link_flow_control_set, > (cmdline_parse_inst_t *)&cmd_link_flow_control_set_rx, > (cmdline_parse_inst_t *)&cmd_link_flow_control_set_tx, > diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c > index 3cd4f31..858342d 100644 > --- a/app/test-pmd/config.c > +++ b/app/test-pmd/config.c > @@ -71,6 +71,7 @@ > #ifdef RTE_LIBRTE_BNXT_PMD > #include > 
#endif > +#include > > #include "testpmd.h" > > @@ -2414,6 +2415,34 @@ set_tx_pkt_segments(unsigned *seg_lengths, > unsigned nb_segs) > tx_pkt_nb_segs = (uint8_t) nb_segs; > } > > +void > +setup_gro(const char *mode, uint8_t port_id) > +{ > + if (!rte_eth_dev_is_valid_port(port_id)) { > + printf("invalid port id %u\n", port_id); >
Re: [dpdk-dev] [PATCH v6 3/3] app/testpmd: enable TCP/IPv4 GRO
> -Original Message- > From: Hu, Jiayu > Sent: Friday, June 23, 2017 10:43 PM > To: dev@dpdk.org > Cc: Ananyev, Konstantin ; Tan, Jianfeng > ; step...@networkplumber.org; > y...@fridaylinux.org; Wiles, Keith ; Bie, Tiwei > ; Yao, Lei A ; Hu, Jiayu > > Subject: [PATCH v6 3/3] app/testpmd: enable TCP/IPv4 GRO > > This patch enables TCP/IPv4 GRO library in csum forwarding engine. > By default, GRO is turned off. Users can use command "gro (on|off) > (port_id)" to enable or disable GRO for a given port. If a port is > enabled GRO, all TCP/IPv4 packets received from the port are performed > GRO. Besides, users can set max flow number and packets number per-flow > by command "gro set (max_flow_num) (max_item_num_per_flow) > (port_id)". > > Signed-off-by: Jiayu Hu Tested-By: Lei Yao This patch is tested on the following test bench: OS: Ubuntu 16.04 CPU: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz NIC: XXV710 25G We can see the iperf result improve a lot after enable GRO. The data flow is NIC1->NIC2->testpmd(GRO on/off)->vhost->virtio-net(in VM) > --- > app/test-pmd/cmdline.c | 125 > > app/test-pmd/config.c | 37 > app/test-pmd/csumonly.c | 5 ++ > app/test-pmd/testpmd.c | 3 + > app/test-pmd/testpmd.h | 11 +++ > doc/guides/testpmd_app_ug/testpmd_funcs.rst | 34 > 6 files changed, 215 insertions(+) > > diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c > index ff8ffd2..cb359e1 100644 > --- a/app/test-pmd/cmdline.c > +++ b/app/test-pmd/cmdline.c > @@ -76,6 +76,7 @@ > #include > #include > #include > +#include > > #include > #include > @@ -419,6 +420,14 @@ static void cmd_help_long_parsed(void > *parsed_result, > "tso show (portid)" > "Display the status of TCP Segmentation > Offload.\n\n" > > + "gro (on|off) (port_id)" > + "Enable or disable Generic Receive Offload in io" > + " forward engine.\n\n" > + > + "gro set (max_flow_num) > (max_item_num_per_flow) (port_id)\n" > + "Set max flow number and max packet number > per-flow" > + " for GRO.\n\n" > + > "set fwd (%s)\n" > "Set packet forwarding mode.\n\n" > > @@ -3827,6 +3836,120 @@ cmdline_parse_inst_t cmd_tunnel_tso_show = { > }, > }; > > +/* *** SET GRO FOR A PORT *** */ > +struct cmd_gro_result { > + cmdline_fixed_string_t cmd_keyword; > + cmdline_fixed_string_t mode; > + uint8_t port_id; > +}; > + > +static void > +cmd_enable_gro_parsed(void *parsed_result, > + __attribute__((unused)) struct cmdline *cl, > + __attribute__((unused)) void *data) > +{ > + struct cmd_gro_result *res; > + > + res = parsed_result; > + setup_gro(res->mode, res->port_id); > +} > + > +cmdline_parse_token_string_t cmd_gro_keyword = > + TOKEN_STRING_INITIALIZER(struct cmd_gro_result, > + cmd_keyword, "gro"); > +cmdline_parse_token_string_t cmd_gro_mode = > + TOKEN_STRING_INITIALIZER(struct cmd_gro_result, > + mode, "on#off"); > +cmdline_parse_token_num_t cmd_gro_pid = > + TOKEN_NUM_INITIALIZER(struct cmd_gro_result, > + port_id, UINT8); > + > +cmdline_parse_inst_t cmd_enable_gro = { > + .f = cmd_enable_gro_parsed, > + .data = NULL, > + .help_str = "gro (on|off) (port_id)", > + .tokens = { > + (void *)&cmd_gro_keyword, > + (void *)&cmd_gro_mode, > + (void *)&cmd_gro_pid, > + NULL, > + }, > +}; > + > +/* *** SET MAX FLOW NUMBER AND ITEM NUM PER FLOW FOR GRO *** > */ > +struct cmd_gro_set_result { > + cmdline_fixed_string_t gro; > + cmdline_fixed_string_t mode; > + uint16_t flow_num; > + uint16_t item_num_per_flow; > + uint8_t port_id; > +}; > + > +static void > +cmd_gro_set_parsed(void *parsed_result, > +__attribute__((unused)) struct cmdline *cl, > 
+__attribute__((unused)) void *data) > +{ > + struct cmd_gro_set_result *res = parsed_result; > + > + if (port_id_is_invalid(res->port_id, ENABLED_WARN)) > + return; > + if (test_done == 0) { > +
Re: [dpdk-dev] [PATCH] examples/l3fwd-power: fix Rx descriptor size
I have test this patch based on 17.05-rc2 , issue is fixed. Host frequency can be changed according to the data throughput. Tested-by: Lei Yao (lei.a@intel.com) > -Original Message- > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Pablo de Lara > Sent: Wednesday, April 26, 2017 7:30 PM > To: dev@dpdk.org > Cc: De Lara Guarch, Pablo ; > sta...@dpdk.org > Subject: [dpdk-dev] [PATCH] examples/l3fwd-power: fix Rx descriptor size > > L3fwd power app monitors the RX queues to see if the polling frequency > should be adjusted (the busier the queue, the higher the frequency). > The app uses several thresholds in the ring to determine the frequency, > being 96 the highest one, when frequency should be highest. > > The problem is that the difference between this value and the ring size > is not big enough (128 - 96 = 32 descriptors), which means that > if the descriptors are not replenished quick enough, queue might > not be busy, but the app would think that it is, because 96th descriptor > is set. > > Therefore, by increasing this gap (increasing the RX ring size), > we make sure that this false measurement will not happen. > > Fixes: b451aa39db31 ("examples/l3fwd-power: use DD bit rather than RX > queue count") > Cc: sta...@dpdk.org > > Signed-off-by: Pablo de Lara > --- > examples/l3fwd-power/main.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c > index ec40a17..7a8e1cd 100644 > --- a/examples/l3fwd-power/main.c > +++ b/examples/l3fwd-power/main.c > @@ -147,7 +147,7 @@ > /* > * Configurable number of RX/TX ring descriptors > */ > -#define RTE_TEST_RX_DESC_DEFAULT 128 > +#define RTE_TEST_RX_DESC_DEFAULT 512 > #define RTE_TEST_TX_DESC_DEFAULT 512 > static uint16_t nb_rxd = RTE_TEST_RX_DESC_DEFAULT; > static uint16_t nb_txd = RTE_TEST_TX_DESC_DEFAULT; > -- > 2.7.4
Re: [dpdk-dev] [PATCH] libs/power: fix the resource leaking issue
> -Original Message- > From: Ma, Liang J > Sent: Friday, December 28, 2018 7:33 PM > To: Hunt, David > Cc: dev@dpdk.org; Burakov, Anatoly ; Yao, Lei > A ; Ma, Liang J > Subject: [PATCH] libs/power: fix the resource leaking issue > > Fixes: e6c6dc0f96c8 ("power: add p-state driver compatibility") > Coverity issue: 328528 > > Also add the missing functionality of enable/disable turbo > > Signed-off-by: Liang Ma Reviewed-by: Lei Yao Tested-by: Lei Yao This patch has been tested based on 19.02-rc1 code. > --- > lib/librte_power/power_pstate_cpufreq.c | 34 > - > 1 file changed, 33 insertions(+), 1 deletion(-) > > diff --git a/lib/librte_power/power_pstate_cpufreq.c > b/lib/librte_power/power_pstate_cpufreq.c > index 411d0eb..cb226a5 100644 > --- a/lib/librte_power/power_pstate_cpufreq.c > +++ b/lib/librte_power/power_pstate_cpufreq.c > @@ -160,6 +160,10 @@ power_init_for_setting_freq(struct > pstate_power_info *pi) > pi->lcore_id); > > f_max = fopen(fullpath_max, "rw+"); > + > + if (f_max == NULL) > + fclose(f_min); > + > FOPEN_OR_ERR_RET(f_max, -1); > > pi->f_cur_min = f_min; > @@ -214,7 +218,13 @@ set_freq_internal(struct pstate_power_info *pi, > uint32_t idx) > /* Turbo is available and enabled, first freq bucket is sys max freq */ > if (pi->turbo_available && pi->turbo_enable && (idx == 0)) > target_freq = pi->sys_max_freq; > - else > + else if (pi->turbo_available && (!pi->turbo_enable) && (idx == 0)) { > + > + RTE_LOG(ERR, POWER, "Turbo is off, frequency can't be > scaled up more %u\n", > + pi->lcore_id); > + return -1; > + > + } else > target_freq = pi->freqs[idx]; > > /* Decrease freq, the min freq should be updated first */ > @@ -394,6 +404,10 @@ power_get_available_freqs(struct > pstate_power_info *pi) > FOPEN_OR_ERR_RET(f_min, ret); > > f_max = fopen(fullpath_max, "r"); > + > + if (f_max == NULL) > + fclose(f_min); > + > FOPEN_OR_ERR_RET(f_max, ret); > > s_min = fgets(buf_min, sizeof(buf_min), f_min); > @@ -726,6 +740,14 @@ power_pstate_enable_turbo(unsigned int lcore_id) > return -1; > } > > + /* Max may have changed, so call to max function */ > + if (power_pstate_cpufreq_freq_max(lcore_id) < 0) { > + RTE_LOG(ERR, POWER, > + "Failed to set frequency of lcore %u to max\n", > + lcore_id); > + return -1; > + } > + > return 0; > } > > @@ -744,6 +766,16 @@ power_pstate_disable_turbo(unsigned int lcore_id) > > pi->turbo_enable = 0; > > + if ((pi->turbo_available) && (pi->curr_idx <= 1)) { > + /* Try to set freq to max by default coming out of turbo */ > + if (power_pstate_cpufreq_freq_max(lcore_id) < 0) { > + RTE_LOG(ERR, POWER, > + "Failed to set frequency of lcore %u to > max\n", > + lcore_id); > + return -1; > + } > + } > + > > return 0; > } > -- > 2.7.5
Re: [dpdk-dev] [PATCH] examples/power: fix wrong core_id with JSON cmds
> -Original Message- > From: Hunt, David > Sent: Monday, January 7, 2019 7:39 PM > To: dev@dpdk.org > Cc: Hunt, David ; Yao, Lei A > Subject: [PATCH] examples/power: fix wrong core_id with JSON cmds > > This patch fixes a bug introduced in the 64-core limtation > enhancement where the core_id is inadvertently converted from > virtual to physical even though it may already be a physical > core_id. > > We should be using the core_type field, and only converting via > hypervisor when core_type is set to CORE_TYPE_VIRTUAL > > Fixes: 5776b7a371d1 ("examples/power: allow VM to use lcores over 63") > > Signed-off-by: David Hunt > --- > examples/vm_power_manager/channel_monitor.c | 5 - > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/examples/vm_power_manager/channel_monitor.c > b/examples/vm_power_manager/channel_monitor.c > index 85622e7cb..1a3a0fa76 100644 > --- a/examples/vm_power_manager/channel_monitor.c > +++ b/examples/vm_power_manager/channel_monitor.c > @@ -640,7 +640,10 @@ process_request(struct channel_packet *pkt, struct > channel_info *chan_info) > if (pkt->command == CPU_POWER) { > unsigned int core_num; > > - core_num = get_pcpu(chan_info, pkt->resource_id); > + if (pkt->core_type == CORE_TYPE_VIRTUAL) > + core_num = get_pcpu(chan_info, pkt->resource_id); Hi, Dave Now in DPDK code, only command send from VM(guest_cli sample) will set the pkt- + else > + core_num = pkt->resource_id; > > switch (pkt->unit) { > case(CPU_POWER_SCALE_MIN): > -- > 2.17.1
Re: [dpdk-dev] [PATCH] examples/power: fix wrong core_id with JSON cmds
> -Original Message- > From: Hunt, David > Sent: Tuesday, January 8, 2019 5:20 PM > To: Yao, Lei A ; dev@dpdk.org > Subject: Re: [PATCH] examples/power: fix wrong core_id with JSON cmds > > Hi Lei, > > On 8/1/2019 2:02 AM, Yao, Lei A wrote: > > > >> -Original Message- > >> From: Hunt, David > >> Sent: Monday, January 7, 2019 7:39 PM > >> To: dev@dpdk.org > >> Cc: Hunt, David ; Yao, Lei A > >> Subject: [PATCH] examples/power: fix wrong core_id with JSON cmds > >> > >> This patch fixes a bug introduced in the 64-core limtation > >> enhancement where the core_id is inadvertently converted from > >> virtual to physical even though it may already be a physical > >> core_id. > >> > >> We should be using the core_type field, and only converting via > >> hypervisor when core_type is set to CORE_TYPE_VIRTUAL > >> > >> Fixes: 5776b7a371d1 ("examples/power: allow VM to use lcores over 63") > >> > >> Signed-off-by: David Hunt Reviewed-by: Lei Yao Tested-by: Lei Yao > >> --- > >> examples/vm_power_manager/channel_monitor.c | 5 - > >> 1 file changed, 4 insertions(+), 1 deletion(-) > >> > >> diff --git a/examples/vm_power_manager/channel_monitor.c > >> b/examples/vm_power_manager/channel_monitor.c > >> index 85622e7cb..1a3a0fa76 100644 > >> --- a/examples/vm_power_manager/channel_monitor.c > >> +++ b/examples/vm_power_manager/channel_monitor.c > >> @@ -640,7 +640,10 @@ process_request(struct channel_packet *pkt, > struct > >> channel_info *chan_info) > >>if (pkt->command == CPU_POWER) { > >>unsigned int core_num; > >> > >> - core_num = get_pcpu(chan_info, pkt->resource_id); > >> + if (pkt->core_type == CORE_TYPE_VIRTUAL) > >> + core_num = get_pcpu(chan_info, pkt->resource_id); > > Hi, Dave > > > > Now in DPDK code, only command send from VM(guest_cli sample) will set > the > > pkt- seems > > we always won't hit this branch. Because parse_json_to_pkt() will always > set the > > core_type to CORE_TYPE_PHYSICAL. > > If resource_id in instruction format JSON file will always be treated as > > Pcpu, > it's the > > same as core_list behavior. > > > Yes, that's correct. But I believe it's OK for the moment. > > Currently the only way for the guest app to send commands and policies > to the host is via the virtio-serial interface, which takes a different > code path. There is no > way currently for the guest app to send a JSON string to the host, so > that code path will > never be hit. However, now that the JSON functionality is in the host, > we plan to add > that same functionality to the guest, and when we do that we will ensure > that core_type > is handled appropriately. > > Regards, > Dave. > > >> + else > >> + core_num = pkt->resource_id; > >> > >>switch (pkt->unit) { > >>case(CPU_POWER_SCALE_MIN): > >> -- > >> 2.17.1 > > > >
Re: [dpdk-dev] [PATCH v5 07/10] examples/power: add json string handling
> +#ifdef USE_JANSSON > +static int > +parse_json_to_pkt(json_t *element, struct channel_packet *pkt) > +{ > + const char *key; > + json_t *value; > + int ret; > + > + memset(pkt, 0, sizeof(struct channel_packet)); > + > + pkt->nb_mac_to_monitor = 0; > + pkt->t_boost_status.tbEnabled = false; > + pkt->workload = LOW; > + pkt->policy_to_use = TIME; > + pkt->command = PKT_POLICY; > + pkt->core_type = CORE_TYPE_PHYSICAL; > + Hi, Dave For the workload policy , it's set to LOW by default, but we can't change it again using JSON file channel. Is it by design?
Re: [dpdk-dev] [PATCH v10 1/4] lib/librte_power: traffic pattern aware power control
+ + if (get_freq_index(LOW) > total_avail_freqs[i]) + return -1; + + if (rte_get_master_lcore() != i) { + w->wrk_stats[i].lcore_id = i; + set_policy(&w->wrk_stats[i], policy); + } + } + + return 0; +} Hi, Liang There is one issue in this part. When you find one frequency level can't be support on the server we used, you return directly. This will skip the set_policy step in the following. If skip the set_policy step, the behavior will be the power lib always execute the training steps, even we set the policy.state=MED_NORMAL in the sample. This will confuse the user, they don’t know why they can't skip the training steps even the sample is already configured to --empty-poll=0,x,xx BRs Lei
Re: [dpdk-dev] [PATCH v10 1/4] lib/librte_power: traffic pattern aware power control
> -Original Message- > From: Ma, Liang J > Sent: Friday, October 12, 2018 6:03 PM > To: Yao, Lei A > Cc: Hunt, David ; dev@dpdk.org; > ktray...@redhat.com; Kovacevic, Marko > Subject: Re: [PATCH v10 1/4] lib/librte_power: traffic pattern aware power > control > > On 11 Oct 18:59, Yao, Lei A wrote: > > > > > > + > > + if (get_freq_index(LOW) > total_avail_freqs[i]) > > + return -1; > > + > > + if (rte_get_master_lcore() != i) { > > + w->wrk_stats[i].lcore_id = i; > > + set_policy(&w->wrk_stats[i], policy); > > + } > > + } > > + > > + return 0; > > +} > > > > Hi, Liang > > > > There is one issue in this part. > > When you find one frequency level can't be support on the server > > we used, you return directly. This will skip the set_policy step in the > following. > > If skip the set_policy step, the behavior will be the power lib always > > execute the training steps, even we set the policy.state=MED_NORMAL in > the sample. > > This will confuse the user, they don’t know why they can't skip the training > steps even > > the sample is already configured to --empty-poll=0,x,xx > > > > BRs > > Lei > Hi Lei, >I think the lib code logic is OK. >if the LOW freq index still is bigger than highest avaiable freq index, > sth is > wrong. >the execution should stop. >Simple app should check the rte_power_empty_poll_stat_init >result, if rte_power_empty_poll_stat_init return error. the sample app > should exit. >I can update the sample app code add the checking. > Regards > Liang Hi, Liang If sample will exit in this situation, it's OK for me. Thanks. BRs Lei
Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
Hi, Lucero, Thomas This patch set will cause deadlock during memory initialization. rte_memseg_walk and try_expand_heap both will lock the file &mcfg->memory_hotplug_lock. So dead lock will occur. #0 rte_memseg_walk #1 <-rte_eal_check_dma_mask #2 <-alloc_pages_on_heap #3 <-try_expand_heap_primary #4 <-try_expand_heap Log as following: EAL: TSC frequency is ~2494156 KHz EAL: Master lcore 0 is ready (tid=77fe3c00;cpuset=[0]) [New Thread 0x75e0d700 (LWP 330350)] EAL: lcore 1 is ready (tid=75e0d700;cpuset=[1]) EAL: Trying to obtain current memory policy. EAL: Setting policy MPOL_PREFERRED for socket 0 EAL: Restoring previous memory policy: 0 Could you have a check on this? A lot of test cases in our validation team fail because of this. Thanks a lot! BRs Lei > -Original Message- > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Thomas Monjalon > Sent: Monday, October 29, 2018 5:04 AM > To: Alejandro Lucero > Cc: dev@dpdk.org > Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA > mask > > 05/10/2018 14:45, Alejandro Lucero: > > I sent a patchset about this to be applied on 17.11 stable. The memory > > code has had main changes since that version, so here it is the patchset > > adjusted to current master repo. > > > > This patchset adds, mainly, a check for ensuring IOVAs are within a > > restricted range due to addressing limitations with some devices. There > > are two known cases: NFP and IOMMU VT-d emulation. > > > > With this check IOVAs out of range are detected and PMDs can abort > > initialization. For the VT-d case, IOVA VA mode is allowed as long as > > IOVAs are within the supported range, avoiding to forbid IOVA VA by > > default. > > > > For the addressing limitations known cases, there are just 40(NFP) or > > 39(VT-d) bits for handling IOVAs. When using IOVA PA, those limitations > > imply 1TB(NFP) or 512M(VT-d) as upper limits, which is likely enough for > > most systems. With machines using more memory, the added check will > > ensure IOVAs within the range. > > > > With IOVA VA, and because the way the Linux kernel serves mmap calls > > in 64 bits systems, 39 or 40 bits are not enough. It is possible to > > give an address hint with a lower starting address than the default one > > used by the kernel, and then ensuring the mmap uses that hint or hint plus > > some offset. With 64 bits systems, the process virtual address space is > > large enoguh for doing the hugepages mmaping within the supported > range > > when those addressing limitations exist. This patchset also adds a change > > for using such a hint making the use of IOVA VA a more than likely > > possibility when there are those addressing limitations. > > > > The check is not done by default but just when it is required. This > > patchset adds the check for NFP initialization and for setting the IOVA > > mode is an emulated VT-d is detected. Also, because the recent patchset > > adding dynamic memory allocation, the check is also invoked for ensuring > > the new memsegs are within the required range. > > > > This patchset could be applied to stable 18.05. > > Applied, thanks > > >
Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
> -Original Message-
> From: Thomas Monjalon [mailto:tho...@monjalon.net]
> Sent: Monday, October 29, 2018 4:43 PM
> To: Yao, Lei A
> Cc: Alejandro Lucero ; dev@dpdk.org; Xu, Qian Q ; Lin, Xueqin ; Burakov, Anatoly
> Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
>
> 29/10/2018 09:23, Yao, Lei A:
> > Hi Lucero, Thomas,
> >
> > This patch set causes a deadlock during memory initialization:
> > rte_memseg_walk and try_expand_heap both take the lock
> > &mcfg->memory_hotplug_lock, so a deadlock occurs.
> >
> > #0 rte_memseg_walk
> > #1 <-rte_eal_check_dma_mask
> > #2 <-alloc_pages_on_heap
> > #3 <-try_expand_heap_primary
> > #4 <-try_expand_heap
> >
> > Log as follows:
> > EAL: TSC frequency is ~2494156 KHz
> > EAL: Master lcore 0 is ready (tid=77fe3c00;cpuset=[0])
> > [New Thread 0x75e0d700 (LWP 330350)]
> > EAL: lcore 1 is ready (tid=75e0d700;cpuset=[1])
> > EAL: Trying to obtain current memory policy.
> > EAL: Setting policy MPOL_PREFERRED for socket 0
> > EAL: Restoring previous memory policy: 0
> >
> > Could you have a check on this? A lot of test cases in our validation
> > team fail because of this. Thanks a lot!
>
> Can we just call rte_memseg_walk_thread_unsafe()?
>
> +Cc Anatoly

Hi Thomas,

I changed it to rte_memseg_walk_thread_unsafe(); it still doesn't work:

EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: Restoring previous memory policy: 0
EAL: memseg iova 14000, len 4000, out of range
EAL:using dma mask
EAL: alloc_pages_on_heap(): couldn't allocate memory due to DMA mask
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 1
EAL: Restoring previous memory policy: 0
EAL: memseg iova 1bc000, len 4000, out of range
EAL:using dma mask ffff
EAL: alloc_pages_on_heap(): couldn't allocate memory due to DMA mask
error allocating rte services array
EAL: FATAL: rte_service_init() failed
EAL: rte_service_init() failed
PANIC in main():

BRs
Lei

> > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Thomas Monjalon
> > > 05/10/2018 14:45, Alejandro Lucero:
> > > > I sent a patchset about this to be applied on 17.11 stable. The memory
> > > > code has had major changes since that version, so here is the patchset
> > > > adjusted to the current master repo.
> > > >
> > > > This patchset mainly adds a check ensuring IOVAs are within a
> > > > restricted range due to addressing limitations with some devices. There
> > > > are two known cases: NFP and IOMMU VT-d emulation.
> > > >
> > > > With this check, IOVAs out of range are detected and PMDs can abort
> > > > initialization. For the VT-d case, IOVA VA mode is allowed as long as
> > > > IOVAs are within the supported range, avoiding forbidding IOVA VA by
> > > > default.
> > > >
> > > > For the known cases of addressing limitations, there are just 40 (NFP) or
> > > > 39 (VT-d) bits for handling IOVAs. When using IOVA PA, those limitations
> > > > imply 1TB (NFP) or 512GB (VT-d) as upper limits, which is likely enough
> > > > for most systems. With machines using more memory, the added check will
> > > > ensure IOVAs are within the range.
> > > >
> > > > With IOVA VA, and because of the way the Linux kernel serves mmap calls
> > > > on 64-bit systems, 39 or 40 bits are not enough. It is possible to
> > > > give an address hint with a lower starting address than the default one
> > > > used by the kernel, and then ensure the mmap uses that hint or hint plus
> > > > some offset. On 64-bit systems, the process virtual address space is
> > > > large enough for doing the hugepage mmapping within the supported range
> > > > when those addressing limitations exist. This patchset also adds a change
> > > > to use such a hint, making the use of IOVA VA a more than likely
> > > > possibility when there are those addressing limitations.
> > > >
> > > > The check is not done by default but just when it is required. This
> > > > patchset adds the check for NFP initialization and for setting the IOVA
> > > > mode if an emulated VT-d is detected. Also, because of the recent patchset
> > > > adding dynamic memory allocation, the check is also invoked to ensure
> > > > the new memsegs are within the required range.
> > > >
> > > > This patchset could be applied to stable 18.05.
> > >
> > > Applied, thanks
Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
Hi Lucero,

My server info:
CPU: Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz
Hugepage: 1G
Kernel: 4.15.0
OS: Ubuntu

Steps are simple:
1. Bind one i40e/ixgbe NIC to igb_uio
2. Launch testpmd:
./x86_64-native-linuxapp-gcc/app/testpmd -c 0x03 -n 4 --log-level=eal,8 -- -i

BRs
Lei

From: Alejandro Lucero [mailto:alejandro.luc...@netronome.com]
Sent: Monday, October 29, 2018 5:26 PM
To: Thomas Monjalon
Cc: Yao, Lei A ; dev ; Xu, Qian Q ; Lin, Xueqin ; Burakov, Anatoly ; Yigit, Ferruh ; Richardson, Bruce
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask

Can we have the configuration triggering this issue?

On Mon, Oct 29, 2018 at 9:07 AM Thomas Monjalon <tho...@monjalon.net> wrote:

One more comment about this issue.
There was no reply to the question asked by Alejandro on October 11th:
http://mails.dpdk.org/archives/dev/2018-October/115402.html
and there were no more reviews despite all my requests:
http://mails.dpdk.org/archives/dev/2018-October/117475.html
Without any more comments, I had to apply the patchset.
Now we need to find a solution. Please suggest.

29/10/2018 09:42, Thomas Monjalon:
> 29/10/2018 09:23, Yao, Lei A:
> > Hi Lucero, Thomas,
> >
> > This patch set causes a deadlock during memory initialization:
> > rte_memseg_walk and try_expand_heap both take the lock
> > &mcfg->memory_hotplug_lock, so a deadlock occurs.
> >
> > #0 rte_memseg_walk
> > #1 <-rte_eal_check_dma_mask
> > #2 <-alloc_pages_on_heap
> > #3 <-try_expand_heap_primary
> > #4 <-try_expand_heap
> >
> > Log as follows:
> > EAL: TSC frequency is ~2494156 KHz
> > EAL: Master lcore 0 is ready (tid=77fe3c00;cpuset=[0])
> > [New Thread 0x75e0d700 (LWP 330350)]
> > EAL: lcore 1 is ready (tid=75e0d700;cpuset=[1])
> > EAL: Trying to obtain current memory policy.
> > EAL: Setting policy MPOL_PREFERRED for socket 0
> > EAL: Restoring previous memory policy: 0
> >
> > Could you have a check on this? A lot of test cases in our validation
> > team fail because of this. Thanks a lot!
>
> Can we just call rte_memseg_walk_thread_unsafe()?
>
> +Cc Anatoly
>
> > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Thomas Monjalon
> > > 05/10/2018 14:45, Alejandro Lucero:
> > > > I sent a patchset about this to be applied on 17.11 stable. The memory
> > > > code has had major changes since that version, so here is the patchset
> > > > adjusted to the current master repo.
> > > >
> > > > This patchset mainly adds a check ensuring IOVAs are within a
> > > > restricted range due to addressing limitations with some devices. There
> > > > are two known cases: NFP and IOMMU VT-d emulation.
> > > >
> > > > With this check, IOVAs out of range are detected and PMDs can abort
> > > > initialization. For the VT-d case, IOVA VA mode is allowed as long as
> > > > IOVAs are within the supported range, avoiding forbidding IOVA VA by
> > > > default.
> > > >
> > > > For the known cases of addressing limitations, there are just 40 (NFP) or
> > > > 39 (VT-d) bits for handling IOVAs. When using IOVA PA, those limitations
> > > > imply 1TB (NFP) or 512GB (VT-d) as upper limits, which is likely enough
> > > > for most systems. With machines using more memory, the added check will
> > > > ensure IOVAs are within the range.
> > > >
> > > > With IOVA VA, and because of the way the Linux kernel serves mmap calls
> > > > on 64-bit systems, 39 or 40 bits are not enough. It is possible to
> > > > give an address hint with a lower starting address than the default one
> > > > used by the kernel, and then ensure the mmap uses that hint or hint plus
> > > > some offset. On 64-bit systems, the process virtual address space is
> > > > large enough for doing the hugepage mmapping within the supported range
> > > > when those addressing limitations exist. This patchset also adds a change
> > > > to use such a hint, making the use of IOVA VA a more than likely
> > > > possibility when there are those addressing limitations.
> > > >
> > > > The check is not done by default but just when it is required. This
> > > > patchset adds the check for NFP initialization and for setting the IOVA
> > > > mode if an emulated VT-d is detected. Also, because of the recent patchset
> > > > adding dynamic memory allocation, the check is also invoked to ensure
> > > > the new memsegs are within the required range.
> > > >
> > > > This patchset could be applied to stable 18.05.
> > >
> > > Applied, thanks
Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
From: Alejandro Lucero [mailto:alejandro.luc...@netronome.com]
Sent: Monday, October 29, 2018 8:56 PM
To: Thomas Monjalon
Cc: Yao, Lei A ; dev ; Xu, Qian Q ; Lin, Xueqin ; Burakov, Anatoly ; Yigit, Ferruh
Subject: Re: [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask

On Mon, Oct 29, 2018 at 11:46 AM Thomas Monjalon <tho...@monjalon.net> wrote:

29/10/2018 12:39, Alejandro Lucero:
> I got a patch that solves a bug in the call to rte_eal_dma_mask: it was
> using the mask instead of the maskbits. However, this does not solve the
> deadlock.

The deadlock is the bigger concern, I think.

I think once the call to rte_eal_check_dma_mask uses the maskbits instead of the mask, calling rte_memseg_walk_thread_unsafe avoids the deadlock.

Yao, can you try with the attached patch?

Hi Lucero,

This patch fixes the issue on my side. Thanks a lot for your quick action.

BRs
Lei

> Interestingly, the problem looks like a compiler one. Calling
> rte_memseg_walk does not return when called inside rte_eal_dma_mask, but if
> you modify the call like this:
>
> - if (rte_memseg_walk(check_iova, &mask))
> + if (!rte_memseg_walk(check_iova, &mask))
>
> it works, although the value returned to the invoker changes, of course.
> But the point here is that it should behave the same when calling
> rte_memseg_walk as before, and it does not.

Anyway, the coding style requires saving the return value in a variable, instead of nesting the call in an "if" condition. And the "if" check should be explicitly != 0 because it is not a real boolean.

PS: please do not top-post and avoid HTML emails, thanks
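A sketch of the call shape Thomas is asking for, assuming the thread-unsafe walk discussed above is used (the callback and variable names are illustrative, not the actual patch):

    /* save the return value instead of nesting the call, and compare
     * explicitly against 0: the walk returns <0 on error and >0 when
     * a callback stopped the iteration early */
    int ret;

    ret = rte_memseg_walk_thread_unsafe(check_iova, &maskbits);
    if (ret != 0)
        return -1;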
Re: [dpdk-dev] [PATCH] mem: fix alignment of requested virtual areas
> -Original Message-
> From: Burakov, Anatoly
> Sent: Monday, July 16, 2018 3:57 PM
> To: dev@dpdk.org
> Cc: tho...@monjalon.net; Yao, Lei A ; Stojaczyk, DariuszX ; sta...@dpdk.org
> Subject: [PATCH] mem: fix alignment of requested virtual areas
>
> The original code did not align any addresses that were requested as
> page-aligned, but were different because addr_is_hint was set.
>
> The fix below by Dariusz introduced an issue where all unaligned addresses
> were left unaligned.
>
> This patch is a partial revert of
> commit 7fa7216ed48d ("mem: fix alignment of requested virtual areas")
>
> and implements a proper fix for this issue, by asking for alignment in all
> but the following two cases:
>
> 1) page size is equal to system page size, or
> 2) we got an aligned requested address, and will not accept a different one
>
> This ensures that alignment is performed in all cases, except those where
> we can guarantee that the address will not need alignment.
>
> Fixes: b7cc54187ea4 ("mem: move virtual area function in common directory")
> Fixes: 7fa7216ed48d ("mem: fix alignment of requested virtual areas")
> Cc: dariuszx.stojac...@intel.com
> Cc: sta...@dpdk.org
>
> Signed-off-by: Anatoly Burakov

Tested-by: Lei Yao

This patch passes the following two multi-process tests:
1. quota_watermark sample test (qw as primary, qwctl as secondary)
2. testpmd with NIC and vdev (primary) + dpdk-procinfo (secondary)

> ---
> lib/librte_eal/common/eal_common_memory.c | 15 +--
> 1 file changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/lib/librte_eal/common/eal_common_memory.c
> b/lib/librte_eal/common/eal_common_memory.c
> index 659cc08f6..fbfb1b055 100644
> --- a/lib/librte_eal/common/eal_common_memory.c
> +++ b/lib/librte_eal/common/eal_common_memory.c
> @@ -66,14 +66,17 @@ eal_get_virtual_area(void *requested_addr, size_t *size,
> addr_is_hint = true;
> }
>
> - /* if requested address is not aligned by page size, or if requested
> - * address is NULL, add page size to requested length as we may get an
> - * address that's aligned by system page size, which can be smaller than
> - * our requested page size. additionally, we shouldn't try to align if
> - * system page size is the same as requested page size.
> + /* we don't need alignment of resulting pointer in the following cases:
> + *
> + * 1. page size is equal to system page size
> + * 2. we have a requested address, and it is page-aligned, and we will
> + *be discarding the address if we get a different one.
> + *
> + * for all other cases, alignment is potentially necessary.
> */
> no_align = (requested_addr != NULL &&
> - ((uintptr_t)requested_addr & (page_sz - 1))) ||
> + requested_addr == RTE_PTR_ALIGN(requested_addr, page_sz) &&
> + !addr_is_hint) ||
> page_sz == system_page_sz;
>
> do {
> --
> 2.17.1
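The page-alignment test in the new condition can be reproduced outside DPDK with the same arithmetic RTE_PTR_ALIGN uses (round the pointer up to the boundary, then compare); the addresses below are made up for illustration:

    #include <stdint.h>
    #include <stdio.h>

    /* same arithmetic as DPDK's RTE_PTR_ALIGN (round up to boundary) */
    #define PTR_ALIGN(p, a) \
        ((void *)(((uintptr_t)(p) + (a) - 1) & ~((uintptr_t)(a) - 1)))

    int main(void)
    {
        void *aligned = (void *)(uintptr_t)0x140000000ULL; /* 5G */
        void *hint = (void *)(uintptr_t)0x140000100ULL;
        uintptr_t page_sz = 1UL << 30; /* 1G hugepage */

        /* 1 when already aligned, 0 when alignment is still needed */
        printf("%d %d\n", aligned == PTR_ALIGN(aligned, page_sz),
            hint == PTR_ALIGN(hint, page_sz));
        return 0;
    }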
Re: [dpdk-dev] [PATCH] eal: fix circular dependency in EAL proc type detection
> -Original Message-
> From: Burakov, Anatoly
> Sent: Wednesday, July 18, 2018 11:54 AM
> To: dev@dpdk.org
> Cc: Richardson, Bruce ; Xu, Qian Q ; Yao, Lei A ; Lu, PeipeiX
> Subject: [PATCH] eal: fix circular dependency in EAL proc type detection
>
> Currently, we need the runtime dir to put all of our runtime info in,
> including the DPDK shared config. However, we use the shared
> config to determine our proc type, and this happens earlier than
> we actually create the config dir and thus can know where to
> place the config file.
>
> Fix this by moving runtime dir creation right after the EAL
> arguments parsing, but before proc type autodetection. Also,
> previously we were creating the config file unconditionally,
> even if we specified no_shconf - fix it by only creating
> the config file if no_shconf is not set.
>
> Fixes: adf1d867361c ("eal: move runtime config file to new location")
>
> Signed-off-by: Anatoly Burakov

Tested-by: Yao Lei

This patch passes the test with the simple_mp sample. The secondary process can recognize itself as a "secondary process" even when using the "--proc-type=auto" parameter.

> ---
> lib/librte_eal/bsdapp/eal/eal.c | 33 ++-
> lib/librte_eal/linuxapp/eal/eal.c | 33 ++-
> 2 files changed, 38 insertions(+), 28 deletions(-)
>
> diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
> index 73cdf07b8..7b399bc9d 100644
> --- a/lib/librte_eal/bsdapp/eal/eal.c
> +++ b/lib/librte_eal/bsdapp/eal/eal.c
> @@ -286,12 +286,17 @@ eal_proc_type_detect(void)
> enum rte_proc_type_t ptype = RTE_PROC_PRIMARY;
> const char *pathname = eal_runtime_config_path();
>
> - /* if we can open the file but not get a write-lock we are a secondary
> - * process. NOTE: if we get a file handle back, we keep that open
> - * and don't close it to prevent a race condition between multiple opens */
> - if (((mem_cfg_fd = open(pathname, O_RDWR)) >= 0) &&
> - (fcntl(mem_cfg_fd, F_SETLK, &wr_lock) < 0))
> - ptype = RTE_PROC_SECONDARY;
> + /* if there is no shared config, there can be no secondary processes */
> + if (!internal_config.no_shconf) {
> + /* if we can open the file but not get a write-lock we are a
> + * secondary process. NOTE: if we get a file handle back, we
> + * keep that open and don't close it to prevent a race condition
> + * between multiple opens.
> + */
> + if (((mem_cfg_fd = open(pathname, O_RDWR)) >= 0) &&
> + (fcntl(mem_cfg_fd, F_SETLK, &wr_lock) < 0))
> + ptype = RTE_PROC_SECONDARY;
> + }
>
> RTE_LOG(INFO, EAL, "Auto-detected process type: %s\n",
> ptype == RTE_PROC_PRIMARY ? "PRIMARY" : "SECONDARY");
> @@ -468,6 +473,14 @@ eal_parse_args(int argc, char **argv)
> }
> }
>
> + /* create runtime data directory */
> + if (internal_config.no_shconf == 0 &&
> + eal_create_runtime_dir() < 0) {
> + RTE_LOG(ERR, EAL, "Cannot create runtime directory\n");
> + ret = -1;
> + goto out;
> + }
> +
> if (eal_adjust_config(&internal_config) != 0) {
> ret = -1;
> goto out;
> @@ -600,14 +613,6 @@ rte_eal_init(int argc, char **argv)
> return -1;
> }
>
> - /* create runtime data directory */
> - if (internal_config.no_shconf == 0 &&
> - eal_create_runtime_dir() < 0) {
> - rte_eal_init_alert("Cannot create runtime directory\n");
> - rte_errno = EACCES;
> - return -1;
> - }
> -
> /* FreeBSD always uses legacy memory model */
> internal_config.legacy_mem = true;
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal.c
> b/lib/librte_eal/linuxapp/eal/eal.c
> index d75ae9dae..d2d5aae80 100644
> --- a/lib/librte_eal/linuxapp/eal/eal.c
> +++ b/lib/librte_eal/linuxapp/eal/eal.c
> @@ -344,12 +344,17 @@ eal_proc_type_detect(void)
> enum rte_proc_type_t ptype = RTE_PROC_PRIMARY;
> const char *pathname = eal_runtime_config_path();
>
> - /* if we can open the file but not get a write-lock we are a secondary
> - * process. NOTE: if we get a file handle back, we keep that open
> - * and don't close it to prevent a race condition between multiple opens */
> - if (((mem_cfg_fd = open(pathname, O_RD
Re: [dpdk-dev] [PATCH] net/ixgbe: fix missing support of multi-segs offloading
> -Original Message-
> From: Dai, Wei
> Sent: Tuesday, April 17, 2018 3:44 PM
> To: Lu, Wenzhuo ; Ananyev, Konstantin ; Yao, Lei A
> Cc: dev@dpdk.org; Dai, Wei ; sta...@dpdk.org
> Subject: [PATCH] net/ixgbe: fix missing support of multi-segs offloading
>
> This patch adds the missing supported Tx multi-segs offloading.
>
> Fixes: 51215925a32f ("net/ixgbe: convert to new Tx offloads API")
> Cc: sta...@dpdk.org
>
> Signed-off-by: Wei Dai

Tested-by: Lei Yao

This patch fixes the vhost-sample launch issue; the virtio VM2VM iperf test passes with this patch on an ixgbe NIC.

> ---
> drivers/net/ixgbe/ixgbe_rxtx.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
> index 7511e18..aed3f5a 100644
> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
> @@ -2429,7 +2429,8 @@ ixgbe_get_tx_port_offloads(struct rte_eth_dev *dev)
> DEV_TX_OFFLOAD_UDP_CKSUM |
> DEV_TX_OFFLOAD_TCP_CKSUM |
> DEV_TX_OFFLOAD_SCTP_CKSUM |
> - DEV_TX_OFFLOAD_TCP_TSO;
> + DEV_TX_OFFLOAD_TCP_TSO |
> + DEV_TX_OFFLOAD_MULTI_SEGS;
>
> if (hw->mac.type == ixgbe_mac_82599EB ||
> hw->mac.type == ixgbe_mac_X540)
> --
> 2.7.5
Re: [dpdk-dev] [PATCH 00/12] Vhost: CVE-2018-1059 fixes
Hi Maxime,

During the 18.05-rc1 performance testing, I find that this patch set brings a slight performance drop on the mergeable and normal paths, and a big performance drop on the vector path. Could you have a check on this? I know this patch is important for security; I am not sure if there is any way to improve the performance.

Mergeable
packet size
64      0.80%
128     -2.75%
260     -2.93%
520     -2.72%
1024    -1.18%
1500    -0.65%

Normal
packet size
64      -1.47%
128     -7.43%
260     -3.66%
520     -2.52%
1024    -1.19%
1500    -0.78%

Vector
packet size
64      -8.60%
128     -3.54%
260     -2.63%
520     -6.12%
1024    -1.05%
1500    -1.20%

CPU info: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
OS: Ubuntu 16.04

BRs
Lei

> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Maxime Coquelin
> Sent: Monday, April 23, 2018 11:58 PM
> To: dev@dpdk.org
> Cc: Maxime Coquelin
> Subject: [dpdk-dev] [PATCH 00/12] Vhost: CVE-2018-1059 fixes
>
> This series fixes the security vulnerability referenced
> as CVE-2018-1059.
>
> Patches are already applied to the branch, but reviews
> are encouraged. Any issues spotted would be fixed on top.
>
> Maxime Coquelin (12):
> vhost: fix indirect descriptors table translation size
> vhost: check all range is mapped when translating GPAs
> vhost: introduce safe API for GPA translation
> vhost: ensure all range is mapped when translating QVAs
> vhost: add support for non-contiguous indirect descs tables
> vhost: handle virtually non-contiguous buffers in Tx
> vhost: handle virtually non-contiguous buffers in Rx
> vhost: handle virtually non-contiguous buffers in Rx-mrg
> examples/vhost: move to safe GPA translation API
> examples/vhost_scsi: move to safe GPA translation API
> vhost/crypto: move to safe GPA translation API
> vhost: deprecate unsafe GPA translation API
>
> examples/vhost/virtio_net.c | 94 +++-
> examples/vhost_scsi/vhost_scsi.c | 56 -
> lib/librte_vhost/rte_vhost.h | 46 
> lib/librte_vhost/rte_vhost_version.map | 4 +-
> lib/librte_vhost/vhost.c | 39 ++--
> lib/librte_vhost/vhost.h | 8 +-
> lib/librte_vhost/vhost_crypto.c | 65 --
> lib/librte_vhost/vhost_user.c | 58 +++--
> lib/librte_vhost/virtio_net.c | 411 -
>
> 9 files changed, 650 insertions(+), 131 deletions(-)
>
> --
> 2.14.3
Re: [dpdk-dev] [PATCH 00/12] Vhost: CVE-2018-1059 fixes
> -Original Message-
> From: Maxime Coquelin [mailto:maxime.coque...@redhat.com]
> Sent: Wednesday, May 2, 2018 5:20 PM
> To: Yao, Lei A ; dev@dpdk.org
> Cc: Bie, Tiwei
> Subject: Re: [dpdk-dev] [PATCH 00/12] Vhost: CVE-2018-1059 fixes
>
> Hi Lei,
>
> Thanks for the perf report.
>
> On 05/02/2018 07:08 AM, Yao, Lei A wrote:
> > Hi Maxime,
> >
> > During the 18.05-rc1 performance testing, I find that this patch set
> > brings a slight performance drop on the mergeable and normal paths, and
> > a big performance drop on the vector path. Could you have a check on
> > this? I know this patch is important for security. Not sure if there is
> > any way to improve the performance.
>
> Could you please share info about the use cases you are benchmarking?

I run the vhost/virtio loopback test.

> There may be ways to improve the performance; for this we would need to
> profile the code to understand where the bottlenecks are.
>
> > Mergeable
> > packet size
> > 64      0.80%
> > 128     -2.75%
> > 260     -2.93%
> > 520     -2.72%
> > 1024    -1.18%
> > 1500    -0.65%
> >
> > Normal
> > packet size
> > 64      -1.47%
> > 128     -7.43%
> > 260     -3.66%
> > 520     -2.52%
> > 1024    -1.19%
> > 1500    -0.78%
> >
> > Vector
> > packet size
> > 64      -8.60%
> > 128     -3.54%
> > 260     -2.63%
> > 520     -6.12%
> > 1024    -1.05%
> > 1500    -1.20%
>
> Are you sure it is only this series that induces such a big
> performance drop in the vector test? I.e. have you run the benchmark
> just before and right after the series is applied?

Yes. The performance drop I list here is just a comparison before and after your patch set. The key patch bringing the performance drop is commit 41333fba5b98945b8051e7b48f8fe47432cdd356 ("vhost: introduce safe API for GPA translation").

Between 18.02 and 18.05-rc1 there are some other performance drops, but not so large. I need more git bisect work to identify them.

> Thanks,
> Maxime
> > CPU info: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
> > OS: Ubuntu 16.04
> >
> > BRs
> > Lei
> >
> >> -Original Message-
> >> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Maxime Coquelin
> >> Sent: Monday, April 23, 2018 11:58 PM
> >> To: dev@dpdk.org
> >> Cc: Maxime Coquelin
> >> Subject: [dpdk-dev] [PATCH 00/12] Vhost: CVE-2018-1059 fixes
> >>
> >> This series fixes the security vulnerability referenced
> >> as CVE-2018-1059.
> >>
> >> Patches are already applied to the branch, but reviews
> >> are encouraged. Any issues spotted would be fixed on top.
> >> > >> Maxime Coquelin (12): > >>vhost: fix indirect descriptors table translation size > >>vhost: check all range is mapped when translating GPAs > >>vhost: introduce safe API for GPA translation > >>vhost: ensure all range is mapped when translating QVAs > >>vhost: add support for non-contiguous indirect descs tables > >>vhost: handle virtually non-contiguous buffers in Tx > >>vhost: handle virtually non-contiguous buffers in Rx > >>vhost: handle virtually non-contiguous buffers in Rx-mrg > >>examples/vhost: move to safe GPA translation API > >>examples/vhost_scsi: move to safe GPA translation API > >>vhost/crypto: move to safe GPA translation API > >>vhost: deprecate unsafe GPA translation API > >> > >> examples/vhost/virtio_net.c| 94 +++- > >> examples/vhost_scsi/vhost_scsi.c | 56 - > >> lib/librte_vhost/rte_vhost.h | 46 > >> lib/librte_vhost/rte_vhost_version.map | 4 +- > >> lib/librte_vhost/vhost.c | 39 ++-- > >> lib/librte_vhost/vhost.h | 8 +- > >> lib/librte_vhost/vhost_crypto.c| 65 -- > >> lib/librte_vhost/vhost_user.c | 58 +++-- > >> lib/librte_vhost/virtio_net.c | 411 > - > >> > >> 9 files changed, 650 insertions(+), 131 deletions(-) > >> > >> -- > >> 2.14.3 > >
Re: [dpdk-dev] [PATCH 2/2] net/vhost: insert/strip VLAN header in software
Hi Jan,

For this patch, I find it breaks the VM2VM iperf test, as it clears the ol_flags:

"bufs[i]->ol_flags = 0;"

Could you have a check on this? The test steps to reproduce the issue:

1. Launch testpmd with two vhost-user ports, using IO fwd mode:
testpmd -c 0xe -n 4 --socket-mem 1024,1024 \
--legacy-mem --vdev 'eth_vhost0,iface=vhost-net,queues=1' \
--vdev 'eth_vhost1,iface=vhost-net1,queues=1' --nb-cores=1 \
--tx-offloads=0x802a
2. Launch 2 VMs with virtio devices
3. In the 2 VMs, bring up the virtio devices and run the iperf test between the VMs

The test result shows that the iperf traffic is broken, but the ping test passes.

> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Chas Williams
> Sent: Friday, March 30, 2018 12:05 AM
> To: dev@dpdk.org
> Cc: mtetsu...@gmail.com; y...@fridaylinux.org; maxime.coque...@redhat.com; Jan Blunck
> Subject: [dpdk-dev] [PATCH 2/2] net/vhost: insert/strip VLAN header in software
>
> From: Jan Blunck
>
> This lets the vhost driver handle the VLAN header like the virtio driver
> in software.
>
> Signed-off-by: Jan Blunck
> ---
> drivers/net/vhost/rte_eth_vhost.c | 35 ++-
> 1 file changed, 34 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/vhost/rte_eth_vhost.c
> b/drivers/net/vhost/rte_eth_vhost.c
> index 453d9bee1..0beb28e94 100644
> --- a/drivers/net/vhost/rte_eth_vhost.c
> +++ b/drivers/net/vhost/rte_eth_vhost.c
> @@ -119,6 +119,7 @@ struct pmd_internal {
> uint16_t max_queues;
> rte_atomic32_t started;
> int vid;
> + uint8_t vlan_strip;
> };
>
> struct internal_list {
> @@ -422,6 +423,12 @@ eth_vhost_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
>
> for (i = 0; likely(i < nb_rx); i++) {
> bufs[i]->port = r->port;
> + bufs[i]->ol_flags = 0;
> + bufs[i]->vlan_tci = 0;
> +
> + if (r->internal->vlan_strip)
> + rte_vlan_strip(bufs[i]);
> +
> r->stats.bytes += bufs[i]->pkt_len;
> }
>
> @@ -438,7 +445,7 @@ eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
> {
> struct vhost_queue *r = q;
> uint16_t i, nb_tx = 0;
> - uint16_t nb_send = nb_bufs;
> + uint16_t nb_send = 0;
>
> if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
> return 0;
> @@ -448,6 +455,22 @@ eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
> if (unlikely(rte_atomic32_read(&r->allow_queuing) == 0))
> goto out;
>
> + for (i = 0; i < nb_bufs; i++) {
> + struct rte_mbuf *m = bufs[i];
> +
> + /* Do VLAN tag insertion */
> + if (m->ol_flags & PKT_TX_VLAN_PKT) {
> + int error = rte_vlan_insert(&m);
> + if (unlikely(error)) {
> + rte_pktmbuf_free(m);
> + continue;
> + }
> + }
> +
> + bufs[nb_send] = m;
> + ++nb_send;
> + }
> +
> /* Enqueue packets to guest RX queue */
> while (nb_send) {
> uint16_t nb_pkts;
> @@ -489,6 +512,16 @@ eth_vhost_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
> static int
> eth_dev_configure(struct rte_eth_dev *dev __rte_unused)
> {
> + struct pmd_internal *internal = dev->data->dev_private;
> + const struct rte_eth_rxmode *rxmode = &dev->data->dev_conf.rxmode;
> +
> + internal->vlan_strip = rxmode->hw_vlan_strip;
> +
> + if (rxmode->hw_vlan_filter)
> + RTE_LOG(WARNING, PMD,
> + "vhost(%s): vlan filtering not available\n",
> + internal->dev_name);
> +
> return 0;
> }
>
> --
> 2.13.6
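A minimal sketch of the kind of fix being asked for (illustrative only, not an accepted patch; flag names are from the mbuf API of that era): clear just the VLAN-related bits instead of the whole ol_flags word, so offload flags already set by the vhost library survive the receive path:

    /* inside eth_vhost_rx(), per received mbuf: keep the existing
     * offload flags, reset only the VLAN state before stripping */
    bufs[i]->ol_flags &= ~(PKT_RX_VLAN | PKT_RX_VLAN_STRIPPED);
    bufs[i]->vlan_tci = 0;

    if (r->internal->vlan_strip)
        rte_vlan_strip(bufs[i]);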
Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_power: traffic pattern aware power control
> -Original Message-
> From: Ma, Liang J
> Sent: Friday, August 31, 2018 11:04 PM
> To: Hunt, David
> Cc: dev@dpdk.org; Yao, Lei A ; Nicolau, Radu ; Burakov, Anatoly ; Geary, John ; Ma, Liang J
> Subject: [PATCH v6 1/4] lib/librte_power: traffic pattern aware power control
>
> 1. Abstract
>
> For packet processing workloads such as DPDK, polling is continuous.
> This means CPU cores always show 100% busy, independent of how much work
> those cores are doing. It is hugely important to accurately determine how
> busy a core is, for the following reasons:
>
> * No indication of overload conditions.
> * Users do not know how much real load is on a system, which results in
> wasted energy as no power management is utilized.
>
> Compared to the original l3fwd-power design, instead of going to sleep
> after detecting an empty poll, the new mechanism just lowers the core
> frequency. As a result, the application does not stop polling the device,
> which leads to improved handling of bursts of traffic.
>
> When the system becomes busy, the empty poll mechanism can also increase
> the core frequency (including turbo) to do best effort for intensive
> traffic. This gives us more flexible and balanced traffic awareness over
> the standard l3fwd-power application.
>
> 2. Proposed solution
>
> The proposed solution focuses on how many times empty polls are executed.
> The lower the number of empty polls, the busier the core is with processing
> workload and, therefore, the higher the frequency needed. A high empty-poll
> count indicates the core is not doing any real work; therefore, we can
> lower the frequency to save power.
>
> In the current implementation, each core has 1 empty-poll counter, which
> assumes 1 core is dedicated to 1 queue. This will need to be expanded in
> the future to support multiple queues per core.
>
> 2.1 Power state definition:
>
> LOW: Not currently used, reserved for future use.
> MED: the frequency is used to process modest traffic workload.
> HIGH: the frequency is used to process busy traffic workload.
>
> 2.2 There are two phases to establish the power management system:
>
> a. Initialization/Training phase. The training phase is necessary
> in order to figure out the system polling baseline numbers from
> idle to busy. The highest poll count will be during idle, where all
> polls are empty. These poll counts will be different between
> systems due to the many possible processor micro-arch, cache
> and device configurations, hence the training phase.
> In the training phase, traffic is blocked so the training algorithm
> can average the empty-poll numbers for the LOW, MED and
> HIGH power states in order to create a baseline.
> The core's counters are collected every 10ms, and the training
> phase will take 2 seconds.
>
> b. Normal phase. When the training phase is complete, traffic is
> started. The run-time poll counts are compared with the
> baseline and the decision will be taken to move to the MED power
> state or HIGH power state. The counters are calculated every 10ms.
>
> 3. Proposed API
>
> 1. rte_power_empty_poll_stat_init(void);
> which is used to initialize the power management system.
>
> 2. rte_power_empty_poll_stat_free(void);
> which is used to free the resources held by the power management system.
>
> 3. rte_power_empty_poll_stat_update(unsigned int lcore_id);
> which is used to update a specific core's empty poll counter; not thread safe.
>
> 4. rte_power_poll_stat_update(unsigned int lcore_id, uint8_t nb_pkt);
> which is used to update a specific core's valid poll counter; not thread safe.
>
> 5. rte_power_empty_poll_stat_fetch(unsigned int lcore_id);
> which is used to get a specific core's empty poll counter.
>
> 6. rte_power_poll_stat_fetch(unsigned int lcore_id);
> which is used to get a specific core's valid poll counter.
>
> 7. rte_empty_poll_detection(void);
> which is used to detect empty poll state changes.
>
> ChangeLog:
> v2: fix some coding style issues
> v3: rename the filename, API name.
> v4: no change
> v5: no change
> v6: re-work the code layout, update API
>
> Signed-off-by: Liang Ma
> ---
> lib/librte_power/Makefile | 6 +-
> lib/librte_power/meson.build | 5 +-
> lib/librte_power/rte_power_empty_poll.c | 500 
> lib/librte_power/rte_power_empty_poll.h | 205 +
> lib/librte_power/rte_power_version.map | 13 +
> 5
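Taking the API list above at face value, a sketch of how a polling loop might drive it (the surrounding loop, port/queue parameters and burst size are illustrative; some of these signatures changed in later revisions of the series):

    #include <rte_ethdev.h>
    #include <rte_lcore.h>
    #include <rte_power_empty_poll.h>

    /* sketch: one lcore's receive loop driving the empty-poll API */
    static void rx_loop(uint16_t port, uint16_t queue)
    {
        struct rte_mbuf *pkts[32];
        unsigned int lcore = rte_lcore_id();

        for (;;) {
            uint16_t nb = rte_eth_rx_burst(port, queue, pkts, 32);

            if (nb == 0)
                rte_power_empty_poll_stat_update(lcore);
            else
                rte_power_poll_stat_update(lcore, nb);

            /* let the library re-evaluate the power state */
            rte_empty_poll_detection();

            /* ... process/free the nb packets here ... */
        }
    }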
Re: [dpdk-dev] [PATCH v6 1/4] lib/librte_power: traffic pattern aware power control
> -Original Message-
> From: Ma, Liang J
> Sent: Friday, August 31, 2018 11:04 PM
> To: Hunt, David
> Cc: dev@dpdk.org; Yao, Lei A ; Nicolau, Radu ; Burakov, Anatoly ; Geary, John ; Ma, Liang J
> Subject: [PATCH v6 1/4] lib/librte_power: traffic pattern aware power control
>
> 1. Abstract
>
> For packet processing workloads such as DPDK, polling is continuous.
> This means CPU cores always show 100% busy, independent of how much work
> those cores are doing. It is hugely important to accurately determine how
> busy a core is, for the following reasons:
>
> * No indication of overload conditions.
> * Users do not know how much real load is on a system, which results in
> wasted energy as no power management is utilized.
>
> Compared to the original l3fwd-power design, instead of going to sleep
> after detecting an empty poll, the new mechanism just lowers the core
> frequency. As a result, the application does not stop polling the device,
> which leads to improved handling of bursts of traffic.
>
> When the system becomes busy, the empty poll mechanism can also increase
> the core frequency (including turbo) to do best effort for intensive
> traffic. This gives us more flexible and balanced traffic awareness over
> the standard l3fwd-power application.
>
> 2. Proposed solution
>
> The proposed solution focuses on how many times empty polls are executed.
> The lower the number of empty polls, the busier the core is with processing
> workload and, therefore, the higher the frequency needed. A high empty-poll
> count indicates the core is not doing any real work; therefore, we can
> lower the frequency to save power.
>
> In the current implementation, each core has 1 empty-poll counter, which
> assumes 1 core is dedicated to 1 queue. This will need to be expanded in
> the future to support multiple queues per core.
>
> 2.1 Power state definition:
>
> LOW: Not currently used, reserved for future use.
> MED: the frequency is used to process modest traffic workload.
> HIGH: the frequency is used to process busy traffic workload.
>
> 2.2 There are two phases to establish the power management system:
>
> a. Initialization/Training phase. The training phase is necessary
> in order to figure out the system polling baseline numbers from
> idle to busy. The highest poll count will be during idle, where all
> polls are empty. These poll counts will be different between
> systems due to the many possible processor micro-arch, cache
> and device configurations, hence the training phase.
> In the training phase, traffic is blocked so the training algorithm
> can average the empty-poll numbers for the LOW, MED and
> HIGH power states in order to create a baseline.
> The core's counters are collected every 10ms, and the training
> phase will take 2 seconds.
>
> b. Normal phase. When the training phase is complete, traffic is
> started. The run-time poll counts are compared with the
> baseline and the decision will be taken to move to the MED power
> state or HIGH power state. The counters are calculated every 10ms.
>
> 3. Proposed API
>
> 1. rte_power_empty_poll_stat_init(void);
> which is used to initialize the power management system.
>
> 2. rte_power_empty_poll_stat_free(void);
> which is used to free the resources held by the power management system.
>
> 3. rte_power_empty_poll_stat_update(unsigned int lcore_id);
> which is used to update a specific core's empty poll counter; not thread safe.
>
> 4. rte_power_poll_stat_update(unsigned int lcore_id, uint8_t nb_pkt);
> which is used to update a specific core's valid poll counter; not thread safe.
>
> 5. rte_power_empty_poll_stat_fetch(unsigned int lcore_id);
> which is used to get a specific core's empty poll counter.
>
> 6. rte_power_poll_stat_fetch(unsigned int lcore_id);
> which is used to get a specific core's valid poll counter.
>
> 7. rte_empty_poll_detection(void);
> which is used to detect empty poll state changes.
>
> ChangeLog:
> v2: fix some coding style issues
> v3: rename the filename, API name.
> v4: no change
> v5: no change
> v6: re-work the code layout, update API
>
> Signed-off-by: Liang Ma

Reviewed-by: Lei Yao

> ---
> lib/librte_power/Makefile | 6 +-
> lib/librte_power/meson.build | 5 +-
> lib/librte_power/rte_power_empty_poll.c | 500 
> lib/librte_power/rte_power_empty_poll.h | 205 +
> lib/librte_power/rte_power_versi
Re: [dpdk-dev] [PATCH v1 6/7] doc/vm_power_manager: add JSON interface API info
> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of David Hunt
> Sent: Thursday, August 30, 2018 6:54 PM
> To: dev@dpdk.org
> Cc: Mcnamara, John ; Hunt, David
> Subject: [dpdk-dev] [PATCH v1 6/7] doc/vm_power_manager: add JSON interface API info
>
> Signed-off-by: David Hunt
> ---
> .../sample_app_ug/vm_power_management.rst | 195 ++
> 1 file changed, 195 insertions(+)
>
> diff --git a/doc/guides/sample_app_ug/vm_power_management.rst
> b/doc/guides/sample_app_ug/vm_power_management.rst
> index 855570d6b..13a325eae 100644
> --- a/doc/guides/sample_app_ug/vm_power_management.rst
> +++ b/doc/guides/sample_app_ug/vm_power_management.rst
> @@ -337,6 +337,201 @@ monitoring of branch ratio on cores doing busy polling via PMDs.
>   and will need to be adjusted for different workloads.
>
>
> +
> +JSON API
> +
> +
> +In addition to the command line interface for host commands and a virtio-serial
> +interface for VM power policies, there is also a JSON interface through which
> +power commands and policies can be sent. Sending a command or policy to the
> +power manager application is achieved by simply opening a fifo file, writing
> +a JSON string to that fifo, and closing the file.
> +
> +The fifo is at /tmp/powermonitor/fifo.0
> +
> +The JSON string can be a policy or instruction, and takes the following
> +format:
> +
> + .. code-block:: console
> +
> +{"packet_type": {
> + "pair_1": value,
> + "pair_2": value
> +}}
> +
> +The 'packet_type' header can contain one of two values, depending on
> +whether a policy or power command is being sent. The two possible values are
> +"policy" and "instruction", and the expected name-value pairs are different
> +depending on which type is being sent.
> +
> +The pairs use the format of standard JSON name-value pairs. The value type
> +varies between the different name/value pairs, and may be integers, strings,
> +arrays, etc. Examples of policies follow later in this document. The allowed
> +names and value types are as follows:
> +
> +
> +:Pair Name: "name"
> +:Description: Name of the VM or Host. Allows the parser to associate the
> + policy with the relevant VM or Host OS.
> +:Type: string
> +:Values: any valid string
> +:Required: yes
> +:Example:
> +
> + .. code-block:: console
> +
> +"name", "ubuntu2"
> +
> +
> +:Pair Name: "command"
> +:Description: The type of packet we're sending to the power manager. We can be
> + creating or destroying a policy, or sending a direct command to adjust
> + the frequency of a core, similar to the command line interface.
> +:Type: string
> +:Values:
> +
> + :"CREATE": used when creating a new policy,
> + :"DESTROY": used when removing a policy,
> + :"POWER": used when sending an immediate command, max, min, etc.
> +:Required: yes
> +:Example:
> +
> +.. code-block:: console
> +
> + "command", "CREATE"
> +
> +
> +:Pair Name: "policy_type"
> +:Description: Type of policy to apply. Please see vm_power_manager documentation
> + for more information on the types of policies that may be used.
> +:Type: string
> +:Values:
> +
> + :"TIME": Time-of-day policy. Frequencies of the relevant cores are
> +scaled up/down depending on busy and quiet hours.
> + :"TRAFFIC": This policy takes statistics from the NIC and scales up
> +and down accordingly.
> + :"WORKLOAD": This policy looks at how heavily loaded the cores are,
> +and scales up and down accordingly.
> + :"BRANCH_RATIO": This out-of-band policy can look at the ratio between
> +branch hits and misses on a core, and is useful for detecting
> +how much packet processing a core is doing.
> +:Required: only for CREATE/DESTROY command
> +:Example:
> +
> + .. code-block:: console
> +
> +"policy_type", "TIME"
> +
> +:Pair Name: "busy_hours"
> +:Description: The hours of the day in which we scale up the cores for busy
> + times.
> +:Type: array of integers
> +:Values: array with list of hour numbers, (0-23)
> +:Required: only for TIME policy
> +:Example:
> +
> + .. code-block:: console
> +
> +"busy_hours":[ 17, 18, 19, 20, 21, 22, 23 ]
> +
> +:Pair Name: "quiet_hours"
> +:Description: The hours of the day in which we scale down the cores for quiet
> + times.
> +:Type: array of integers
> +:Values: array with list of hour numbers, (0-23)
> +:Required: only for TIME policy
> +:Example:
> +
> + .. code-block:: console
> +
> +"quiet_hours":[ 2, 3, 4, 5, 6 ]
> +

Do you think we need to document the following three keys here: min_packet_thresh, avg_packet_thresh, max_packet_thresh? I see them in the code but they are not documented.

> +:Pair Name: "core_list"
> +:Description: The cores to which to apply the policy.
> +:Type: array of integers
> +:Values: array with list of virtual CPUs.
> +:Required: only policy CREATE/DESTROY
> +:Example:
> +
> + .. code-block:: console
> +
> +"core_list":[ 10, 11 ]
> +
> +:Pair Name: "unit"
> +:Description: the type o
Re: [dpdk-dev] [PATCH v1 4/7] examples/power: add host channel to power manager
> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of David Hunt
> Sent: Thursday, August 30, 2018 6:54 PM
> To: dev@dpdk.org
> Cc: Mcnamara, John ; Hunt, David
> Subject: [dpdk-dev] [PATCH v1 4/7] examples/power: add host channel to power manager
>
> This patch adds a fifo channel to the vm_power_manager app through which
> we can send commands and policies. Intended for sending JSON strings.
> The fifo is at /tmp/powermonitor/fifo.0
>
> Signed-off-by: David Hunt
> ---
> examples/vm_power_manager/channel_manager.c | 108 +++
> examples/vm_power_manager/channel_manager.h | 17 ++-
> examples/vm_power_manager/channel_monitor.c | 146 +++-
> examples/vm_power_manager/main.c | 2 +
> 4 files changed, 238 insertions(+), 35 deletions(-)
>
> diff --git a/examples/vm_power_manager/channel_manager.c
> b/examples/vm_power_manager/channel_manager.c
> index 2bb8641d3..bcd106be1 100644
> --- a/examples/vm_power_manager/channel_manager.c
> +++ b/examples/vm_power_manager/channel_manager.c
> @@ -13,6 +13,7 @@
>
> #include
> #include
> +#include
> #include
> #include
>
> @@ -284,6 +285,38 @@ open_non_blocking_channel(struct channel_info *info)
> return 0;
> }
>
> +static int
> +open_host_channel(struct channel_info *info)
> +{
> + int flags;
> +
> + info->fd = open(info->channel_path, O_RDWR | O_RSYNC);
> + if (info->fd == -1) {
> + RTE_LOG(ERR, CHANNEL_MANAGER, "Error(%s) opening fifo for '%s'\n",
> + strerror(errno),
> + info->channel_path);
> + return -1;
> + }
> +
> + /* Get current flags */
> + flags = fcntl(info->fd, F_GETFL, 0);
> + if (flags < 0) {
> + RTE_LOG(WARNING, CHANNEL_MANAGER, "Error(%s) fcntl get flags socket for"
> + "'%s'\n", strerror(errno), info->channel_path);
> + return 1;
> + }
> + /* Set to Non Blocking */
> + flags |= O_NONBLOCK;
> + if (fcntl(info->fd, F_SETFL, flags) < 0) {
> + RTE_LOG(WARNING, CHANNEL_MANAGER,
> + "Error(%s) setting non-blocking "
> + "socket for '%s'\n",
> + strerror(errno), info->channel_path);
> + return -1;
> + }
> + return 0;
> +}
> +
> static int
> setup_channel_info(struct virtual_machine_info **vm_info_dptr,
> struct channel_info **chan_info_dptr, unsigned channel_num)
> @@ -294,6 +327,7 @@ setup_channel_info(struct virtual_machine_info **vm_info_dptr,
> chan_info->channel_num = channel_num;
> chan_info->priv_info = (void *)vm_info;
> chan_info->status = CHANNEL_MGR_CHANNEL_DISCONNECTED;
> + chan_info->type = CHANNEL_TYPE_BINARY;
> if (open_non_blocking_channel(chan_info) < 0) {
> RTE_LOG(ERR, CHANNEL_MANAGER, "Could not open channel: "
> "'%s' for VM '%s'\n",
> @@ -316,6 +350,35 @@ setup_channel_info(struct virtual_machine_info **vm_info_dptr,
> return 0;
> }
>
> +static int
> +setup_host_channel_info(struct channel_info **chan_info_dptr,
> + unsigned int channel_num)
> +{
> + struct channel_info *chan_info = *chan_info_dptr;
> +
> + chan_info->channel_num = channel_num;
> + chan_info->priv_info = (void *)0;
> + chan_info->status = CHANNEL_MGR_CHANNEL_DISCONNECTED;
> + chan_info->type = CHANNEL_TYPE_JSON;
> + sprintf(chan_info->channel_path, "%sfifo.0", CHANNEL_MGR_SOCKET_PATH);
> +
> + if (open_host_channel(chan_info) < 0) {
> + RTE_LOG(ERR, CHANNEL_MANAGER, "Could not open host channel: "
> + "'%s'\n",
> + chan_info->channel_path);
> + return -1;
> + }
> + if (add_channel_to_monitor(&chan_info) < 0) {
> + RTE_LOG(ERR, CHANNEL_MANAGER, "Could add channel: "
> + "'%s' to epoll ctl\n",
> + chan_info->channel_path);
> + return -1;
> +
> + }
> + chan_info->status = CHANNEL_MGR_CHANNEL_CONNECTED;
> + return
0; > +} > + > int > add_all_channels(const char *vm_name) > { > @@ -470,6 +533,51 @@ add_channels(const char *vm_name, unsigned > *channel_list, > return num_channels_enabled; > } > > +int > +add_host_channel(void) > +{ > + struct channel_info *chan_info; > + char socket_path[PATH_MAX]; > + int num_channels_enabled = 0; > + int ret; > + > + snprintf(socket_path, sizeof(socket_path), "%sfifo.%u", > + CHANNEL_MGR_SOCKET_PATH, 0); > + > + errno = 0; > + ret = mkfifo(socket_path, 0666); > + if ((errno != EEXIST) && (ret < 0)) { > + printf(" %d %d, %d\n", ret, EEXIST, errno); > + RTE_LOG(ERR, CHANNEL_MANAGER, "Cannot create fifo '%s' > error: " > +
Re: [dpdk-dev] [PATCH v5 10/10] doc/vm_power_manager: add JSON interface API info
> +:Pair Name: "avg_packet_thresh" > +:Description: Threshold below which the frequency will be set to min for > + the TRAFFIC policy. If the traffic rate is above this and below max, the > + frequency will be set to medium. > +:Type: integer > +:Values: The number of packets below which the TRAFFIC policy applies the > + minimum frequency, or medium frequency if between avg and max > thresholds. > +:Required: only for TRAFFIC policy > +:Example: > + > + .. code-block:: javascript > + > +"avg_packet_thresh": 10 Hi, Dave For this traffic policy , seems in previous release, we depends on the application in VM to send the VF mac address info back to the host vm_power_manager sample in host through virtio-serial port. If this JOSN interface is designed for container using VF, do we need add VF mac or ID related info into the TRAFFIC policy JSON file? Otherwise, the host don't know which port to monitor the throughput, I guess.
Re: [dpdk-dev] [PATCH v2] net/virtio-user: fix feature setting with vhost-net backend
> -Original Message-
> From: Hu, Jiayu
> Sent: Tuesday, May 8, 2018 4:15 PM
> To: dev@dpdk.org
> Cc: Bie, Tiwei ; Yang, Zhiyong ; maxime.coque...@redhat.com; Yao, Lei A ; Hu, Jiayu
> Subject: [PATCH v2] net/virtio-user: fix feature setting with vhost-net backend
>
> When the backend is vhost-net, virtio-user must work in client mode and
> needs to request features from the backend in virtio_user_dev_init().
> But currently, virtio-user is assigned the default features in this case.
>
> This patch fixes this inappropriate feature setting.
>
> Fixes: bd8f50a45d0f ("net/virtio-user: support server mode")
> Signed-off-by: Jiayu Hu

Tested-by: Lei Yao

Tested this patch based on 18.05-rc2. This patch fixes the vhost-net kernel backend issue with DPDK. The basic test with virtio-user server mode also passes.

> ---
> changes in v2:
> - remove unnecessary indent change.
> - change commit log.
>
> drivers/net/virtio/virtio_user/virtio_user_dev.c | 7 +++
> 1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.c
> b/drivers/net/virtio/virtio_user/virtio_user_dev.c
> index 38b8bc9..2d80188 100644
> --- a/drivers/net/virtio/virtio_user/virtio_user_dev.c
> +++ b/drivers/net/virtio/virtio_user/virtio_user_dev.c
> @@ -353,7 +353,7 @@ virtio_user_dev_init(struct virtio_user_dev *dev, char *path, int queues,
> return -1;
> }
>
> - if (dev->vhostfd >= 0) {
> + if (!dev->is_server) {
> if (dev->ops->send_request(dev, VHOST_USER_SET_OWNER, NULL) < 0) {
> PMD_INIT_LOG(ERR, "set_owner fails: %s",
> @@ -367,6 +367,8 @@ virtio_user_dev_init(struct virtio_user_dev *dev, char *path, int queues,
> strerror(errno));
> return -1;
> }
> + if (dev->mac_specified)
> + dev->device_features |= (1ull << VIRTIO_NET_F_MAC);
> } else {
> /* We just pretend vhost-user can support all these features.
> * Note that this could be problematic that if some feature is
> @@ -376,9 +378,6 @@ virtio_user_dev_init(struct virtio_user_dev *dev, char *path, int queues,
> dev->device_features = VIRTIO_USER_SUPPORTED_FEATURES;
> }
>
> - if (dev->mac_specified)
> - dev->device_features |= (1ull << VIRTIO_NET_F_MAC);
> -
> if (cq) {
> /* device does not really need to know anything about CQ,
> * so if necessary, we just claim to support CQ
> --
> 2.7.4
Re: [dpdk-dev] [PATCH v5 03/16] bus/pci: replace strncpy dangerous code
Hi Andy,

This patch breaks the vfio-pci driver on my server: I can't launch a NIC bound to vfio-pci using testpmd. Could you have a check on this? Thanks a lot!

My server info:
OS: Ubuntu 16.04 LTS
gcc: 5.4.0
kernel: 4.4.0
CPU: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
NIC: Ethernet Controller X710 for 10GbE SFP+

My steps:
1. Bind the NIC to the vfio-pci driver:
modprobe vfio-pci
dpdk-devbind.py -b vfio-pci [PCI address of NIC]
2. Launch testpmd:
./x86_64-native-linuxapp-gcc/app/testpmd -c 0x03 -n 4 -- -i

> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Andy Green
> Sent: Saturday, May 12, 2018 9:48 AM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH v5 03/16] bus/pci: replace strncpy dangerous code
>
> In function ‘pci_get_kernel_driver_by_path’,
> inlined from ‘pci_scan_one.isra.1’ at /home/agreen/projects/dpdk/
> drivers/bus/pci/linux/pci.c:317:8:
> /home/agreen/projects/dpdk/drivers/bus/pci/linux/pci.c:57:3: error:
> ‘strncpy’ specified bound depends on the length of the source argument
> [-Werror=stringop-overflow=]
> strncpy(dri_name, name + 1, strlen(name + 1) + 1);
>
> Signed-off-by: Andy Green
> Acked-by: Pablo de Lara
> Fixes: d9a8cd9595f2 ("pci: add kernel driver type")
> Cc: sta...@dpdk.org
> ---
> drivers/bus/pci/linux/pci.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
> index 4630a8057..a73ee49c2 100644
> --- a/drivers/bus/pci/linux/pci.c
> +++ b/drivers/bus/pci/linux/pci.c
> @@ -54,7 +54,7 @@ pci_get_kernel_driver_by_path(const char *filename, char *dri_name)
>
> name = strrchr(path, '/');
> if (name) {
> - strncpy(dri_name, name + 1, strlen(name + 1) + 1);
> + strlcpy(dri_name, name + 1, sizeof(dri_name));
> return 0;
> }
>
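The likely root cause, judging from the diff above (and consistent with the follow-up fix): dri_name is a char * parameter here, so sizeof(dri_name) is the size of the pointer, 8 on 64-bit, not the destination buffer, and the copy truncates "vfio-pci" to 7 characters. A minimal standalone demonstration (snprintf stands in for strlcpy; both truncate to size - 1):

    #include <stdio.h>

    /* same shape as pci_get_kernel_driver_by_path(): the destination
     * arrives as a pointer, so sizeof() no longer sees the array */
    static void copy_driver_name(char *dri_name, const char *name)
    {
        printf("sizeof(dri_name) = %zu\n", sizeof(dri_name)); /* 8 */
        snprintf(dri_name, sizeof(dri_name), "%s", name);
    }

    int main(void)
    {
        char buf[32];

        copy_driver_name(buf, "vfio-pci");
        printf("copied: \"%s\"\n", buf); /* "vfio-pc" on 64-bit */
        return 0;
    }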
Re: [dpdk-dev] [PATCH v5 03/16] bus/pci: replace strncpy dangerous code
> -Original Message-
> From: Andy Green [mailto:a...@warmcat.com]
> Sent: Tuesday, May 15, 2018 3:33 PM
> To: Yao, Lei A ; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v5 03/16] bus/pci: replace strncpy dangerous code
>
> On 05/15/2018 02:12 PM, Yao, Lei A wrote:
> > Hi Andy,
> >
> > This patch breaks the vfio-pci driver on my server:
> > I can't launch a NIC bound to vfio-pci using testpmd. Could you have
> > a check on this? Thanks a lot!
> >
> > My server info:
> > OS: Ubuntu 16.04 LTS
> > gcc: 5.4.0
> > kernel: 4.4.0
> > CPU: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
> > NIC: Ethernet Controller X710 for 10GbE SFP+
> >
> > My steps:
> > 1. Bind the NIC to the vfio-pci driver
> > modprobe vfio-pci
> > dpdk-devbind.py -b vfio-pci [PCI address of NIC]
> >
> > 2. Launch testpmd:
> > ./x86_64-native-linuxapp-gcc/app/testpmd -c 0x03 -n 4 -- -i
>
> I don't have any NIC to test with.
>
> But it doesn't matter; the patch is indeed wrong... I just sent you and
> the list a fix on top of the incomplete patch: "bus/pci: correct the
> earlier strlcpy conversion"
>
> Sorry...
>
> -Andy

Hi Andy,

Thanks a lot for your quick fix. It works on my server now.

BRs
Lei

> >> -Original Message-
> >> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Andy Green
> >> Sent: Saturday, May 12, 2018 9:48 AM
> >> To: dev@dpdk.org
> >> Subject: [dpdk-dev] [PATCH v5 03/16] bus/pci: replace strncpy dangerous code
> >>
> >> In function ‘pci_get_kernel_driver_by_path’,
> >> inlined from ‘pci_scan_one.isra.1’ at /home/agreen/projects/dpdk/
> >> drivers/bus/pci/linux/pci.c:317:8:
> >> /home/agreen/projects/dpdk/drivers/bus/pci/linux/pci.c:57:3: error:
> >> ‘strncpy’ specified bound depends on the length of the source argument
> >> [-Werror=stringop-overflow=]
> >> strncpy(dri_name, name + 1, strlen(name + 1) + 1);
> >>
> >> Signed-off-by: Andy Green
> >> Acked-by: Pablo de Lara
> >> Fixes: d9a8cd9595f2 ("pci: add kernel driver type")
> >> Cc: sta...@dpdk.org
> >> ---
> >> drivers/bus/pci/linux/pci.c | 2 +-
> >> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
> >> index 4630a8057..a73ee49c2 100644
> >> --- a/drivers/bus/pci/linux/pci.c
> >> +++ b/drivers/bus/pci/linux/pci.c
> >> @@ -54,7 +54,7 @@ pci_get_kernel_driver_by_path(const char *filename, char *dri_name)
> >>
> >> name = strrchr(path, '/');
> >> if (name) {
> >> - strncpy(dri_name, name + 1, strlen(name + 1) + 1);
> >> + strlcpy(dri_name, name + 1, sizeof(dri_name));
> >> return 0;
> >> }
> >>
Re: [dpdk-dev] [PATCH 00/12] Vhost: CVE-2018-1059 fixes
Hi Maxime,

Any idea about this performance drop? Will we improve it in this release, or will it be long-term work? Thanks.

BRs
Lei

> -Original Message-
> From: Yao, Lei A
> Sent: Wednesday, May 2, 2018 8:10 PM
> To: Maxime Coquelin ; dev@dpdk.org
> Cc: Bie, Tiwei
> Subject: RE: [dpdk-dev] [PATCH 00/12] Vhost: CVE-2018-1059 fixes
>
> > -Original Message-
> > From: Maxime Coquelin [mailto:maxime.coque...@redhat.com]
> > Sent: Wednesday, May 2, 2018 5:20 PM
> > To: Yao, Lei A ; dev@dpdk.org
> > Cc: Bie, Tiwei
> > Subject: Re: [dpdk-dev] [PATCH 00/12] Vhost: CVE-2018-1059 fixes
> >
> > Hi Lei,
> >
> > Thanks for the perf report.
> >
> > On 05/02/2018 07:08 AM, Yao, Lei A wrote:
> > > Hi Maxime,
> > >
> > > During the 18.05-rc1 performance testing, I find that this patch set
> > > brings a slight performance drop on the mergeable and normal paths, and
> > > a big performance drop on the vector path. Could you have a check on
> > > this? I know this patch is important for security. Not sure if there is
> > > any way to improve the performance.
> >
> > Could you please share info about the use cases you are benchmarking?
> I run the vhost/virtio loopback test.
> > There may be ways to improve the performance; for this we would need to
> > profile the code to understand where the bottlenecks are.
> >
> > > Mergeable
> > > packet size
> > > 64      0.80%
> > > 128     -2.75%
> > > 260     -2.93%
> > > 520     -2.72%
> > > 1024    -1.18%
> > > 1500    -0.65%
> > >
> > > Normal
> > > packet size
> > > 64      -1.47%
> > > 128     -7.43%
> > > 260     -3.66%
> > > 520     -2.52%
> > > 1024    -1.19%
> > > 1500    -0.78%
> > >
> > > Vector
> > > packet size
> > > 64      -8.60%
> > > 128     -3.54%
> > > 260     -2.63%
> > > 520     -6.12%
> > > 1024    -1.05%
> > > 1500    -1.20%
> >
> > Are you sure it is only this series that induces such a big
> > performance drop in the vector test? I.e. have you run the benchmark
> > just before and right after the series is applied?
> Yes. The performance drop I list here is just a comparison before and after
> your patch set. The key patch bringing the performance drop is commit
> 41333fba5b98945b8051e7b48f8fe47432cdd356
> ("vhost: introduce safe API for GPA translation").
>
> Between 18.02 and 18.05-rc1 there are some other performance drops, but
> not so large. I need more git bisect work to identify them.
>
> >
> > Thanks,
> > Maxime
> > > CPU info: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
> > > OS: Ubuntu 16.04
> > >
> > > BRs
> > > Lei
> > >
> > >> -Original Message-
> > >> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Maxime Coquelin
> > >> Sent: Monday, April 23, 2018 11:58 PM
> > >> To: dev@dpdk.org
> > >> Cc: Maxime Coquelin
> > >> Subject: [dpdk-dev] [PATCH 00/12] Vhost: CVE-2018-1059 fixes
> > >>
> > >> This series fixes the security vulnerability referenced
> > >> as CVE-2018-1059.
> > >>
> > >> Patches are already applied to the branch, but reviews
> > >> are encouraged. Any issues spotted would be fixed on top.
> > >> > > >> Maxime Coquelin (12): > > >>vhost: fix indirect descriptors table translation size > > >>vhost: check all range is mapped when translating GPAs > > >>vhost: introduce safe API for GPA translation > > >>vhost: ensure all range is mapped when translating QVAs > > >>vhost: add support for non-contiguous indirect descs tables > > >>vhost: handle virtually non-contiguous buffers in Tx > > >>vhost: handle virtually non-contiguous buffers in Rx > > >>vhost: handle virtually non-contiguous buffers in Rx-mrg > > >>examples/vhost: move to safe GPA translation API > > >>examples/vhost_scsi: move to safe GPA translation API > > >>vhost/crypto: move to safe GPA translation API > > >>vhost: deprecate unsafe GPA translation API > > >> > > >> examples/vhost/virtio_net.c| 94 +++- > > >> examples/vhost_scsi/vhost_scsi.c | 56 - > > >> lib/librte_vhost/rte_vhost.h | 46 > > >> lib/librte_vhost/rte_vhost_version.map | 4 +- > > >> lib/librte_vhost/vhost.c | 39 ++-- > > >> lib/librte_vhost/vhost.h | 8 +- > > >> lib/librte_vhost/vhost_crypto.c| 65 -- > > >> lib/librte_vhost/vhost_user.c | 58 +++-- > > >> lib/librte_vhost/virtio_net.c | 411 > > - > > >> > > >> 9 files changed, 650 insertions(+), 131 deletions(-) > > >> > > >> -- > > >> 2.14.3 > > >
Re: [dpdk-dev] [PATCH v2] app/testpmd: fix pmd_test_exit function for vdevs
> -Original Message-
> From: Yang, Zhiyong
> Sent: Friday, May 18, 2018 6:00 PM
> To: dev@dpdk.org
> Cc: maxime.coque...@redhat.com; Yigit, Ferruh ; Bie, Tiwei ; Yao, Lei A ; Iremonger, Bernard ; sta...@dpdk.org; Yang, Zhiyong
> Subject: [PATCH v2] app/testpmd: fix pmd_test_exit function for vdevs
>
> For a vdev, just calling rte_eth_dev_close() isn't enough to free all
> the resources allocated during device probe, e.g. for virtio-user,
> virtio_user_pmd_remove(), i.e. the remove() method of a vdev driver,
> needs to be called to unlink the socket file created during device
> probe. So this patch calls rte_eth_dev_detach() for vdevs when
> quitting testpmd.
>
> Cc: maxime.coque...@redhat.com
> Cc: ferruh.yi...@intel.com
> Cc: tiwei@intel.com
> Cc: lei.a@intel.com
> Cc: bernard.iremon...@intel.com
> Cc: sta...@dpdk.org
>
> Fixes: af75078fece3 ("first public release")
> Fixes: bd8f50a45d0f ("net/virtio-user: support server mode")
>
> Signed-off-by: Zhiyong Yang

Tested-by: Lei Yao

This patch passes the test for virtio-user server mode. The socket file is deleted after quitting testpmd.

> ---
>
> changes in V2:
> 1. change the patch title and add a fixes line.
>
> app/test-pmd/testpmd.c | 6 ++
> 1 file changed, 6 insertions(+)
>
> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
> index 134401603..1d308f056 100644
> --- a/app/test-pmd/testpmd.c
> +++ b/app/test-pmd/testpmd.c
> @@ -2011,6 +2011,8 @@ detach_port(portid_t port_id)
> void
> pmd_test_exit(void)
> {
> + const struct rte_bus *bus;
> + struct rte_device *device;
> portid_t pt_id;
> int ret;
>
> @@ -2020,10 +2022,14 @@ pmd_test_exit(void)
> if (ports != NULL) {
> no_link_check = 1;
> RTE_ETH_FOREACH_DEV(pt_id) {
> + device = rte_eth_devices[pt_id].device;
> + bus = rte_bus_find_by_device(device);
> printf("\nShutting down port %d...\n", pt_id);
> fflush(stdout);
> stop_port(pt_id);
> close_port(pt_id);
> + if (bus && !strcmp(bus->name, "vdev"))
> + detach_port(pt_id);
> }
> }
>
> --
> 2.14.3
Re: [dpdk-dev] [PATCH] vhost: adaptively batch small guest memory copies
> -Original Message- > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Jens Freimann > Sent: Monday, August 28, 2017 2:31 PM > To: Bie, Tiwei > Cc: dev@dpdk.org; y...@fridaylinux.org; maxime.coque...@redhat.com; > Wang, Zhihong ; Yang, Zhiyong > > Subject: Re: [dpdk-dev] [PATCH] vhost: adaptively batch small guest memory > copies > > Hi Tiwei, > > On Thu, Aug 24, 2017 at 10:19:39AM +0800, Tiwei Bie wrote: > >This patch adaptively batches the small guest memory copies. > >By batching the small copies, the efficiency of executing the > >memory LOAD instructions can be improved greatly, because the > >memory LOAD latency can be effectively hidden by the pipeline. > >We saw great performance boosts for small packets PVP test. > > this sounds interesting. Do you have numbers showing how much > performance improved for small packets? > > regards, > Jens Hi, Jens On my test bench, the performance gain is as follows for 64B small packets:
Mergeable: 19%
Vector: 21%
No-mergeable: 21%
CPU info: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
OS: Ubuntu 16.04
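
For readers who want the idea behind these numbers: the patch queues small copies and executes them back-to-back, so the memory LOAD latency of one copy overlaps with the next in the pipeline. A stripped-down sketch of the mechanism follows; the real code lives in vhost's virtio_net.c and also tracks dirty-page logging, which is omitted here.

    #include <stdint.h>
    #include <string.h>

    struct batch_copy_elem {
            void *dst;
            void *src;
            uint32_t len;
    };

    #define BATCH_MAX 256

    static struct batch_copy_elem batch[BATCH_MAX];
    static uint16_t nb_batch;

    /* defer a small copy instead of doing it inline */
    static inline void
    copy_deferred(void *dst, void *src, uint32_t len)
    {
            batch[nb_batch].dst = dst;
            batch[nb_batch].src = src;
            batch[nb_batch].len = len;
            nb_batch++;
    }

    /* execute all queued copies in one tight loop */
    static inline void
    copy_flush(void)
    {
            uint16_t i;

            for (i = 0; i < nb_batch; i++)
                    memcpy(batch[i].dst, batch[i].src, batch[i].len);
            nb_batch = 0;
    }

Large copies still go through memcpy directly; only copies below a threshold are deferred, which is what makes the batching "adaptive".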
Re: [dpdk-dev] [PATCH v2 3/4] common_base: extend RTE_MAX_ETHPORTS from 32 to 1024
> -Original Message- > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Zhiyong Yang > Sent: Monday, September 4, 2017 1:58 PM > To: dev@dpdk.org > Cc: tho...@monjalon.net; Yigit, Ferruh ; Wiles, > Keith ; step...@networkplumber.org; Yang, > Zhiyong > Subject: [dpdk-dev] [PATCH v2 3/4] common_base: extend > RTE_MAX_ETHPORTS from 32 to 1024 > > The reasons to modify RTE_MAX_ETHPORTS are the following. > > 1. RTE_MAX_ETHPORTS=32 by default has not met user's requirements > with the development of virtualization technology. Some vdev users have > to modify the setting before compiling. > > 2. port_id has been extended to a 16-bit definition. But for many > samples such as testpmd, l3fwd, the number of ports is still limited to > RTE_MAX_ETHPORTS=32 by default. This may limit usage of the 16-bit > port_id. > > So, it is necessary to enlarge RTE_MAX_ETHPORTS to more than 256. > > Signed-off-by: Zhiyong Yang > --- > config/common_base | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/config/common_base b/config/common_base > index 5e97a08b6..dccc13e31 100644 > --- a/config/common_base > +++ b/config/common_base > @@ -131,7 +131,7 @@ CONFIG_RTE_LIBRTE_KVARGS=y > # > CONFIG_RTE_LIBRTE_ETHER=y > CONFIG_RTE_LIBRTE_ETHDEV_DEBUG=n > -CONFIG_RTE_MAX_ETHPORTS=32 > +CONFIG_RTE_MAX_ETHPORTS=1024 > CONFIG_RTE_MAX_QUEUES_PER_PORT=1024 > CONFIG_RTE_LIBRTE_IEEE1588=n > CONFIG_RTE_ETHDEV_QUEUE_STAT_CNTRS=16 > -- > 2.13.3 Hi, Zhiyong I met one issue when changing CONFIG_RTE_MAX_ETHPORTS to 1024: one process can only open 1024 files at most on common Linux distributions, so in practice only 1009 socket files can be used for vdev devices with the testpmd sample.
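
On the 1024-fd ceiling Lei hit: that is the default RLIMIT_NOFILE soft limit of the process, not a DPDK limit, so an application that wants more vhost-user sockets can raise it (or run "ulimit -n" before starting). A small sketch of doing it programmatically:

    #include <stdio.h>
    #include <sys/resource.h>

    /* raise the per-process open-file limit up to the hard limit */
    static int
    raise_fd_limit(rlim_t wanted)
    {
            struct rlimit rl;

            if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
                    return -1;
            if (rl.rlim_cur < wanted) {
                    rl.rlim_cur = wanted > rl.rlim_max ? rl.rlim_max : wanted;
                    if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
                            return -1;
            }
            printf("fd limit now: %llu\n",
                   (unsigned long long)rl.rlim_cur);
            return 0;
    }

Going past the hard limit still needs root (or a limits.conf change), which explains why only ~1009 sockets were usable with the default setup.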
Re: [dpdk-dev] [PATCH v3] app/testpmd: enable the heavyweight mode TCP/IPv4 GRO
> -Original Message- > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Jiayu Hu > Sent: Sunday, September 3, 2017 2:30 PM > To: dev@dpdk.org > Cc: Yigit, Ferruh ; Ananyev, Konstantin > ; Tan, Jianfeng ; > Wu, Jingjing ; Hu, Jiayu > Subject: [dpdk-dev] [PATCH v3] app/testpmd: enable the heavyweight > mode TCP/IPv4 GRO > > The GRO library provides two modes to reassemble packets. Currently, the > csum forwarding engine has supported to use the lightweight mode to > reassemble TCP/IPv4 packets. This patch introduces the heavyweight mode > for TCP/IPv4 GRO in the csum forwarding engine. > > With the command "set port gro on|off", users can enable > TCP/IPv4 GRO for a given port. With the command "set gro flush ", > users can determine when the GROed TCP/IPv4 packets are flushed from > reassembly tables. With the command "show port gro", users can > display GRO configuration. > > Signed-off-by: Jiayu Hu Tested-by : Lei Yao This patch has been tested on my bench, iperf test result is as following: No-GRO: 8 Gbps Kernel GRO: 14.3 Gbps GRO flush 0 : 12.7 Gbps GRO flush 1: 16.8 Gbps But when I use 40G NIC and set GRO flush cycle as 2, sometimes the iperf traffic will stall for several seconds. Still need investigate. > --- > changes in v3: > - remove "heavyweight mode" and "lightweight mode" from GRO > commands > - combine two patches into one > - use consistent help string for GRO commands > - remove the unnecessary command "gro set (max_flow_num) > (max_item_num_per_flow) (port_id)" > changes in v2: > - use "set" and "show" as the root level command > - add a new command to show GRO configuration > - fix l2_len/l3_len/l4_len unset etc. bugs > > app/test-pmd/cmdline.c | 206 > > app/test-pmd/config.c | 67 +++-- > app/test-pmd/csumonly.c | 31 - > app/test-pmd/testpmd.c | 18 ++- > app/test-pmd/testpmd.h | 16 ++- > doc/guides/testpmd_app_ug/testpmd_funcs.rst | 45 -- > 6 files changed, 263 insertions(+), 120 deletions(-) > > diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c > index cd8c358..d628250 100644 > --- a/app/test-pmd/cmdline.c > +++ b/app/test-pmd/cmdline.c > @@ -423,13 +423,16 @@ static void cmd_help_long_parsed(void > *parsed_result, > "tso show (portid)" > "Display the status of TCP Segmentation > Offload.\n\n" > > - "gro (on|off) (port_id)" > + "set port (port_id) gro on|off\n" > "Enable or disable Generic Receive Offload in" > " csum forwarding engine.\n\n" > > - "gro set (max_flow_num) > (max_item_num_per_flow) (port_id)\n" > - "Set max flow number and max packet number > per-flow" > - " for GRO.\n\n" > + "show port (port_id) gro\n" > + "Display GRO configuration.\n\n" > + > + "set gro flush (cycles)\n" > + "Set the cycle to flush GROed packets from" > + " reassembly tables.\n\n" > > "set fwd (%s)\n" > "Set packet forwarding mode.\n\n" > @@ -3850,115 +3853,145 @@ cmdline_parse_inst_t cmd_tunnel_tso_show > = { > }; > > /* *** SET GRO FOR A PORT *** */ > -struct cmd_gro_result { > +struct cmd_gro_enable_result { > + cmdline_fixed_string_t cmd_set; > + cmdline_fixed_string_t cmd_port; > cmdline_fixed_string_t cmd_keyword; > - cmdline_fixed_string_t mode; > - uint8_t port_id; > + cmdline_fixed_string_t cmd_onoff; > + uint8_t cmd_pid; > }; > > static void > -cmd_enable_gro_parsed(void *parsed_result, > +cmd_gro_enable_parsed(void *parsed_result, > __attribute__((unused)) struct cmdline *cl, > __attribute__((unused)) void *data) > { > - struct cmd_gro_result *res; > + struct cmd_gro_enable_result *res; > > res = parsed_result; > - setup_gro(res->mode, res->port_id); > -} > - > 
-cmdline_parse_token_string_t cmd_gro_keyword = > - TOKEN_STRING_INITIALIZER(struct cmd_gro_result, > + if (!strcmp(res->cmd_keyword, "gro")) > + setup_gro(res->cmd_onoff, res->cmd_pid); > +} > +
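
For anyone reproducing the iperf numbers above, the runtime command sequence in the csum engine would look roughly like this. Port 0 and a flush cycle of 2 are assumptions taken from Lei's setup; the commands themselves are the ones this patch adds:

    testpmd> set fwd csum
    testpmd> set port 0 gro on
    testpmd> set gro flush 2
    testpmd> show port 0 gro
    testpmd> start

A larger flush cycle gives the reassembly tables more time to merge segments (hence the higher throughput at cycle 2), at the cost of added latency and, as noted above, a possible stall risk that was still under investigation.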
Re: [dpdk-dev] [PATCH v4 2/5] gso: add TCP/IPv4 GSO support
> -Original Message- > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Jiayu Hu > Sent: Tuesday, September 19, 2017 3:33 PM > To: dev@dpdk.org > Cc: Ananyev, Konstantin ; Kavanagh, Mark > B ; Tan, Jianfeng ; > Yigit, Ferruh ; tho...@monjalon.net; Hu, Jiayu > > Subject: [dpdk-dev] [PATCH v4 2/5] gso: add TCP/IPv4 GSO support > > This patch adds GSO support for TCP/IPv4 packets. Supported packets > may include a single VLAN tag. TCP/IPv4 GSO doesn't check if input > packets have correct checksums, and doesn't update checksums for > output packets (the responsibility for this lies with the application). > Additionally, TCP/IPv4 GSO doesn't process IP fragmented packets. > > TCP/IPv4 GSO uses two chained MBUFs, one direct MBUF and one indrect > MBUF, to organize an output packet. Note that we refer to these two > chained MBUFs as a two-segment MBUF. The direct MBUF stores the packet > header, while the indirect mbuf simply points to a location within the > original packet's payload. Consequently, use of the GSO library requires > multi-segment MBUF support in the TX functions of the NIC driver. > > If a packet is GSOed, TCP/IPv4 GSO reduces its MBUF refcnt by 1. As a > result, when all of its GSOed segments are freed, the packet is freed > automatically. > > TCP/IPv4 GSO clears the PKT_TX_TCP_SEG flag for the input packet and > GSO segments on the event of success. > > Signed-off-by: Jiayu Hu > Signed-off-by: Mark Kavanagh Tested-by: Lei Yao This patch is test on my bench. Iperf result as following: TSO : 18 Gbps DPDK GSO: 10 Gbps No TSO/GSO: 4 Gbps > --- > doc/guides/rel_notes/release_17_11.rst | 12 ++ > lib/librte_eal/common/include/rte_log.h | 1 + > lib/librte_gso/Makefile | 2 + > lib/librte_gso/gso_common.c | 202 > > lib/librte_gso/gso_common.h | 107 + > lib/librte_gso/gso_tcp4.c | 82 + > lib/librte_gso/gso_tcp4.h | 76 > lib/librte_gso/rte_gso.c| 52 +++- > 8 files changed, 531 insertions(+), 3 deletions(-) > create mode 100644 lib/librte_gso/gso_common.c > create mode 100644 lib/librte_gso/gso_common.h > create mode 100644 lib/librte_gso/gso_tcp4.c > create mode 100644 lib/librte_gso/gso_tcp4.h > > diff --git a/doc/guides/rel_notes/release_17_11.rst > b/doc/guides/rel_notes/release_17_11.rst > index 7508be7..7453bb0 100644 > --- a/doc/guides/rel_notes/release_17_11.rst > +++ b/doc/guides/rel_notes/release_17_11.rst > @@ -41,6 +41,18 @@ New Features > Also, make sure to start the actual text at the margin. > > = > > +* **Added the Generic Segmentation Offload Library.** > + > + Added the Generic Segmentation Offload (GSO) library to enable > + applications to split large packets (e.g. MSS is 64KB) into small > + ones (e.g. MTU is 1500B). Supported packet types are: > + > + * TCP/IPv4 packets, which may include a single VLAN tag. > + > + The GSO library doesn't check if the input packets have correct > + checksums, and doesn't update checksums for output packets. > + Additionally, the GSO library doesn't process IP fragmented packets. > + > > Resolved Issues > --- > diff --git a/lib/librte_eal/common/include/rte_log.h > b/lib/librte_eal/common/include/rte_log.h > index ec8dba7..2fa1199 100644 > --- a/lib/librte_eal/common/include/rte_log.h > +++ b/lib/librte_eal/common/include/rte_log.h > @@ -87,6 +87,7 @@ extern struct rte_logs rte_logs; > #define RTE_LOGTYPE_CRYPTODEV 17 /**< Log related to cryptodev. */ > #define RTE_LOGTYPE_EFD 18 /**< Log related to EFD. */ > #define RTE_LOGTYPE_EVENTDEV 19 /**< Log related to eventdev. */ > +#define RTE_LOGTYPE_GSO 20 /**< Log related to GSO. 
*/ > > /* these log types can be used in an application */ > #define RTE_LOGTYPE_USER1 24 /**< User-defined log type 1. */ > diff --git a/lib/librte_gso/Makefile b/lib/librte_gso/Makefile > index aeaacbc..2be64d1 100644 > --- a/lib/librte_gso/Makefile > +++ b/lib/librte_gso/Makefile > @@ -42,6 +42,8 @@ LIBABIVER := 1 > > #source files > SRCS-$(CONFIG_RTE_LIBRTE_GSO) += rte_gso.c > +SRCS-$(CONFIG_RTE_LIBRTE_GSO) += gso_common.c > +SRCS-$(CONFIG_RTE_LIBRTE_GSO) += gso_tcp4.c > > # install this header file > SYMLINK-$(CONFIG_RTE_LIBRTE_GSO)-include += rte_gso.h > diff --git a/lib/librte_gso/gso_common.c b/lib/librte_gso/gso_common.c > new file mode 100644 > index 000..b2c84f6 > --- /dev/null > +++ b/lib/librte_gso/gso_common.c > @@ -0,0 +1,202 @@
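
To make the two-segment mechanics above concrete, here is a hedged sketch of an application driving the GSO library. The mempools, the 64-segment bound and the error handling are illustrative assumptions, not part of the patch:

    #include <rte_common.h>
    #include <rte_ethdev.h>
    #include <rte_gso.h>
    #include <rte_mbuf.h>

    static void
    tx_one_with_gso(uint16_t port, struct rte_mbuf *pkt,
                    struct rte_mempool *direct_mp,
                    struct rte_mempool *indirect_mp)
    {
            struct rte_mbuf *segs[64];              /* illustrative bound */
            struct rte_gso_ctx ctx = {
                    .direct_pool = direct_mp,       /* holds cloned headers */
                    .indirect_pool = indirect_mp,   /* refs into the payload */
                    .gso_types = DEV_TX_OFFLOAD_TCP_TSO,
                    .gso_size = 1518,               /* max output segment size */
                    .flag = 0,
            };
            int ret;

            /* pkt must carry PKT_TX_TCP_SEG and valid l2/l3/l4_len */
            ret = rte_gso_segment(pkt, &ctx, segs, RTE_DIM(segs));
            if (ret < 0) {
                    rte_pktmbuf_free(pkt);          /* segmentation failed */
                    return;
            }
            rte_eth_tx_burst(port, 0, segs, (uint16_t)ret);
    }

Because each output segment is a two-segment chain (direct header mbuf plus indirect payload mbuf), the NIC's Tx path must support multi-segment mbufs, as the commit message states.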
Re: [dpdk-dev] [PATCH v4] app/testpmd: enable the heavyweight mode TCP/IPv4 GRO
> -Original Message- > From: Hu, Jiayu > Sent: Tuesday, September 26, 2017 2:27 PM > To: dev@dpdk.org > Cc: Yigit, Ferruh ; Tan, Jianfeng > ; Ananyev, Konstantin > ; tho...@monjalon.net; Wu, Jingjing > ; Yao, Lei A ; Hu, Jiayu > > Subject: [PATCH v4] app/testpmd: enable the heavyweight mode TCP/IPv4 > GRO > > The GRO library provides two modes to reassemble packets. Currently, the > csum forwarding engine has supported to use the lightweight mode to > reassemble TCP/IPv4 packets. This patch introduces the heavyweight mode > for TCP/IPv4 GRO in the csum forwarding engine. > > With the command "set port gro on|off", users can enable > TCP/IPv4 GRO for a given port. With the command "set gro flush ", > users can determine when the GROed TCP/IPv4 packets are flushed from > reassembly tables. With the command "show port gro", users can > display GRO configuration. > > The GRO library doesn't re-calculate checksums for merged packets. If > users want the merged packets to have correct IP and TCP checksums, > please select HW IP checksum calculation and HW TCP checksum calculation > for the port which the merged packets are transmitted to. > > Signed-off-by: Jiayu Hu > Reviewed-by: Ferruh Yigit Tested-by: Yao Lei This patch has beed tested on my bench. The following is the performance data got from iperf test with single flow No GRO: 9.5 Gbps Kernel GRO: 13.6 Gbps DPDK GRO with flush cycle=1 : 25.9 Gbps DPDK GRO with flush cycle=2 : 27.9 Gbps Note: When use DPDK GRO with flush cycle=2, I set the vhost rx_queue_size to 1024, if use default number 256, sometimes I met the date stall. OS: Ubuntu 16.04 CPU: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz > --- > changes in v4: > - fix unchecking the min value of 'cycle' bug in setup_gro_flush_cycles > - update the context of the testpmd document and commit logs > changes in v3: > - remove "heavyweight mode" and "lightweight mode" from GRO > commands > - combine two patches into one > - use consistent help string for GRO commands > - remove the unnecessary command "gro set (max_flow_num) > (max_item_num_per_flow) (port_id)" > changes in v2: > - use "set" and "show" as the root level command > - add a new command to show GRO configuration > - fix l2_len/l3_len/l4_len unset etc. 
bugs > > app/test-pmd/cmdline.c | 206 > > app/test-pmd/config.c | 68 +++-- > app/test-pmd/csumonly.c | 31 - > app/test-pmd/testpmd.c | 19 ++- > app/test-pmd/testpmd.h | 16 ++- > doc/guides/testpmd_app_ug/testpmd_funcs.rst | 50 +-- > 6 files changed, 270 insertions(+), 120 deletions(-) > > diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c > index ccdf239..e44c02e 100644 > --- a/app/test-pmd/cmdline.c > +++ b/app/test-pmd/cmdline.c > @@ -423,13 +423,16 @@ static void cmd_help_long_parsed(void > *parsed_result, > "tso show (portid)" > "Display the status of TCP Segmentation > Offload.\n\n" > > - "gro (on|off) (port_id)" > + "set port (port_id) gro on|off\n" > "Enable or disable Generic Receive Offload in" > " csum forwarding engine.\n\n" > > - "gro set (max_flow_num) > (max_item_num_per_flow) (port_id)\n" > - "Set max flow number and max packet number > per-flow" > - " for GRO.\n\n" > + "show port (port_id) gro\n" > + "Display GRO configuration.\n\n" > + > + "set gro flush (cycles)\n" > + "Set the cycle to flush GROed packets from" > + " reassembly tables.\n\n" > > "set fwd (%s)\n" > "Set packet forwarding mode.\n\n" > @@ -3854,115 +3857,145 @@ cmdline_parse_inst_t cmd_tunnel_tso_show > = { > }; > > /* *** SET GRO FOR A PORT *** */ > -struct cmd_gro_result { > +struct cmd_gro_enable_result { > + cmdline_fixed_string_t cmd_set; > + cmdline_fixed_string_t cmd_port; > cmdline_fixed_string_t cmd_keyword; > - cmdline_fixed_string_t mode; > - uint8_t port_id; > + cmdline_fixed_string_t cmd_onoff; > + uint8_t cmd_pid; > }; > > static void > -cmd_enable_gro_
Re: [dpdk-dev] [PATCH v3 15/19] vhost: postpone rings addresses translation
Hi, Maxime After this commit, vhost/virtio loopback test will fail when use the CPU on socket 1. Error message like following during initialize: VHOST_CONFIG: vring kick idx:0 file:20 VHOST_CONFIG: reallocate vq from 0 to 1 node VHOST_CONFIG: reallocate dev from 0 to 1 node VHOST_CONFIG: (0) failed to find avail ring address. VHOST_CONFIG: read message VHOST_USER_SET_VRING_NUM VHOST_CONFIG: read message VHOST_USER_SET_VRING_BASE VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR VHOST_CONFIG: read message VHOST_USER_SET_VRING_KICK VHOST_CONFIG: vring kick idx:1 file:21 VHOST_CONFIG: reallocate vq from 0 to 1 node VHOST_CONFIG: (0) failed to find avail ring address. But if use CPU on socket 0. It still can work. Could you have a check on this? Thanks a lot! Following is my cmd: Vhost: testpmd -n 4 -c 0x300 --socket-mem 1024,1024 --file-prefix=vhost \ --vdev 'net_vhost0,iface=vhost-net,queues=1,client=0' -- -i Virtio-user: testpmd -n 4 -c 0xc0 --socket-mem 1024,1024 --no-pci --file-prefix=virtio \ --vdev=net_virtio_user0,mac=00:01:02:03:04:05,path=./vhost-net \ -- -i --txqflags=0xf01 --disable-hw-vlan-filter BRs Lei > -Original Message- > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Maxime Coquelin > Sent: Thursday, October 5, 2017 4:36 PM > To: dev@dpdk.org; Horton, Remy ; Bie, Tiwei > ; y...@fridaylinux.org > Cc: m...@redhat.com; jfrei...@redhat.com; vkapl...@redhat.com; > jasow...@redhat.com; Maxime Coquelin > Subject: [dpdk-dev] [PATCH v3 15/19] vhost: postpone rings addresses > translation > > This patch postpones rings addresses translations and checks, as > addresses sent by the master shuld not be interpreted as long as > ring is not started and enabled[0]. > > When protocol features aren't negotiated, the ring is started in > enabled state, so the addresses translations are postponed to > vhost_user_set_vring_kick(). > Otherwise, it is postponed to when ring is enabled, in > vhost_user_set_vring_enable(). > > [0]: http://lists.nongnu.org/archive/html/qemu-devel/2017- > 05/msg04355.html > > Signed-off-by: Maxime Coquelin > --- > lib/librte_vhost/vhost.h | 1 + > lib/librte_vhost/vhost_user.c | 69 > ++- > 2 files changed, 56 insertions(+), 14 deletions(-) > > diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h > index 79351c66f..903da5db5 100644 > --- a/lib/librte_vhost/vhost.h > +++ b/lib/librte_vhost/vhost.h > @@ -125,6 +125,7 @@ struct vhost_virtqueue { > > struct vring_used_elem *shadow_used_ring; > uint16_tshadow_used_idx; > + struct vhost_vring_addr ring_addrs; > > struct batch_copy_elem *batch_copy_elems; > uint16_tbatch_copy_nb_elems; > diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c > index f495dd36e..319867c65 100644 > --- a/lib/librte_vhost/vhost_user.c > +++ b/lib/librte_vhost/vhost_user.c > @@ -356,6 +356,7 @@ static int > vhost_user_set_vring_addr(struct virtio_net *dev, VhostUserMsg *msg) > { > struct vhost_virtqueue *vq; > + struct vhost_vring_addr *addr = &msg->payload.addr; > > if (dev->mem == NULL) > return -1; > @@ -363,35 +364,50 @@ vhost_user_set_vring_addr(struct virtio_net *dev, > VhostUserMsg *msg) > /* addr->index refers to the queue index. The txq 1, rxq is 0. 
*/ > vq = dev->virtqueue[msg->payload.addr.index]; > > + /* > + * Rings addresses should not be interpreted as long as the ring is not > + * started and enabled > + */ > + memcpy(&vq->ring_addrs, addr, sizeof(*addr)); > + > + return 0; > +} > + > +static struct virtio_net *translate_ring_addresses(struct virtio_net *dev, > + int vq_index) > +{ > + struct vhost_virtqueue *vq = dev->virtqueue[vq_index]; > + struct vhost_vring_addr *addr = &vq->ring_addrs; > + > /* The addresses are converted from QEMU virtual to Vhost virtual. > */ > vq->desc = (struct vring_desc *)(uintptr_t)qva_to_vva(dev, > - msg->payload.addr.desc_user_addr); > + addr->desc_user_addr); > if (vq->desc == 0) { > RTE_LOG(ERR, VHOST_CONFIG, > "(%d) failed to find desc ring address.\n", > dev->vid); > - return -1; > + return NULL; > } > > - dev = numa_realloc(dev, msg->payload.addr.index); > - vq = dev->virtqueue[msg->payload.addr.index]; > + dev = numa_realloc(dev, vq_index); >
Re: [dpdk-dev] [PATCH v3 15/19] vhost: postpone rings addresses translation
Hi, Maxime > -Original Message- > From: Maxime Coquelin [mailto:maxime.coque...@redhat.com] > Sent: Friday, October 13, 2017 3:32 PM > To: Yao, Lei A ; dev@dpdk.org; Horton, Remy > ; Bie, Tiwei ; > y...@fridaylinux.org > Cc: m...@redhat.com; jfrei...@redhat.com; vkapl...@redhat.com; > jasow...@redhat.com > Subject: Re: [dpdk-dev] [PATCH v3 15/19] vhost: postpone rings addresses > translation > > Hi Lei, > > On 10/13/2017 03:47 AM, Yao, Lei A wrote: > > Hi, Maxime > > > > After this commit, vhost/virtio loopback test will fail when > > use the CPU on socket 1. > > Error message like following during initialize: > > VHOST_CONFIG: vring kick idx:0 file:20 > > VHOST_CONFIG: reallocate vq from 0 to 1 node > > VHOST_CONFIG: reallocate dev from 0 to 1 node > > VHOST_CONFIG: (0) failed to find avail ring address. > > VHOST_CONFIG: read message VHOST_USER_SET_VRING_NUM > > VHOST_CONFIG: read message VHOST_USER_SET_VRING_BASE > > VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR > > VHOST_CONFIG: read message VHOST_USER_SET_VRING_KICK > > VHOST_CONFIG: vring kick idx:1 file:21 > > VHOST_CONFIG: reallocate vq from 0 to 1 node > > VHOST_CONFIG: (0) failed to find avail ring address. > > > > But if use CPU on socket 0. It still can work. Could you have a check on > > this? Thanks a lot! > > Thanks for reporting the issue. It seems addr pointer still points to > the old structure after the reallocation. This patch should fix the > issue: > diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c > index 9acac6125..2416a0061 100644 > --- a/lib/librte_vhost/vhost_user.c > +++ b/lib/librte_vhost/vhost_user.c > @@ -417,6 +417,7 @@ translate_ring_addresses(struct virtio_net *dev, int > vq_index) > > dev = numa_realloc(dev, vq_index); > vq = dev->virtqueue[vq_index]; > + addr = &vq->ring_addrs; > > vq->avail = (struct vring_avail *)(uintptr_t)ring_addr_to_vva(dev, > vq, addr->avail_user_addr, sizeof(struct > vring_avail)); > > I only have access to single-socket machines at the moment, so I cannot > reproduce the issue. > Can you have a try? With your new patch, it can work on socket 1 now. Thanks. > > Thanks, > Maxime > > > Following is my cmd: > > Vhost: testpmd -n 4 -c 0x300 --socket-mem 1024,1024 --file- > prefix=vhost \ > > --vdev 'net_vhost0,iface=vhost-net,queues=1,client=0' -- -i > > Virtio-user: testpmd -n 4 -c 0xc0 --socket-mem 1024,1024 --no-pci -- > file-prefix=virtio \ > > > > --vdev=net_virtio_user0,mac=00:01:02:03:04:05,path=./vhost- > net \ > > -- -i --txqflags=0xf01 --disable-hw-vlan-filter > > > > BRs > > Lei > > > >> -Original Message- > >> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Maxime > Coquelin > >> Sent: Thursday, October 5, 2017 4:36 PM > >> To: dev@dpdk.org; Horton, Remy ; Bie, Tiwei > >> ; y...@fridaylinux.org > >> Cc: m...@redhat.com; jfrei...@redhat.com; vkapl...@redhat.com; > >> jasow...@redhat.com; Maxime Coquelin > > >> Subject: [dpdk-dev] [PATCH v3 15/19] vhost: postpone rings addresses > >> translation > >> > >> This patch postpones rings addresses translations and checks, as > >> addresses sent by the master shuld not be interpreted as long as > >> ring is not started and enabled[0]. > >> > >> When protocol features aren't negotiated, the ring is started in > >> enabled state, so the addresses translations are postponed to > >> vhost_user_set_vring_kick(). > >> Otherwise, it is postponed to when ring is enabled, in > >> vhost_user_set_vring_enable(). 
> >> > >> [0]: http://lists.nongnu.org/archive/html/qemu-devel/2017- > >> 05/msg04355.html > >> > >> Signed-off-by: Maxime Coquelin > >> --- > >> lib/librte_vhost/vhost.h | 1 + > >> lib/librte_vhost/vhost_user.c | 69 > >> ++- > >> 2 files changed, 56 insertions(+), 14 deletions(-) > >> > >> diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h > >> index 79351c66f..903da5db5 100644 > >> --- a/lib/librte_vhost/vhost.h > >> +++ b/lib/librte_vhost/vhost.h > >> @@ -125,6 +125,7 @@ struct vhost_virtqueue { > >> > >>struct vring_used_elem *shadow_used_ring; > >>uint16_tshadow_used_idx; >
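
Since the stale-pointer bug discussed in this thread is easy to reintroduce, a distilled illustration of the pattern may help; the names below are made up and only the pattern matters. The fix Maxime posted is exactly the "re-take the pointer" line:

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    struct ring_addrs { uint64_t desc, avail, used; };
    struct vq { struct ring_addrs ring_addrs; };

    /* stands in for vhost's numa_realloc(): the object may move */
    static struct vq *
    numa_realloc_sketch(struct vq *old)
    {
            struct vq *new_vq = malloc(sizeof(*new_vq));

            memcpy(new_vq, old, sizeof(*new_vq));
            free(old);
            return new_vq;
    }

    static uint64_t
    translate_sketch(struct vq *vq)
    {
            struct ring_addrs *addr = &vq->ring_addrs;

            vq = numa_realloc_sketch(vq);
            /* Without the next line, 'addr' still points into the
             * freed object; this is the one-line fix from the thread. */
            addr = &vq->ring_addrs;
            return addr->avail;
    }

The bug only bites when the reallocation actually moves the object, which is why it surfaced only when vhost ran on socket 1 and the virtqueue was migrated to the other NUMA node.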
Re: [dpdk-dev] [PATCH v3 15/19] vhost: postpone rings addresses translation
Hi, Maxime > -Original Message- > From: Yao, Lei A > Sent: Friday, October 13, 2017 3:55 PM > To: Maxime Coquelin ; dev@dpdk.org; > Horton, Remy ; Bie, Tiwei ; > y...@fridaylinux.org > Cc: m...@redhat.com; jfrei...@redhat.com; vkapl...@redhat.com; > jasow...@redhat.com > Subject: RE: [dpdk-dev] [PATCH v3 15/19] vhost: postpone rings addresses > translation > > Hi, Maxime > > > -Original Message- > > From: Maxime Coquelin [mailto:maxime.coque...@redhat.com] > > Sent: Friday, October 13, 2017 3:32 PM > > To: Yao, Lei A ; dev@dpdk.org; Horton, Remy > > ; Bie, Tiwei ; > > y...@fridaylinux.org > > Cc: m...@redhat.com; jfrei...@redhat.com; vkapl...@redhat.com; > > jasow...@redhat.com > > Subject: Re: [dpdk-dev] [PATCH v3 15/19] vhost: postpone rings addresses > > translation > > > > Hi Lei, > > > > On 10/13/2017 03:47 AM, Yao, Lei A wrote: > > > Hi, Maxime > > > > > > After this commit, vhost/virtio loopback test will fail when > > > use the CPU on socket 1. > > > Error message like following during initialize: > > > VHOST_CONFIG: vring kick idx:0 file:20 > > > VHOST_CONFIG: reallocate vq from 0 to 1 node > > > VHOST_CONFIG: reallocate dev from 0 to 1 node > > > VHOST_CONFIG: (0) failed to find avail ring address. > > > VHOST_CONFIG: read message VHOST_USER_SET_VRING_NUM > > > VHOST_CONFIG: read message VHOST_USER_SET_VRING_BASE > > > VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR > > > VHOST_CONFIG: read message VHOST_USER_SET_VRING_KICK > > > VHOST_CONFIG: vring kick idx:1 file:21 > > > VHOST_CONFIG: reallocate vq from 0 to 1 node > > > VHOST_CONFIG: (0) failed to find avail ring address. > > > > > > But if use CPU on socket 0. It still can work. Could you have a check on > > > this? Thanks a lot! > > > > Thanks for reporting the issue. It seems addr pointer still points to > > the old structure after the reallocation. This patch should fix the > > issue: > > diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c > > index 9acac6125..2416a0061 100644 > > --- a/lib/librte_vhost/vhost_user.c > > +++ b/lib/librte_vhost/vhost_user.c > > @@ -417,6 +417,7 @@ translate_ring_addresses(struct virtio_net *dev, int > > vq_index) > > > > dev = numa_realloc(dev, vq_index); > > vq = dev->virtqueue[vq_index]; > > + addr = &vq->ring_addrs; > > > > vq->avail = (struct vring_avail *)(uintptr_t)ring_addr_to_vva(dev, > > vq, addr->avail_user_addr, sizeof(struct > > vring_avail)); > > > > I only have access to single-socket machines at the moment, so I cannot > > reproduce the issue. > > Can you have a try? > With your new patch, it can work on socket 1 now. Thanks. > I find another issue with this patch when I use V17.11-rc1. It will break the connection between vhost and virtio-net in VM. The link status of vhost-user port is always down. But virtio-pmd driver is still ok. My qemu version: 2.5 Guest OS: Ubuntu 16.04 Guest kernel: 4.4.0 Could you have a check on this issue? Thanks a lot! 
BRs Lei > > > > Thanks, > > Maxime > > > > > Following is my cmd: > > > Vhost: testpmd -n 4 -c 0x300 --socket-mem 1024,1024 --file- > > prefix=vhost \ > > > --vdev 'net_vhost0,iface=vhost-net,queues=1,client=0' -- > > > -i > > > Virtio-user: testpmd -n 4 -c 0xc0 --socket-mem 1024,1024 --no-pci -- > > file-prefix=virtio \ > > > -- > vdev=net_virtio_user0,mac=00:01:02:03:04:05,path=./vhost- > > net \ > > > -- -i --txqflags=0xf01 --disable-hw-vlan-filter > > > > > > BRs > > > Lei > > > > > >> -Original Message- > > >> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Maxime > > Coquelin > > >> Sent: Thursday, October 5, 2017 4:36 PM > > >> To: dev@dpdk.org; Horton, Remy ; Bie, Tiwei > > >> ; y...@fridaylinux.org > > >> Cc: m...@redhat.com; jfrei...@redhat.com; vkapl...@redhat.com; > > >> jasow...@redhat.com; Maxime Coquelin > > > > >> Subject: [dpdk-dev] [PATCH v3 15/19] vhost: postpone rings addresses > > >> translation > > >> > > >> This patch postpones rings addresses translations and checks, as > > >> addresses sent by the master shuld not be interpreted as long as > > >> ring is not started and
Re: [dpdk-dev] [PATCH v3 15/19] vhost: postpone rings addresses translation
Hi, Maxime Add one comment: This issue with virtio-net only occur when I use CPU on socket 1. > -Original Message- > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Yao, Lei A > Sent: Monday, October 16, 2017 2:00 PM > To: 'Maxime Coquelin' ; 'dev@dpdk.org' > ; Horton, Remy ; Bie, Tiwei > ; 'y...@fridaylinux.org' > Cc: 'm...@redhat.com' ; 'jfrei...@redhat.com' > ; 'vkapl...@redhat.com' ; > 'jasow...@redhat.com' > Subject: Re: [dpdk-dev] [PATCH v3 15/19] vhost: postpone rings addresses > translation > > Hi, Maxime > > > -Original Message- > > From: Yao, Lei A > > Sent: Friday, October 13, 2017 3:55 PM > > To: Maxime Coquelin ; dev@dpdk.org; > > Horton, Remy ; Bie, Tiwei ; > > y...@fridaylinux.org > > Cc: m...@redhat.com; jfrei...@redhat.com; vkapl...@redhat.com; > > jasow...@redhat.com > > Subject: RE: [dpdk-dev] [PATCH v3 15/19] vhost: postpone rings addresses > > translation > > > > Hi, Maxime > > > > > -Original Message- > > > From: Maxime Coquelin [mailto:maxime.coque...@redhat.com] > > > Sent: Friday, October 13, 2017 3:32 PM > > > To: Yao, Lei A ; dev@dpdk.org; Horton, Remy > > > ; Bie, Tiwei ; > > > y...@fridaylinux.org > > > Cc: m...@redhat.com; jfrei...@redhat.com; vkapl...@redhat.com; > > > jasow...@redhat.com > > > Subject: Re: [dpdk-dev] [PATCH v3 15/19] vhost: postpone rings > addresses > > > translation > > > > > > Hi Lei, > > > > > > On 10/13/2017 03:47 AM, Yao, Lei A wrote: > > > > Hi, Maxime > > > > > > > > After this commit, vhost/virtio loopback test will fail when > > > > use the CPU on socket 1. > > > > Error message like following during initialize: > > > > VHOST_CONFIG: vring kick idx:0 file:20 > > > > VHOST_CONFIG: reallocate vq from 0 to 1 node > > > > VHOST_CONFIG: reallocate dev from 0 to 1 node > > > > VHOST_CONFIG: (0) failed to find avail ring address. > > > > VHOST_CONFIG: read message VHOST_USER_SET_VRING_NUM > > > > VHOST_CONFIG: read message VHOST_USER_SET_VRING_BASE > > > > VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR > > > > VHOST_CONFIG: read message VHOST_USER_SET_VRING_KICK > > > > VHOST_CONFIG: vring kick idx:1 file:21 > > > > VHOST_CONFIG: reallocate vq from 0 to 1 node > > > > VHOST_CONFIG: (0) failed to find avail ring address. > > > > > > > > But if use CPU on socket 0. It still can work. Could you have a check on > > > > this? Thanks a lot! > > > > > > Thanks for reporting the issue. It seems addr pointer still points to > > > the old structure after the reallocation. This patch should fix the > > > issue: > > > diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c > > > index 9acac6125..2416a0061 100644 > > > --- a/lib/librte_vhost/vhost_user.c > > > +++ b/lib/librte_vhost/vhost_user.c > > > @@ -417,6 +417,7 @@ translate_ring_addresses(struct virtio_net *dev, > int > > > vq_index) > > > > > > dev = numa_realloc(dev, vq_index); > > > vq = dev->virtqueue[vq_index]; > > > + addr = &vq->ring_addrs; > > > > > > vq->avail = (struct vring_avail > > > *)(uintptr_t)ring_addr_to_vva(dev, > > > vq, addr->avail_user_addr, sizeof(struct > > > vring_avail)); > > > > > > I only have access to single-socket machines at the moment, so I cannot > > > reproduce the issue. > > > Can you have a try? > > With your new patch, it can work on socket 1 now. Thanks. > > > I find another issue with this patch when I use V17.11-rc1. > It will break the connection between vhost and virtio-net in VM. > The link status of vhost-user port is always down. > But virtio-pmd driver is still ok. 
> > My qemu version: 2.5 > Guest OS: Ubuntu 16.04 > Guest kernel: 4.4.0 > > Could you have a check on this issue? Thanks a lot! > > BRs > Lei > > > > > > > Thanks, > > > Maxime > > > > > > > Following is my cmd: > > > > Vhost: testpmd -n 4 -c 0x300 --socket-mem 1024,1024 --file- > > > prefix=vhost \ > > > > --vdev 'net_vhost0,iface=vhost-net,queues=1,client=0' > > > > -- -i > > > > Virtio-user: testpmd -n 4 -c 0xc0 --socket-mem
Re: [dpdk-dev] [PATCH v3 15/19] vhost: postpone rings addresses translation
Hi, Maxime > -Original Message- > From: Maxime Coquelin [mailto:maxime.coque...@redhat.com] > Sent: Monday, October 16, 2017 6:48 PM > To: Yao, Lei A ; 'dev@dpdk.org' ; > Horton, Remy ; Bie, Tiwei ; > 'y...@fridaylinux.org' > Cc: 'm...@redhat.com' ; 'jfrei...@redhat.com' > ; 'vkapl...@redhat.com' ; > 'jasow...@redhat.com' > Subject: Re: [dpdk-dev] [PATCH v3 15/19] vhost: postpone rings addresses > translation > > > > On 10/16/2017 11:47 AM, Maxime Coquelin wrote: > > Hi Yao, > > > > On 10/16/2017 08:23 AM, Yao, Lei A wrote: > >> Hi, Maxime > >> > >> Add one comment: > >> This issue with virtio-net only occur when I use CPU on socket 1. > > > > Thanks for the report. > > I fail to reproduce for now. > > > > What is your qemu command line? > > Is it reproducible systematically when there is a NUMA reallocation > > (DPDK on socket 0, QEMU on socket 1)? > > Nevermind, I just reproduced the (an?) issue. > The issue I reproduce is not linked to NUMA reallocation, but to > messages sequencing differences between QEMU versions. > > So, I'm not 100% this is the same issue, as you mention it works fine > when using CPU socket 0. > > The patch "vhost: postpone rings addresses translation" moves rings > addresses translation at either vring kick or enable time, depending > on whether protocol features are enabled or not. This has been done > because we must not interpret ring information as long as the vring is > not fully initialized. > > While my patch works fine with recent QEMU version, it breaks with older > ones, like QEMU v2.5. The reason is that on these older versions, > VHOST_USER_SET_VRING_ENABLE is called once and before > VHOST_USER_SET_VRING_ADDR. At that time, the ring adresses aren't > available so the translation is not done. On recent QEMU versions, > we receive VHOST_USER_SET_VRING_ENABLE also after having received > the rings addresses, so it works fine. > > The below fix consists in performing the rings addresses translation > also when handling VHOST_USER_SET_VRING_ADDR if ring has already been > enabled. > > I'll post a formal patch later today or tomorrow morning after having > tested it more conscientiously. Let me know if it fixes your issue. > > Thanks, > Maxime > Thanks for your quick fix. I try your patch and it can totally work at my side for virtio-net. I tested it with Qemu 2.5~2.7, 2.10. The previous info about it can work on numa 0 is a misleading info. Because I use qemu 2.10 for some special test at that time. So it can work. BRs Lei > diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c > index 76c4eeca5..1f6cba4b9 100644 > --- a/lib/librte_vhost/vhost_user.c > +++ b/lib/librte_vhost/vhost_user.c > @@ -372,33 +372,6 @@ ring_addr_to_vva(struct virtio_net *dev, struct > vhost_virtqueue *vq, > return qva_to_vva(dev, ra); > } > > -/* > - * The virtio device sends us the desc, used and avail ring addresses. > - * This function then converts these to our address space. > - */ > -static int > -vhost_user_set_vring_addr(struct virtio_net *dev, VhostUserMsg *msg) > -{ > - struct vhost_virtqueue *vq; > - struct vhost_vring_addr *addr = &msg->payload.addr; > - > - if (dev->mem == NULL) > - return -1; > - > - /* addr->index refers to the queue index. The txq 1, rxq is 0. 
*/ > - vq = dev->virtqueue[msg->payload.addr.index]; > - > - /* > -* Rings addresses should not be interpreted as long as the ring > is not > -* started and enabled > -*/ > - memcpy(&vq->ring_addrs, addr, sizeof(*addr)); > - > - vring_invalidate(dev, vq); > - > - return 0; > -} > - > static struct virtio_net * > translate_ring_addresses(struct virtio_net *dev, int vq_index) > { > @@ -464,6 +437,43 @@ translate_ring_addresses(struct virtio_net *dev, > int vq_index) > } > > /* > + * The virtio device sends us the desc, used and avail ring addresses. > + * This function then converts these to our address space. > + */ > +static int > +vhost_user_set_vring_addr(struct virtio_net **pdev, VhostUserMsg *msg) > +{ > + struct vhost_virtqueue *vq; > + struct vhost_vring_addr *addr = &msg->payload.addr; > + struct virtio_net *dev = *pdev; > + > + if (dev->mem == NULL) > + return -1; > + > + /* addr->index refers to the queue index. The txq 1, rxq is 0. */ > + vq = dev->virtqueue[ms
Re: [dpdk-dev] [PATCH v3 17/19] vhost-user: iommu: postpone device creation until ring are mapped
> + if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)) > + vhost_user_iotlb_rd_lock(vq); > + > + if (unlikely(vq->access_ok == 0)) > + if (unlikely(vring_translate(dev, vq) < 0)) > + goto out; > + > if (unlikely(dev->dequeue_zero_copy)) { > struct zcopy_mbuf *zmbuf, *next; > int nr_updated = 0; > @@ -1262,10 +1280,6 @@ rte_vhost_dequeue_burst(int vid, uint16_t > queue_id, > > /* Prefetch descriptor index. */ > rte_prefetch0(&vq->desc[desc_indexes[0]]); > - > - if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)) > - vhost_user_iotlb_rd_lock(vq); > - > for (i = 0; i < count; i++) { > struct vring_desc *desc; > uint16_t sz, idx; > @@ -1329,9 +1343,6 @@ rte_vhost_dequeue_burst(int vid, uint16_t > queue_id, > TAILQ_INSERT_TAIL(&vq->zmbuf_list, zmbuf, next); > } > } > - if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)) > - vhost_user_iotlb_rd_unlock(vq); > - > vq->last_avail_idx += i; > > if (likely(dev->dequeue_zero_copy == 0)) { > @@ -1341,6 +1352,9 @@ rte_vhost_dequeue_burst(int vid, uint16_t > queue_id, > } > > out: > + if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)) > + vhost_user_iotlb_rd_unlock(vq); > + > if (unlikely(rarp_mbuf != NULL)) { > /* >* Inject it to the head of "pkts" array, so that switch's mac > -- > 2.13.6
Hi, Maxime I met one issue with your patch set during the v17.11 test. The test scenario is as follows:
1. Bind one NIC, use testpmd to set up vhost-user with 2 queues: usertools/dpdk-devbind.py --bind=igb_uio 0000:05:00.0 ./x86_64-native-linuxapp-gcc/app/testpmd -c 0xe -n 4 --socket-mem 1024,1024 \ --vdev 'net_vhost0,iface=vhost-net,queues=2' -- -i --rxq=2 --txq=2 --nb-cores=2 --rss-ip
2. Launch qemu with a virtio device which has 2 queues.
3. In the VM, launch testpmd with virtio-pmd using only 1 queue: x86_64-native-linuxapp-gcc/app/testpmd -c 0x07 -n 3 -- -i --txqflags=0xf01 \ --rxq=1 --txq=1 --rss-ip --nb-cores=1
First, commit 09927b5249694bad1c094d3068124673722e6b8f vhost: translate ring addresses when IOMMU enabled The patch causes no traffic in the PVP test, but link status is still up in vhost-user. Second, eefac9536a901a1f0bb52aa3b6fec8f375f09190 vhost: postpone device creation until rings are mapped The patch causes link status "down" in vhost-user. Could you have a check at your side? Thanks. BRs Lei
Re: [dpdk-dev] [PATCH v8 1/3] eal/x86: run-time dispatch over memcpy
Hi, Thomas > -Original Message- > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Thomas Monjalon > Sent: Thursday, November 2, 2017 6:45 PM > To: Wang, Zhihong ; Li, Xiaoyun > > Cc: dev@dpdk.org; Richardson, Bruce ; > Ananyev, Konstantin ; Lu, Wenzhuo > ; Zhang, Helin ; > ophi...@mellanox.com > Subject: Re: [dpdk-dev] [PATCH v8 1/3] eal/x86: run-time dispatch over > memcpy > > 02/11/2017 11:22, Wang, Zhihong: > > > I don't know what is creating this drop exactly. > > > When doing different tests on different environments, we do not see > this > > > drop. > > > If nobody else can see such issue, I guess we can ignore it. > > > > Hi Thomas, Xiaoyun, > > > > With this patch (commit 84cc318424d49372dd2a5fbf3cf84426bf95acce) I see > > more than 20% performance drop in vhost loopback test with testpmd > > macswap for 256 bytes packets, which means it impacts actual vSwitching > > performance. > > > > Suggest we fix it or revert it for this release. > > I think we need more numbers to take a decision. > What is the benefit of this patch? In which use-cases? > What are the drawbacks? In which use-cases? > > Please, it is a call to test performance with and without this patch > in more environments (CPU, packet size, applications). The following is the performance drop we observe in vhost/virtio loopback performance with and without this patch.
Test application: testpmd
CPU info: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
OS: Ubuntu 16.04

Mergeable Path
packet size   Performance Drop
64            -1.30%
128           0.81%
158           -19.17%
188           -19.18%
218           -16.29%
230           -16.57%
256           -16.77%
280           -3.07%
300           -3.22%
380           -2.44%
420           -1.65%
512           -0.99%
1024          0.00%
1518          -0.68%

Vector Path
packet size   Performance Drop
64            3.30%
128           7.18%
256           -12.77%
512           -0.98%
1024          0.27%
1518          0.68%
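
For context on what commit 84cc318 changes: it replaces a compile-time choice of rte_memcpy() implementation with a run-time one selected from CPU features. A generic sketch of the pattern follows; this is not DPDK's actual code, and the AVX bodies are placeholders. It shows where the extra indirect call on every copy comes from, which is a plausible source of the drops on short, hot copies:

    #include <stddef.h>
    #include <string.h>

    /* placeholder bodies; the real ones are hand-tuned vector loops */
    static void *memcpy_avx2(void *d, const void *s, size_t n)
    { return memcpy(d, s, n); }
    static void *memcpy_avx512(void *d, const void *s, size_t n)
    { return memcpy(d, s, n); }

    static void *(*memcpy_fn)(void *, const void *, size_t) = memcpy;

    __attribute__((constructor))
    static void
    memcpy_dispatch_init(void)
    {
            __builtin_cpu_init();
            if (__builtin_cpu_supports("avx512f"))
                    memcpy_fn = memcpy_avx512;
            else if (__builtin_cpu_supports("avx2"))
                    memcpy_fn = memcpy_avx2;
            /* every later copy now pays one indirect call:
             * memcpy_fn(dst, src, len); */
    }

The compile-time variant, by contrast, lets the inlined copy specialize per call site, which the function pointer defeats; that trade-off is what the thread is weighing.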
Re: [dpdk-dev] [PATCH v2 0/2] Fix 2 bugs of i40e VF interrupt found in l3fwd-power
> -Original Message- > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Wei Dai > Sent: Friday, November 3, 2017 3:14 PM > To: Wu, Jingjing ; Xing, Beilei > ; Liang, Cunming > Cc: dev@dpdk.org; Dai, Wei > Subject: [dpdk-dev] [PATCH v2 0/2] Fix 2 bugs of i40e VF interrupt found in > l3fwd-power > > These 2 bugs can be observed from example/l3fwd-power run with i40e VF > bound to VFIO-PCI. The test steps are as follows: > 1. Disable LSC interrupt by clearing port_conf.intr_conf.lsc=0, as i40e >VF doesn't support LSC interrupt to avoid rte_eth_dev_configure() failure. > 2. Create a VF from i40e host PF. Let PF run with kernel driver and bind >its VF to VFIO-PCI > 3. Run l3fwd-power like: l3fwd-power -l 18-19 -- -p 0x1 --config='(0,0,19)' > > Then, the following error message appears like: > EAL: Error enabling MSI-X interrupts for fd 18 > This error is from rte_intr_enable( )/vfio_enable_msix( ) when enabling > Rx queue interrupt. > Same as the patch 06938770186a ("net/ixgbe: fix VFIO interrupt mapping in > VF"), > to change VFIO MSI-X interrupts mapping, previous mapping should be > cleared > first to avoid above error. > > After fixing above VFIO-PCI MSI-X interrupt mapping. There is still the > following > 2nd bug: l3fwd-power still can not be waked up by incoming packets. > > Same as the patch ca9d6597184b ("net/ixgbe: fix Rx queue interrupt mapping > in VF"), > the interrupt vector of Rx queues should be mapped to vector 1 to fix above > 2nd bug. > > These patches have passed test with l3fwd-power using i40e VF bound to > VFIO-PCI. > They also passed the test with testpmd rxonly and txonly mode with igb_uio > and VFIO-PCI. > > Signed-off-by: Wei Dai Tested-by: Lei Yao > > --- > v2: only remap VFIO interrupt in i40evf_dev_start( ) > > > Wei Dai (2): > net/i40e: fix VFIO interrupt mapping in VF > net/i40e: fix Rx queue interrupt mapping in VF > > drivers/net/i40e/i40e_ethdev_vf.c | 18 ++ > 1 file changed, 14 insertions(+), 4 deletions(-) > > -- > 2.7.4 Following test case is tested and passed with this patch. L3fwd-power with 1 i40e PF (vfio-pci) can receive interrupt L3fwd-power with 1 i40e VF (vfio-pci) can receive interrupt L3fwd-power with 2 i40e VF (vfio-pci) can receive interrupt OS: Ubuntu 16.04 CPU: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz NIC: Ethernet Controller X710
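
As a reference for what l3fwd-power actually exercises in this test, here is a condensed sketch of the Rx interrupt pattern; the event registration normally happens once at setup, and the 10 ms timeout is an arbitrary choice for illustration:

    #include <rte_ethdev.h>
    #include <rte_interrupts.h>

    static void
    rxq_intr_setup(uint16_t port, uint16_t queue)
    {
            /* register the queue interrupt with this lcore's epoll fd */
            rte_eth_dev_rx_intr_ctl_q(port, queue, RTE_EPOLL_PER_THREAD,
                                      RTE_INTR_EVENT_ADD, NULL);
    }

    static void
    rxq_intr_wait(uint16_t port, uint16_t queue)
    {
            struct rte_epoll_event ev;

            rte_eth_dev_rx_intr_enable(port, queue);        /* arm */
            rte_epoll_wait(RTE_EPOLL_PER_THREAD, &ev, 1, 10 /* ms */);
            rte_eth_dev_rx_intr_disable(port, queue);       /* poll again */
    }

The two bugs fixed here sat below this layer: the VF's VFIO MSI-X mapping had to be cleared before being changed, and the Rx queues had to be mapped to vector 1, otherwise the enable step above failed or the wakeup never arrived.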
Re: [dpdk-dev] [PATCH] vhost: postpone ring addresses translations at kick time only
Hi, Maxime > -Original Message- > From: Maxime Coquelin [mailto:maxime.coque...@redhat.com] > Sent: Friday, November 3, 2017 11:57 PM > To: dev@dpdk.org; y...@fridaylinux.org; Yao, Lei A > Cc: m...@redhat.com > Subject: Re: [PATCH] vhost: postpone ring addresses translations at kick time > only > > Hi Lei, > > On 11/03/2017 04:52 PM, Maxime Coquelin wrote: > > If multiple queue pairs are created but all are not used, the > > device is never started, as unused queues aren't enabled and > > their ring addresses aren't translated. The device is changed > > to running state when all rings addresses are translated. > > > > This patch fixes this by postponing rings addresses translation > > at kick time unconditionally, VHOST_USER_F_PROTOCOL_FEATURES > > being negotiated or not. > > > > Reported-by: Lei Yao > > Signed-off-by: Maxime Coquelin > > --- > > lib/librte_vhost/vhost_user.c | 33 - > > 1 file changed, 8 insertions(+), 25 deletions(-) > > Could you confirm the patch fixes the issue on your side? > > I tested below cases with and without IOMMU: > - Host DPDK queues = 1 / QEMU queues = 1 / Guest DPDK queues = 1 > - Host DPDK queues = 2 / QEMU queues = 2 / Guest DPDK queues = 1 > - Host DPDK queues = 2 / QEMU queues = 2 / Guest DPDK queues = 2 > > Thanks, > Maxime Thanks for your patch. I tested my cases with your patch based on v17.11-rc2; it fixes my issue here. BRs Lei
Re: [dpdk-dev] [PATCH v2 06/10] net/virtio: fix queue setup consistency
Hi, Olivier This is Lei from DPDK validation team in Intel. During our DPDK 18.02-rc1 test, I find the following patch will cause one serious issue with virtio vector path: the traffic can't resume after stop/start the virtio device. The step like following: 1. Launch vhost-user port using testpmd at Host 2. Launch VM with virtio device, mergeable is off 3. Bind the virtio device to pmd driver, launch testpmd, let the tx/rx use vector path virtio_xmit_pkts_simple virtio_recv_pkts_vec 4. Send traffic to virtio device from vhost side, then stop the virtio device 5. Start the virtio device again After step 5, the traffic can't resume. Could you help check this and give a fix? This issue will impact the virtio pmd user experience heavily. By the way, this patch is already included into V17.11. Looks like we need give a patch to this LTS version. Thanks a lot! BRs Lei > -Original Message- > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Olivier Matz > Sent: Thursday, September 7, 2017 8:14 PM > To: dev@dpdk.org; y...@fridaylinux.org; maxime.coque...@redhat.com > Cc: step...@networkplumber.org; sta...@dpdk.org > Subject: [dpdk-dev] [PATCH v2 06/10] net/virtio: fix queue setup consistency > > In rx/tx queue setup functions, some code is executed only if > use_simple_rxtx == 1. The value of this variable can change depending on > the offload flags or sse support. If Rx queue setup is called before Tx > queue setup, it can result in an invalid configuration: > > - dev_configure is called: use_simple_rxtx is initialized to 0 > - rx queue setup is called: queues are initialized without simple path > support > - tx queue setup is called: use_simple_rxtx switch to 1, and simple > Rx/Tx handlers are selected > > Fix this by postponing a part of Rx/Tx queue initialization in > dev_start(), as it was the case in the initial implementation. 
> > Fixes: 48cec290a3d2 ("net/virtio: move queue configure code to proper > place") > Cc: sta...@dpdk.org > > Signed-off-by: Olivier Matz > --- > drivers/net/virtio/virtio_ethdev.c | 13 + > drivers/net/virtio/virtio_ethdev.h | 6 ++ > drivers/net/virtio/virtio_rxtx.c | 40 ++- > --- > 3 files changed, 51 insertions(+), 8 deletions(-) > > diff --git a/drivers/net/virtio/virtio_ethdev.c > b/drivers/net/virtio/virtio_ethdev.c > index 8eee3ff80..c7888f103 100644 > --- a/drivers/net/virtio/virtio_ethdev.c > +++ b/drivers/net/virtio/virtio_ethdev.c > @@ -1737,6 +1737,19 @@ virtio_dev_start(struct rte_eth_dev *dev) > struct virtnet_rx *rxvq; > struct virtnet_tx *txvq __rte_unused; > struct virtio_hw *hw = dev->data->dev_private; > + int ret; > + > + /* Finish the initialization of the queues */ > + for (i = 0; i < dev->data->nb_rx_queues; i++) { > + ret = virtio_dev_rx_queue_setup_finish(dev, i); > + if (ret < 0) > + return ret; > + } > + for (i = 0; i < dev->data->nb_tx_queues; i++) { > + ret = virtio_dev_tx_queue_setup_finish(dev, i); > + if (ret < 0) > + return ret; > + } > > /* check if lsc interrupt feature is enabled */ > if (dev->data->dev_conf.intr_conf.lsc) { > diff --git a/drivers/net/virtio/virtio_ethdev.h > b/drivers/net/virtio/virtio_ethdev.h > index c3413c6d9..2039bc547 100644 > --- a/drivers/net/virtio/virtio_ethdev.h > +++ b/drivers/net/virtio/virtio_ethdev.h > @@ -92,10 +92,16 @@ int virtio_dev_rx_queue_setup(struct rte_eth_dev > *dev, uint16_t rx_queue_id, > const struct rte_eth_rxconf *rx_conf, > struct rte_mempool *mb_pool); > > +int virtio_dev_rx_queue_setup_finish(struct rte_eth_dev *dev, > + uint16_t rx_queue_id); > + > int virtio_dev_tx_queue_setup(struct rte_eth_dev *dev, uint16_t > tx_queue_id, > uint16_t nb_tx_desc, unsigned int socket_id, > const struct rte_eth_txconf *tx_conf); > > +int virtio_dev_tx_queue_setup_finish(struct rte_eth_dev *dev, > + uint16_t tx_queue_id); > + > uint16_t virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, > uint16_t nb_pkts); > > diff --git a/drivers/net/virtio/virtio_rxtx.c > b/drivers/net/virtio/virtio_rxtx.c > index e30377c51..a32e3229f 100644 > --- a/drivers/net/virtio/virtio_rxtx.c > +++ b/drivers/net/virtio/virtio_rxtx.c > @@ -421,9 +421,6 @@ virtio_dev_rx_queue_setup(struct rte_eth_dev *dev, > struct virtio_hw *hw = dev->data->dev_private; > struct virtqueue *vq = hw->vqs[vtpci
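
To see why the fix matters from the application side: nothing in the ethdev API forbids setting up Rx queues before Tx queues, which is exactly the order that used to leave virtio half-configured. A minimal bring-up sketch follows (one queue pair, default configuration, error paths collapsed; ring size 256 is arbitrary):

    #include <rte_ethdev.h>
    #include <rte_mempool.h>

    static int
    bring_up_port(uint16_t port, struct rte_mempool *mp)
    {
            struct rte_eth_conf conf = { 0 };
            int sock = rte_eth_dev_socket_id(port);

            if (rte_eth_dev_configure(port, 1, 1, &conf) < 0)
                    return -1;
            /* Rx before Tx: the ordering that used to flip
             * use_simple_rxtx after the Rx ring was populated */
            if (rte_eth_rx_queue_setup(port, 0, 256, sock, NULL, mp) < 0)
                    return -1;
            if (rte_eth_tx_queue_setup(port, 0, 256, sock, NULL) < 0)
                    return -1;
            /* with the fix, virtio finishes Rx/Tx ring init in here */
            return rte_eth_dev_start(port);
    }

The patch's *_setup_finish() split means the path selection is only committed in rte_eth_dev_start(), so the setup order above no longer matters; the stop/start regression Lei reports was a side effect in the vector path that the follow-up fix (next thread but one) addresses.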
Re: [dpdk-dev] [PATCH 1/2] virtio: fix resuming traffic with rx vector path
> -Original Message- > From: Maxime Coquelin [mailto:maxime.coque...@redhat.com] > Sent: Friday, February 9, 2018 10:27 PM > To: Bie, Tiwei ; y...@fridaylinux.org; Yigit, Ferruh > ; vict...@redhat.com > Cc: dev@dpdk.org; sta...@dpdk.org; Wang, Zhihong > ; Xu, Qian Q ; Yao, Lei A > ; Maxime Coquelin > Subject: [PATCH 1/2] virtio: fix resuming traffic with rx vector path > > This patch fixes traffic resuming issue seen when using > Rx vector path. > > Fixes: efc83a1e7fc3 ("net/virtio: fix queue setup consistency") > > Signed-off-by: Tiwei Bie > Signed-off-by: Maxime Coquelin Tested-by: Lei Yao This patch has been tested by regression test suite. It can fix the traffic resume issue with vector path. No performance drop during PVP test: Following test are also checked and passed: Vhost/virtio multi queue Virtio-user Virtio-user as exception path Vhost/virtio reconnect My server info: OS: Ubuntu 16.04 Kernel: 4.4.0-110 CPU: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz BR Lei > --- > drivers/net/virtio/virtio_rxtx.c| 34 > ++--- > drivers/net/virtio/virtio_rxtx_simple.c | 2 +- > drivers/net/virtio/virtio_rxtx_simple.h | 2 +- > 3 files changed, 21 insertions(+), 17 deletions(-) > > diff --git a/drivers/net/virtio/virtio_rxtx.c > b/drivers/net/virtio/virtio_rxtx.c > index 854af399e..505283edd 100644 > --- a/drivers/net/virtio/virtio_rxtx.c > +++ b/drivers/net/virtio/virtio_rxtx.c > @@ -30,6 +30,7 @@ > #include "virtio_pci.h" > #include "virtqueue.h" > #include "virtio_rxtx.h" > +#include "virtio_rxtx_simple.h" > > #ifdef RTE_LIBRTE_VIRTIO_DEBUG_DUMP > #define VIRTIO_DUMP_PACKET(m, len) rte_pktmbuf_dump(stdout, m, len) > @@ -446,25 +447,28 @@ virtio_dev_rx_queue_setup_finish(struct > rte_eth_dev *dev, uint16_t queue_idx) > &rxvq->fake_mbuf; > } > > - while (!virtqueue_full(vq)) { > - m = rte_mbuf_raw_alloc(rxvq->mpool); > - if (m == NULL) > - break; > + if (hw->use_simple_rx) { > + while (vq->vq_free_cnt >= > RTE_VIRTIO_VPMD_RX_REARM_THRESH) { > + virtio_rxq_rearm_vec(rxvq); > + nbufs += RTE_VIRTIO_VPMD_RX_REARM_THRESH; > + } > + } else { > + while (!virtqueue_full(vq)) { > + m = rte_mbuf_raw_alloc(rxvq->mpool); > + if (m == NULL) > + break; > > - /* Enqueue allocated buffers */ > - if (hw->use_simple_rx) > - error = virtqueue_enqueue_recv_refill_simple(vq, > m); > - else > + /* Enqueue allocated buffers */ > error = virtqueue_enqueue_recv_refill(vq, m); > - > - if (error) { > - rte_pktmbuf_free(m); > - break; > + if (error) { > + rte_pktmbuf_free(m); > + break; > + } > + nbufs++; > } > - nbufs++; > - } > > - vq_update_avail_idx(vq); > + vq_update_avail_idx(vq); > + } > > PMD_INIT_LOG(DEBUG, "Allocated %d bufs", nbufs); > > diff --git a/drivers/net/virtio/virtio_rxtx_simple.c > b/drivers/net/virtio/virtio_rxtx_simple.c > index 7247a0822..0a79d1d5b 100644 > --- a/drivers/net/virtio/virtio_rxtx_simple.c > +++ b/drivers/net/virtio/virtio_rxtx_simple.c > @@ -77,7 +77,7 @@ virtio_xmit_pkts_simple(void *tx_queue, struct > rte_mbuf **tx_pkts, > rte_compiler_barrier(); > > if (nb_used >= VIRTIO_TX_FREE_THRESH) > - virtio_xmit_cleanup(vq); > + virtio_xmit_cleanup_simple(vq); > > nb_commit = nb_pkts = RTE_MIN((vq->vq_free_cnt >> 1), nb_pkts); > desc_idx = (uint16_t)(vq->vq_avail_idx & desc_idx_max); > diff --git a/drivers/net/virtio/virtio_rxtx_simple.h > b/drivers/net/virtio/virtio_rxtx_simple.h > index 2d8e6b14a..303904d64 100644 > --- a/drivers/net/virtio/virtio_rxtx_simple.h > +++ b/drivers/net/virtio/virtio_rxtx_simple.h > @@ -60,7 +60,7 @@ virtio_rxq_rearm_vec(struct virtnet_rx *rxvq) > #define 
VIRTIO_TX_FREE_NR 32 > /* TODO: vq->tx_free_cnt could mean num of free slots so we could avoid > shift */ > static inline void > -virtio_xmit_cleanup(struct virtqueue *vq) > +virtio_xmit_cleanup_simple(struct virtqueue *vq) > { > uint16_t i, desc_idx; > uint32_t nb_free = 0; > -- > 2.14.3
Re: [dpdk-dev] [PATCH] vhost: support UDP Fragmentation Offload
> -Original Message- > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Jiayu Hu > Sent: Tuesday, November 21, 2017 2:57 PM > To: dev@dpdk.org > Cc: y...@fridaylinux.org; Tan, Jianfeng ; Hu, Jiayu > > Subject: [dpdk-dev] [PATCH] vhost: support UDP Fragmentation Offload > > In virtio, UDP Fragmentation Offload (UFO) includes two parts: host UFO > and guest UFO. Guest UFO means the frontend can receive large UDP > packets, > and host UFO means the backend can receive large UDP packets. This patch > supports host UFO and guest UFO for vhost-user. > > Signed-off-by: Jiayu Hu Tested-by: Lei Yao This patch has been tested on my server, with guest_ufo=on,host_ufo=on are added to the qemu cmdlind, using vhost-user as backend, vm can send and receive big UDP packets. > --- > lib/librte_mbuf/rte_mbuf.h| 7 +++ > lib/librte_vhost/vhost.h | 2 ++ > lib/librte_vhost/virtio_net.c | 10 ++ > 3 files changed, 19 insertions(+) > > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h > index ce8a05d..3d8cfc9 100644 > --- a/lib/librte_mbuf/rte_mbuf.h > +++ b/lib/librte_mbuf/rte_mbuf.h > @@ -209,6 +209,13 @@ extern "C" { > /* add new TX flags here */ > > /** > + * UDP Fragmentation Offload flag. This flag is used for enabling UDP > + * fragmentation in SW or in HW. When use UFO, mbuf->tso_segsz is used > + * to store the MSS of UDP fragments. > + */ > +#define PKT_TX_UDP_SEG (1ULL << 42) > + > +/** > * Request security offload processing on the TX packet. > */ > #define PKT_TX_SEC_OFFLOAD (1ULL << 43) > diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h > index 1cc81c1..fc109ef 100644 > --- a/lib/librte_vhost/vhost.h > +++ b/lib/librte_vhost/vhost.h > @@ -206,10 +206,12 @@ struct vhost_msg { > (1ULL << > VHOST_USER_F_PROTOCOL_FEATURES) | \ > (1ULL << VIRTIO_NET_F_HOST_TSO4) | \ > (1ULL << VIRTIO_NET_F_HOST_TSO6) | \ > + (1ULL << VIRTIO_NET_F_HOST_UFO) | \ > (1ULL << VIRTIO_NET_F_CSUM)| \ > (1ULL << VIRTIO_NET_F_GUEST_CSUM) | \ > (1ULL << VIRTIO_NET_F_GUEST_TSO4) | \ > (1ULL << VIRTIO_NET_F_GUEST_TSO6) | \ > + (1ULL << VIRTIO_NET_F_GUEST_UFO) | \ > (1ULL << VIRTIO_RING_F_INDIRECT_DESC) | > \ > (1ULL << VIRTIO_NET_F_MTU) | \ > (1ULL << VIRTIO_F_IOMMU_PLATFORM)) > diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c > index 6fee16e..3a3a0ad 100644 > --- a/lib/librte_vhost/virtio_net.c > +++ b/lib/librte_vhost/virtio_net.c > @@ -188,6 +188,11 @@ virtio_enqueue_offload(struct rte_mbuf *m_buf, > struct virtio_net_hdr *net_hdr) > net_hdr->gso_size = m_buf->tso_segsz; > net_hdr->hdr_len = m_buf->l2_len + m_buf->l3_len > + m_buf->l4_len; > + } else if (m_buf->ol_flags & PKT_TX_UDP_SEG) { > + net_hdr->gso_type = VIRTIO_NET_HDR_GSO_UDP; > + net_hdr->gso_size = m_buf->tso_segsz; > + net_hdr->hdr_len = m_buf->l2_len + m_buf->l3_len + > + m_buf->l4_len; > } else { > ASSIGN_UNLESS_EQUAL(net_hdr->gso_type, 0); > ASSIGN_UNLESS_EQUAL(net_hdr->gso_size, 0); > @@ -834,6 +839,11 @@ vhost_dequeue_offload(struct virtio_net_hdr *hdr, > struct rte_mbuf *m) > m->tso_segsz = hdr->gso_size; > m->l4_len = (tcp_hdr->data_off & 0xf0) >> 2; > break; > + case VIRTIO_NET_HDR_GSO_UDP: > + m->ol_flags |= PKT_TX_UDP_SEG; > + m->tso_segsz = hdr->gso_size; > + m->l4_len = sizeof(struct udp_hdr); > + break; > default: > RTE_LOG(WARNING, VHOST_DATA, > "unsupported gso type %u.\n", hdr- > >gso_type); > -- > 2.7.4
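
For application writers: with this feature negotiated, a large UDP mbuf handed to the vhost Tx path is marked as below, mirroring what virtio_enqueue_offload() consumes in the patch. This is a sketch using the pre-18.05 header names; the MSS value is the caller's choice:

    #include <rte_ether.h>
    #include <rte_ip.h>
    #include <rte_mbuf.h>
    #include <rte_udp.h>

    static void
    mark_udp_fragmentation(struct rte_mbuf *m, uint16_t mss)
    {
            m->ol_flags |= PKT_TX_UDP_SEG | PKT_TX_IPV4;
            m->l2_len = sizeof(struct ether_hdr);
            m->l3_len = sizeof(struct ipv4_hdr);
            m->l4_len = sizeof(struct udp_hdr);
            m->tso_segsz = mss;     /* per the patch: MSS of the fragments */
    }

On the Rx side the patch does the reverse: a VIRTIO_NET_HDR_GSO_UDP header from the guest is turned back into PKT_TX_UDP_SEG plus tso_segsz on the dequeued mbuf.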
Re: [dpdk-dev] [PATCH] vhost: support Generic Segmentation Offload
> -Original Message- > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Jiayu Hu > Sent: Tuesday, November 28, 2017 1:29 PM > To: dev@dpdk.org > Cc: y...@fridaylinux.org; Tan, Jianfeng ; Hu, Jiayu > > Subject: [dpdk-dev] [PATCH] vhost: support Generic Segmentation Offload > > In virtio, Generic Segmentation Offload (GSO) is the feature for the > backend, which means the backend can receive packets with any GSO > type. > > Virtio-net enables the GSO feature by default, and vhost-net supports it. > To make live migration from vhost-net to vhost-user possible, this patch > enables GSO for vhost-user. > > Signed-off-by: Jiayu Hu Tested-by: Lei Yao This patch has been tested on my server, after add csum=on, gso=on to qemu cmdline, Following offload are active in vm: udp-fragmentation-offload: on tx-tcp-segmentation: on tx-tcp-ecn-segmentation: on tx-tcp6-segmentation: on > --- > lib/librte_vhost/vhost.h | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h > index 1cc81c1..04f54cb 100644 > --- a/lib/librte_vhost/vhost.h > +++ b/lib/librte_vhost/vhost.h > @@ -204,6 +204,7 @@ struct vhost_msg { > (1ULL << VIRTIO_F_VERSION_1) | \ > (1ULL << VHOST_F_LOG_ALL) | \ > (1ULL << > VHOST_USER_F_PROTOCOL_FEATURES) | \ > + (1ULL << VIRTIO_NET_F_GSO) | \ > (1ULL << VIRTIO_NET_F_HOST_TSO4) | \ > (1ULL << VIRTIO_NET_F_HOST_TSO6) | \ > (1ULL << VIRTIO_NET_F_CSUM)| \ > -- > 2.7.4
Re: [dpdk-dev] [PATCH v4 1/2] gro: code cleanup
> -Original Message- > From: Hu, Jiayu > Sent: Friday, January 5, 2018 2:13 PM > To: dev@dpdk.org > Cc: Richardson, Bruce ; Chen, Junjie J > ; Tan, Jianfeng ; > step...@networkplumber.org; Yigit, Ferruh ; > Ananyev, Konstantin ; Yao, Lei A > ; Hu, Jiayu > Subject: [PATCH v4 1/2] gro: code cleanup > > - Remove needless check and variants > - For better understanding, update the programmer guide and rename > internal functions and variants > - For supporting tunneled gro, move common internal functions from > gro_tcp4.c to gro_tcp4.h > - Comply RFC 6864 to process the IPv4 ID field > > Signed-off-by: Jiayu Hu > Reviewed-by: Junjie Chen Tested-by: Lei Yao I have tested this patch with the following traffic flow: NIC1(in kernel)-->NIC2(pmd, GRO on)-->vhost-user-->virtio-net(in VM) The iperf test with 1 stream shows that GRO VxLAN can improve the performance from 6 Gbps(GRO off) to 16 Gbps(GRO on). > --- > .../prog_guide/generic_receive_offload_lib.rst | 246 --- > doc/guides/prog_guide/img/gro-key-algorithm.svg| 223 > ++ > lib/librte_gro/gro_tcp4.c | 339 > +++-- > lib/librte_gro/gro_tcp4.h | 253 ++- > lib/librte_gro/rte_gro.c | 102 +++ > lib/librte_gro/rte_gro.h | 92 +++--- > 6 files changed, 750 insertions(+), 505 deletions(-) > create mode 100644 doc/guides/prog_guide/img/gro-key-algorithm.svg > > diff --git a/doc/guides/prog_guide/generic_receive_offload_lib.rst > b/doc/guides/prog_guide/generic_receive_offload_lib.rst > index 22e50ec..c2d7a41 100644 > --- a/doc/guides/prog_guide/generic_receive_offload_lib.rst > +++ b/doc/guides/prog_guide/generic_receive_offload_lib.rst > @@ -32,128 +32,162 @@ Generic Receive Offload Library > === > > Generic Receive Offload (GRO) is a widely used SW-based offloading > -technique to reduce per-packet processing overhead. It gains performance > -by reassembling small packets into large ones. To enable more flexibility > -to applications, DPDK implements GRO as a standalone library. Applications > -explicitly use the GRO library to merge small packets into large ones. > - > -The GRO library assumes all input packets have correct checksums. In > -addition, the GRO library doesn't re-calculate checksums for merged > -packets. If input packets are IP fragmented, the GRO library assumes > -they are complete packets (i.e. with L4 headers). > - > -Currently, the GRO library implements TCP/IPv4 packet reassembly. > - > -Reassembly Modes > - > - > -The GRO library provides two reassembly modes: lightweight and > -heavyweight mode. If applications want to merge packets in a simple way, > -they can use the lightweight mode API. If applications want more > -fine-grained controls, they can choose the heavyweight mode API. > - > -Lightweight Mode > - > - > -The ``rte_gro_reassemble_burst()`` function is used for reassembly in > -lightweight mode. It tries to merge N input packets at a time, where > -N should be less than or equal to ``RTE_GRO_MAX_BURST_ITEM_NUM``. > - > -In each invocation, ``rte_gro_reassemble_burst()`` allocates temporary > -reassembly tables for the desired GRO types. Note that the reassembly > -table is a table structure used to reassemble packets and different GRO > -types (e.g. TCP/IPv4 GRO and TCP/IPv6 GRO) have different reassembly > table > -structures. The ``rte_gro_reassemble_burst()`` function uses the > reassembly > -tables to merge the N input packets. > - > -For applications, performing GRO in lightweight mode is simple. They > -just need to invoke ``rte_gro_reassemble_burst()``. Applications can get > -GROed packets as soon as ``rte_gro_reassemble_burst()`` returns. > - > -Heavyweight Mode > - > - > -The ``rte_gro_reassemble()`` function is used for reassembly in > heavyweight > -mode. Compared with the lightweight mode, performing GRO in > heavyweight mode > -is relatively complicated. > - > -Before performing GRO, applications need to create a GRO context object > -by calling ``rte_gro_ctx_create()``. A GRO context object holds the > -reassembly tables of desired GRO types. Note that all update/lookup > -operations on the context object are not thread safe. So if different > -processes or threads want to access the same context object > simultaneously, > -some external syncing mechanisms must be used. > - > -Once the GRO context is created, applications can then use the > -``rte_gro_reassemble()`` function to merge packets. In each invocation, > -``rte_gro_reassemble()`` t
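As a reference for the lightweight mode described in the updated guide, typical usage is a sketch like the following (sizing values are illustrative only):

#include <rte_ethdev.h>
#include <rte_gro.h>

/* Sketch: receive a burst and merge it in GRO lightweight mode. */
static uint16_t
rx_and_gro(uint16_t port_id, uint16_t queue_id)
{
	struct rte_gro_param gro_param = {
		.gro_types = RTE_GRO_TCP_IPV4,  /* GRO types to perform */
		.max_flow_num = 64,             /* illustrative sizing */
		.max_item_per_flow = 32,
	};
	struct rte_mbuf *pkts[RTE_GRO_MAX_BURST_ITEM_NUM];
	uint16_t nb_rx;

	nb_rx = rte_eth_rx_burst(port_id, queue_id, pkts,
			RTE_GRO_MAX_BURST_ITEM_NUM);
	/* pkts[0..return value) holds the merged packets afterwards. */
	return rte_gro_reassemble_burst(pkts, nb_rx, &gro_param);
}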
Re: [dpdk-dev] [PATCH v7] vhost: support virtqueue interrupt/notification suppression
> -Original Message- > From: Chen, Junjie J > Sent: Tuesday, January 9, 2018 7:04 PM > To: y...@fridaylinux.org; maxime.coque...@redhat.com; Wang, Xiao W > ; Bie, Tiwei ; Yao, Lei A > > Cc: dev@dpdk.org; Chen, Junjie J > Subject: [PATCH v7] vhost: support virtqueue interrupt/notification > suppression > > The driver can suppress interrupt when VIRTIO_F_EVENT_IDX feature bit is > negotiated. The driver set vring flags to 0, and MAY use used_event in > available ring to advise device interrupt util reach an index specified > by used_event. The device ignore the lower bit of vring flags, and send > an interrupt when index reach used_event. > > The device can suppress notification in a manner analogous to the ways > driver suppress interrupt. The device manipulates flags or avail_event in > the used ring in the same way the driver manipulates flags or used_event in > available ring. > > Signed-off-by: Junjie Chen Tested-by: Lei Yao A VM2VM iperf test has been executed with the virtio-net driver after applying this patch. No performance drop. CPU: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz Host OS: Ubuntu 16.04 Guest OS: Ubuntu 16.04 Kernel: 4.4.0 > --- > v7: > Add vhost_need_event definition and update code for next virtio. > > v6: > Use volatile qualifier to access avail event idx. > > v5: > Remove updating avail event index in backend. > > v2-v4: > Use definition of VIRTIO_F_EVENT_IDX from kernel. > > lib/librte_vhost/vhost.c | 2 +- > lib/librte_vhost/vhost.h | 44 > ++- > lib/librte_vhost/virtio_net.c | 6 +++--- > 3 files changed, 43 insertions(+), 9 deletions(-) > > diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c > index 6f7ef7f..400caa0 100644 > --- a/lib/librte_vhost/vhost.c > +++ b/lib/librte_vhost/vhost.c > @@ -538,7 +538,7 @@ rte_vhost_vring_call(int vid, uint16_t vring_idx) > if (!vq) > return -1; > > - vhost_vring_call(vq); > + vhost_vring_call(dev, vq); > return 0; > } > > diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h > index 1d9366e..ec79991 100644 > --- a/lib/librte_vhost/vhost.h > +++ b/lib/librte_vhost/vhost.h > @@ -103,6 +103,8 @@ struct vhost_virtqueue { > > uint16_tlast_avail_idx; > uint16_tlast_used_idx; > + /* Last used index we notify to front end. */ > + uint16_tsignalled_used; > #define VIRTIO_INVALID_EVENTFD (-1) > #define VIRTIO_UNINITIALIZED_EVENTFD (-2) > > @@ -214,6 +216,7 @@ struct vhost_msg { > (1ULL << VIRTIO_NET_F_GUEST_TSO6) | \ > (1ULL << VIRTIO_NET_F_GUEST_UFO) | \ > (1ULL << VIRTIO_RING_F_INDIRECT_DESC) | > \ > + (1ULL << VIRTIO_RING_F_EVENT_IDX) | \ > (1ULL << VIRTIO_NET_F_MTU) | \ > (1ULL << VIRTIO_F_IOMMU_PLATFORM)) > > @@ -399,16 +402,47 @@ vhost_iova_to_vva(struct virtio_net *dev, struct > vhost_virtqueue *vq, > return __vhost_iova_to_vva(dev, vq, iova, size, perm); > } > > +#define vhost_used_event(vr) \ > + (*(volatile uint16_t*)&(vr)->avail->ring[(vr)->size]) > + > +/* > + * The following is used with VIRTIO_RING_F_EVENT_IDX. > + * Assuming a given event_idx value from the other size, if we have > + * just incremented index from old to new_idx, should we trigger an > + * event? > + */ > +static __rte_always_inline int > +vhost_need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old) > +{ > + return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - > old); > +} > + > static __rte_always_inline void > -vhost_vring_call(struct vhost_virtqueue *vq) > +vhost_vring_call(struct virtio_net *dev, struct vhost_virtqueue *vq) > { > /* Flush used->idx update before we read avail->flags. */ > rte_mb(); > > - /* Kick the guest if necessary. */ > - if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT) > - && (vq->callfd >= 0)) > - eventfd_write(vq->callfd, (eventfd_t)1); > + /* Don't kick guest if we don't reach index specified by guest. */ > + if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX)) { > + uint16_t old = vq->signalled_used; > + uint16_t new = vq->last_used_idx; > + > + LOG_DEBUG(VHOST_DATA, "%s: used_event_idx=%d, > old=%d, new=%d\n", > +
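The wrap-safe comparison in vhost_need_event() is easiest to see with concrete values; the following standalone sketch uses the exact expression from the patch:

#include <assert.h>
#include <stdint.h>

/* Same expression as in the patch above. */
static int
need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old)
{
	return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old);
}

int
main(void)
{
	/* Guest asked for index 10; used idx moved 10 -> 12: interrupt. */
	assert(need_event(10, 12, 10) == 1);
	/* Guest asked for 15; 12 has not reached it yet: no interrupt. */
	assert(need_event(15, 12, 10) == 0);
	/* Wrap-around: 65530 -> 5 passes event_idx 2 thanks to the casts. */
	assert(need_event(2, 5, 65530) == 1);
	return 0;
}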
[dpdk-dev] [PATCH] i40e: enable i40e pmd on ARM platform
Hi, Jianbo I have tested your patch on my X86 platform; the single-core performance of the non-vector PMD drops by about 1 Mpps: Non-vector PMD single core performance with patch : ~33.9 Mpps Non-vector PMD single core performance without patch: ~35.1 Mpps Is there any way to avoid such a performance drop on X86? Thanks. BRs Lei -Original Message- From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Jianbo Liu Sent: Tuesday, August 2, 2016 2:58 PM To: dev at dpdk.org; Zhang, Helin ; Wu, Jingjing Cc: Jianbo Liu Subject: [dpdk-dev] [PATCH] i40e: enable i40e pmd on ARM platform And add read memory barrier to avoid status inconsistency between two RX descriptors readings. Signed-off-by: Jianbo Liu --- config/defconfig_arm64-armv8a-linuxapp-gcc | 2 +- doc/guides/nics/overview.rst | 2 +- drivers/net/i40e/i40e_rxtx.c | 2 ++ 3 files changed, 4 insertions(+), 2 deletions(-) diff --git a/config/defconfig_arm64-armv8a-linuxapp-gcc b/config/defconfig_arm64-armv8a-linuxapp-gcc index 1a17126..08f282b 100644 --- a/config/defconfig_arm64-armv8a-linuxapp-gcc +++ b/config/defconfig_arm64-armv8a-linuxapp-gcc @@ -46,6 +46,6 @@ CONFIG_RTE_EAL_IGB_UIO=n CONFIG_RTE_LIBRTE_IVSHMEM=n CONFIG_RTE_LIBRTE_FM10K_PMD=n -CONFIG_RTE_LIBRTE_I40E_PMD=n +CONFIG_RTE_LIBRTE_I40E_INC_VECTOR=n CONFIG_RTE_SCHED_VECTOR=n diff --git a/doc/guides/nics/overview.rst b/doc/guides/nics/overview.rst index 6abbae6..5175591 100644 --- a/doc/guides/nics/overview.rst +++ b/doc/guides/nics/overview.rst @@ -138,7 +138,7 @@ Most of these differences are summarized below. Linux VFIO Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Other kdrv Y Y Y ARMv7 Y Y Y - ARMv8 Y Y Y Y Y Y Y Y + ARMv8 Y Y Y Y Y Y Y Y Y Power8 Y Y Y TILE-Gx Y x86-32 Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c index 554d167..4004b8e 100644 --- a/drivers/net/i40e/i40e_rxtx.c +++ b/drivers/net/i40e/i40e_rxtx.c @@ -994,6 +994,8 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq) I40E_RXD_QW1_STATUS_SHIFT; } + rte_rmb(); + /* Compute how many status bits were set */ for (j = 0, nb_dd = 0; j < I40E_LOOK_AHEAD; j++) nb_dd += s[j] & (1 << I40E_RX_DESC_STATUS_DD_SHIFT); -- 2.4.11
Re: [dpdk-dev] [PATCH] examples/power: fix ack for enable/disable turbo
> -Original Message- > From: Hunt, David > Sent: Tuesday, February 11, 2020 6:50 PM > To: dev@dpdk.org; Hunt, David > Cc: Yao, Lei A ; sta...@dpdk.org > Subject: [PATCH] examples/power: fix ack for enable/disable turbo > > When a VM sends a command through virtio-serial to enable/disable turbo, it > is successfully enabled or disabled, yet the response to the VM is NACK. This > is because all the library frequency change APIs return > 1 for success (change in frequency), 0 for success (no change in > frequency) and -1 for failure. However the turbo enable/disable APIs just > return 0 for success and -1 for failure. > > Fix the handling of the return code to treat ">= 0" as success, and send an > ACK. Only send NACK when < 0 (failure). > > Fixes: 0de94bcac7fc ("examples/vm_power: send confirmation cmd to > guest") > Signed-off-by: David Hunt Acked-by: Lei Yao > --- > examples/vm_power_manager/channel_monitor.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/examples/vm_power_manager/channel_monitor.c > b/examples/vm_power_manager/channel_monitor.c > index 090c2a98b..1d00a6cf6 100644 > --- a/examples/vm_power_manager/channel_monitor.c > +++ b/examples/vm_power_manager/channel_monitor.c > @@ -868,7 +868,7 @@ process_request(struct channel_packet *pkt, struct > channel_info *chan_info) > if (valid_unit) { > ret = send_ack_for_received_cmd(pkt, > chan_info, > - scale_res > 0 ? > + scale_res >= 0 ? > CPU_POWER_CMD_ACK : > CPU_POWER_CMD_NACK); > if (ret < 0) > -- > 2.17.1
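For context, the return convention the fix relies on can be sketched as follows (rte_power_freq_up() is one of the scaling APIs with the three-way return; illustration only):

/* Frequency-change APIs return 1 (changed), 0 (no change needed) or
 * -1 (failure), so success must be tested as ">= 0", not "> 0". */
int scale_res = rte_power_freq_up(lcore_id);

if (scale_res >= 0)
	/* ACK: the command succeeded even if the frequency did not move. */
	send_ack_for_received_cmd(pkt, chan_info, CPU_POWER_CMD_ACK);
else
	send_ack_for_received_cmd(pkt, chan_info, CPU_POWER_CMD_NACK);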
Re: [dpdk-dev] [PATCH v4 1/3] vfio: revert change that does intr eventfd setup at probe
> -Original Message- > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Nithin Dabilpuram > Sent: Thursday, July 18, 2019 10:36 PM > To: Hyong Youb Kim ; David Marchand > ; Thomas Monjalon > ; Yigit, Ferruh ; Burakov, > Anatoly > Cc: jer...@marvell.com; John Daley ; Shahed Shaikh > ; dev@dpdk.org; Nithin Dabilpuram > > Subject: [dpdk-dev] [PATCH v4 1/3] vfio: revert change that does intr > eventfd setup at probe > > This reverts commit 89aac60e0be9ed95a87b16e3595f102f9faaffb4. > "vfio: fix interrupts race condition" > > The above mentioned commit moves the interrupt's eventfd setup > to probe time but only enables one interrupt for all types of > interrupt handles i.e VFIO_MSI, VFIO_LEGACY, VFIO_MSIX, UIO. > It works fine with default case but breaks below cases specifically > for MSIX based interrupt handles. > > * Applications like l3fwd-power that request rxq interrupts > while ethdev setup. > * Drivers that need > 1 MSIx interrupts to be configured for > functionality to work. > > VFIO PCI for MSIx expects all the possible vectors to be setup up > when using VFIO_IRQ_SET_ACTION_TRIGGER so that they can be > allocated from kernel pci subsystem. Only way to increase the number > of vectors later is first free all by using VFIO_IRQ_SET_DATA_NONE > with action trigger and then enable new vector count. > > Above commit changes the behavior of rte_intr_[enable|disable] to > only mask and unmask unlike earlier behavior and thereby > breaking above two scenarios. > > Fixes: 89aac60e0be9 ("vfio: fix interrupts race condition") > Cc: david.march...@redhat.com > > Signed-off-by: Nithin Dabilpuram > Signed-off-by: Jerin Jacob > Tested-by: Stephen Hemminger > Tested-by: Shahed Shaikh Tested-by: Lei Yao This patch set passes the interrupt test with ixgbe, i40e and virtio. > --- > v4: > * No change. 
> v3: > * Update Shahed Shaikh's tested-by > v2: > * Include tested by sign from Stephen > > drivers/bus/pci/linux/pci_vfio.c | 78 ++-- > lib/librte_eal/linux/eal/eal_interrupts.c | 201 +++- > -- > 2 files changed, 191 insertions(+), 88 deletions(-) > > diff --git a/drivers/bus/pci/linux/pci_vfio.c > b/drivers/bus/pci/linux/pci_vfio.c > index ee31239..1ceb1c0 100644 > --- a/drivers/bus/pci/linux/pci_vfio.c > +++ b/drivers/bus/pci/linux/pci_vfio.c > @@ -187,11 +187,8 @@ pci_vfio_set_bus_master(int dev_fd, bool op) > static int > pci_vfio_setup_interrupts(struct rte_pci_device *dev, int vfio_dev_fd) > { > - char irq_set_buf[sizeof(struct vfio_irq_set) + sizeof(int)]; > - struct vfio_irq_set *irq_set; > - enum rte_intr_mode intr_mode; > int i, ret, intr_idx; > - int fd; > + enum rte_intr_mode intr_mode; > > /* default to invalid index */ > intr_idx = VFIO_PCI_NUM_IRQS; > @@ -223,6 +220,7 @@ pci_vfio_setup_interrupts(struct rte_pci_device *dev, > int vfio_dev_fd) > /* start from MSI-X interrupt type */ > for (i = VFIO_PCI_MSIX_IRQ_INDEX; i >= 0; i--) { > struct vfio_irq_info irq = { .argsz = sizeof(irq) }; > + int fd = -1; > > /* skip interrupt modes we don't want */ > if (intr_mode != RTE_INTR_MODE_NONE && > @@ -238,51 +236,51 @@ pci_vfio_setup_interrupts(struct rte_pci_device > *dev, int vfio_dev_fd) > return -1; > } > > - /* found a usable interrupt mode */ > - if ((irq.flags & VFIO_IRQ_INFO_EVENTFD) != 0) > - break; > - > /* if this vector cannot be used with eventfd, fail if we > explicitly >* specified interrupt type, otherwise continue */ > - if (intr_mode != RTE_INTR_MODE_NONE) { > - RTE_LOG(ERR, EAL, " interrupt vector does not > support eventfd!\n"); > + if ((irq.flags & VFIO_IRQ_INFO_EVENTFD) == 0) { > + if (intr_mode != RTE_INTR_MODE_NONE) { > + RTE_LOG(ERR, EAL, > + " interrupt vector does not > support eventfd!\n"); > + return -1; > + } else > + continue; > + } > + > + /* set up an eventfd for interrupts */ > + fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC); > + if (fd < 0) { > + RTE_LOG(ERR, EAL, " cannot set up eventfd, " > + "error %i (%s)\n", errno, > strerror(errno
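The kernel contract described in the commit message can be sketched like this (illustrative only; struct vfio_irq_set and the ioctl come from linux/vfio.h):

#include <linux/vfio.h>
#include <sys/ioctl.h>
#include <stdlib.h>
#include <string.h>

/* Sketch: with MSI-X, every vector the device will ever use must be
 * supplied in a single VFIO_DEVICE_SET_IRQS trigger call. */
static int
vfio_enable_msix_all(int vfio_dev_fd, const int *efds, int nvec)
{
	struct vfio_irq_set *irq_set;
	int len = sizeof(*irq_set) + nvec * sizeof(int);
	int ret;

	irq_set = calloc(1, len);
	if (irq_set == NULL)
		return -1;
	irq_set->argsz = len;
	irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
			 VFIO_IRQ_SET_ACTION_TRIGGER;
	irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX;
	irq_set->start = 0;
	irq_set->count = nvec;	/* all vectors at once */
	memcpy(irq_set->data, efds, nvec * sizeof(int));

	ret = ioctl(vfio_dev_fd, VFIO_DEVICE_SET_IRQS, irq_set);
	free(irq_set);
	return ret;
}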
Re: [dpdk-dev] [PATCH v1] examples/vm_power: fix no PCI option for guest cli
> -Original Message- > From: Hunt, David > Sent: Tuesday, October 29, 2019 7:40 PM > To: dev@dpdk.org > Cc: Yao, Lei A ; Hunt, David > Subject: [PATCH v1] examples/vm_power: fix no PCI option for guest cli > > If there are no ports available to the guest cli application, it will exit > when > setting up the default policy because it fails to set the mac address. This > should not be the case, as this example can be used for many other use > cases that do not need ports. > > If ports not found, simply set nb_mac_to_monitor in the policy to zero and > continue. > > Fixes: 70febdcfd60f ("examples: check status of getting MAC address") > Signed-off-by: David Hunt Acked-by: Lei Yao > --- > examples/vm_power_manager/guest_cli/vm_power_cli_guest.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c > b/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c > index eb0ae9114..96c1a1ff6 100644 > --- a/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c > +++ b/examples/vm_power_manager/guest_cli/vm_power_cli_guest.c > @@ -79,9 +79,9 @@ set_policy_defaults(struct channel_packet *pkt) > > ret = set_policy_mac(0, 0); > if (ret != 0) > - return ret; > - > - pkt->nb_mac_to_monitor = 1; > + pkt->nb_mac_to_monitor = 0; > + else > + pkt->nb_mac_to_monitor = 1; > > pkt->t_boost_status.tbEnabled = false; > > -- > 2.17.1
Re: [dpdk-dev] [PATCH v6 3/4] app/testpmd: move pkt prepare logic into a separate function
> -Original Message- > From: Pavan Nikhilesh Bhagavatula [mailto:pbhagavat...@marvell.com] > Sent: Tuesday, April 9, 2019 5:33 PM > To: Lin, Xueqin ; Yigit, Ferruh > Cc: dev@dpdk.org; Xu, Qian Q ; Li, WenjieX A > ; Wang, FengqinX ; > Yao, Lei A ; Wang, Yinan ; > Jerin Jacob Kollanukkaran ; tho...@monjalon.net; > arybche...@solarflare.com; Iremonger, Bernard > ; alia...@mellanox.com; Zhang, Qi Z > > Subject: RE: [dpdk-dev] [PATCH v6 3/4] app/testpmd: move pkt prepare logic > into a separate function > > Hi Lin, > > Can you check if the following patch fixes the issue? > http://patches.dpdk.org/patch/52395/ > > I wasn't able to catch this earlier. > > Regards, > Pavan Hi, Pavan With this patch, testpmd can generate packets with correct src mac address at my side now. Thanks BRs Lei > > >-Original Message- > >From: Lin, Xueqin > >Sent: Tuesday, April 9, 2019 2:58 PM > >To: Pavan Nikhilesh Bhagavatula ; Yigit, > Ferruh > > > >Cc: dev@dpdk.org; Xu, Qian Q ; Li, WenjieX A > >; Wang, FengqinX ; > Yao, > >Lei A ; Wang, Yinan ; Jerin > Jacob > >Kollanukkaran ; tho...@monjalon.net; > >arybche...@solarflare.com; Iremonger, Bernard > >; alia...@mellanox.com; Zhang, Qi Z > > > >Subject: [EXT] RE: [dpdk-dev] [PATCH v6 3/4] app/testpmd: move pkt > prepare > >logic into a separate function > > > >External Email > > > >-- > >Hi NIkhilesh, > > > >This patchset impacts some of 19.05 rc1 txonly/burst tests on Intel NIC. If > set > >txonly fwd, IXIA or tester peer can't receive packets that sent from app > >generated. > >This is high issue, block some cases test. Detailed information as below, > need > >you to check it soon. > > > >*DPDK version: 19.05.0-rc1 > >*NIC hardware: Fortville_eagle/Fortville_spirit/Niantic > >Environment: one NIC port connect with another NIC port, or one NIC port > >connect with IXIA > > > >Test Setup > >1. Bind port to igb_uio or vfio > >2. On DUT, setup testpmd: > >./x86_64-native-linuxapp-gcc/app/testpmd -c 0x1e -n 4 -- -i --rxq=4 -- > txq=4 -- > >port-topology=loop 3. Set txonly forward, start testpmd > >testpmd>set fwd txonly > >testpmd>start > >4. Dump packets from tester NIC port or IXIA, find no packets were > received on > >the PORT0. > >tcpdump -i -v > > > >Best regards, > >Xueqin > > > >> -Original Message- > >> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Pavan Nikhilesh > >> Bhagavatula > >> Sent: Tuesday, April 2, 2019 5:54 PM > >> To: Jerin Jacob Kollanukkaran ; > >> tho...@monjalon.net; arybche...@solarflare.com; Yigit, Ferruh > >> ; Iremonger, Bernard > >> ; alia...@mellanox.com > >> Cc: dev@dpdk.org; Pavan Nikhilesh Bhagavatula > >> > >> Subject: [dpdk-dev] [PATCH v6 3/4] app/testpmd: move pkt prepare logic > >> into a separate function > >> > >> From: Pavan Nikhilesh > >> > >> Move the packet prepare logic into a separate function so that it can > >> be reused later. 
> >> > >> Signed-off-by: Pavan Nikhilesh > >> --- > >> app/test-pmd/txonly.c | 163 > >> +- > >> 1 file changed, 83 insertions(+), 80 deletions(-) > >> > >> diff --git a/app/test-pmd/txonly.c b/app/test-pmd/txonly.c index > >> 65171c1d1..56ca0ad24 100644 > >> --- a/app/test-pmd/txonly.c > >> +++ b/app/test-pmd/txonly.c > >> @@ -148,6 +148,80 @@ setup_pkt_udp_ip_headers(struct ipv4_hdr > *ip_hdr, > >>ip_hdr->hdr_checksum = (uint16_t) ip_cksum; } > >> > >> +static inline bool > >> +pkt_burst_prepare(struct rte_mbuf *pkt, struct rte_mempool *mbp, > >> + struct ether_hdr *eth_hdr, const uint16_t vlan_tci, > >> + const uint16_t vlan_tci_outer, const uint64_t ol_flags) { > >> + struct rte_mbuf *pkt_segs[RTE_MAX_SEGS_PER_PKT]; > >> + uint8_t ip_var = RTE_PER_LCORE(_ip_var); > >> + struct rte_mbuf *pkt_seg; > >> + uint32_t nb_segs, pkt_len; > >> + uint8_t i; > >> + > >> + if (unlikely(tx_pkt_split == TX_PKT_SPLIT_RND)) > >> + nb_segs = random() % tx_pkt_nb_segs + 1; > >> + else > >> + nb_segs = tx_pkt_nb_segs; > >> + > >> + if (nb_segs > 1) { > >> + if (rte_mempool_ge
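For the record, the referenced fix (http://patches.dpdk.org/patch/52395/) restores the per-packet header copy inside pkt_burst_prepare(); a sketch of what it must restore follows (copy_buf_to_pkt() and the header variables are existing testpmd txonly helpers, and the exact patch content may differ):

/* Without these copies the generated mbufs carry an uninitialized
 * Ethernet header, hence the wrong source MAC seen in the report. */
copy_buf_to_pkt(eth_hdr, sizeof(*eth_hdr), pkt, 0);
copy_buf_to_pkt(&pkt_ip_hdr, sizeof(pkt_ip_hdr), pkt,
		sizeof(struct ether_hdr));
copy_buf_to_pkt(&pkt_udp_hdr, sizeof(pkt_udp_hdr), pkt,
		sizeof(struct ether_hdr) + sizeof(struct ipv4_hdr));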
Re: [dpdk-dev] [PATCH v2] app/testpmd: fix ether header size calculation
> -Original Message- > From: Yigit, Ferruh > Sent: Wednesday, April 10, 2019 4:59 AM > To: Pavan Nikhilesh Bhagavatula ; Jerin Jacob > Kollanukkaran ; Lin, Xueqin ; > Richardson, Bruce ; tho...@monjalon.net > Cc: dev@dpdk.org; Yao, Lei A ; Wang, FengqinX > > Subject: Re: [dpdk-dev] [PATCH v2] app/testpmd: fix ether header size > calculation > > On 4/9/2019 10:45 AM, Pavan Nikhilesh Bhagavatula wrote: > > From: Pavan Nikhilesh > > > > Fix ether header size calculation in Tx only mode. > > > > Coverity issue: 337684 > > Fixes: 01b645dcff7f ("app/testpmd: move txonly prepare in separate > function") > > > > Signed-off-by: Pavan Nikhilesh > > Reviewed-by: Ferruh Yigit > > Applied to dpdk-next-net/master, thanks. > > > @lei, can you please confirm the txonly mode on next-net? Hi, Ferruh The src MAC in txonly mode is correct on the next-net branch now. I will run more regression tests today to check whether any other issues exist. Thanks. BRs Lei
[dpdk-dev] [PATCH v3] i40e: enable i40e pmd on ARM platform
Hi, Jianbo I have tested your patch; this v3 patch doesn't impact the performance on the X86 platform. Non-vector PMD single core performance with patch : ~35 Mpps Non-vector PMD single core performance without patch: ~35 Mpps BRs Lei -Original Message- Date: Fri, 5 Aug 2016 14:36:23 +0530 From: Jianbo Liu To: dev at dpdk.org,helin.zhang at intel.com, jingjing.wu at intel.com Cc: Jianbo Liu Subject: [dpdk-dev] [PATCH v3] i40e: enable i40e pmd on ARM platform Message-ID: <1470387983-12713-1-git-send-email-jianbo.liu at linaro.org> And add read memory barrier to avoid status inconsistency between two RX descriptors readings. Signed-off-by: Jianbo Liu --- config/defconfig_arm64-armv8a-linuxapp-gcc | 2 +- doc/guides/nics/features/i40e.ini | 1 + drivers/net/i40e/i40e_rxtx.c | 2 ++ 3 files changed, 4 insertions(+), 1 deletion(-) diff --git a/config/defconfig_arm64-armv8a-linuxapp-gcc b/config/defconfig_arm64-armv8a-linuxapp-gcc index 1a17126..08f282b 100644 --- a/config/defconfig_arm64-armv8a-linuxapp-gcc +++ b/config/defconfig_arm64-armv8a-linuxapp-gcc @@ -46,6 +46,6 @@ CONFIG_RTE_EAL_IGB_UIO=n CONFIG_RTE_LIBRTE_IVSHMEM=n CONFIG_RTE_LIBRTE_FM10K_PMD=n -CONFIG_RTE_LIBRTE_I40E_PMD=n +CONFIG_RTE_LIBRTE_I40E_INC_VECTOR=n CONFIG_RTE_SCHED_VECTOR=n diff --git a/doc/guides/nics/features/i40e.ini b/doc/guides/nics/features/i40e.ini index fb3fb60..0d143bc 100644 --- a/doc/guides/nics/features/i40e.ini +++ b/doc/guides/nics/features/i40e.ini @@ -45,3 +45,4 @@ Linux UIO= Y Linux VFIO = Y x86-32 = Y x86-64 = Y +ARMv8= Y diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c index 554d167..57825fb 100644 --- a/drivers/net/i40e/i40e_rxtx.c +++ b/drivers/net/i40e/i40e_rxtx.c @@ -994,6 +994,8 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq) I40E_RXD_QW1_STATUS_SHIFT; } + rte_smp_rmb(); + /* Compute how many status bits were set */ for (j = 0, nb_dd = 0; j < I40E_LOOK_AHEAD; j++) nb_dd += s[j] & (1 << I40E_RX_DESC_STATUS_DD_SHIFT); -- 2.4.11
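The reason v3 no longer costs anything on X86 is that rte_smp_rmb() degrades to a compiler barrier where the hardware already orders loads, while rte_rmb() always emits a fence. Roughly (illustrative, not the exact DPDK source):

#ifdef RTE_ARCH_X86
/* x86 does not reorder loads against loads: a compiler barrier suffices. */
#define rte_smp_rmb() rte_compiler_barrier()
#else
/* armv8 needs a real barrier here (e.g. dmb ishld). */
#define rte_smp_rmb() rte_rmb()
#endif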
[dpdk-dev] [PATCH] vhost: add pmd xstats
Hi, Zhiyong I have collected more xstats performance-drop data on my side. Vhost Xstats patch with mergeable on : ~3% Vhost Xstats patch with mergeable off : ~9% Zhihong has also submitted patches to improve performance for the mergeable-on path: http://dpdk.org/dev/patchwork/patch/15245/ ~15249. If both patches are integrated, the combined drop is higher: Vhost Xstats patch + Vhost mergeable-on patch, with mergeable on: the performance drop is around 6% Best Regards Lei -Original Message- From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Yang, Zhiyong Sent: Thursday, August 25, 2016 5:22 PM To: Panu Matilainen ; Thomas Monjalon ; Yuanhan Liu Cc: dev at dpdk.org Subject: Re: [dpdk-dev] [PATCH] vhost: add pmd xstats > -Original Message- > From: Panu Matilainen [mailto:pmatilai at redhat.com] > Sent: Wednesday, August 24, 2016 8:37 PM > To: Thomas Monjalon ; Yuanhan Liu > > Cc: dev at dpdk.org; Yang, Zhiyong > Subject: Re: [dpdk-dev] [PATCH] vhost: add pmd xstats > > On 08/24/2016 11:44 AM, Thomas Monjalon wrote: > > 2016-08-24 13:46, Yuanhan Liu: > >> On Tue, Aug 23, 2016 at 12:45:54PM +0300, Panu Matilainen wrote: > >>>>>> Since collecting data of vhost_update_packet_xstats will have > >>>>>> some effect on RX/TX performance, so, Setting compiling switch > >>>>>> CONFIG_RTE_LIBRTE_PMD_VHOST_UPDATE_XSTATS=n by default > in the > >>>>> file > >>>>>> config/common_base, if needing xstats data, you can enable it(y). > >>>>> > >>>>> NAK, such things need to be switchable at run-time. > >>>>> > >>>>> - Panu - > >>>> > >>>> Considering the following reasons using the compiler switch, not > >>>> command-line at run-time. > >>>> > >>>> 1.Similar xstats update functions are always collecting stats > >>>> data in the background when rx/tx are running, such as the > >>>> physical NIC or virtio, which have no switch. Compiler switch for > >>>> vhost pmd xstats is added as a option when performance is viewed > >>>> as critical > factor. > >>>> > >>>> 2. No data structure and API in any layer support the xstats > >>>> update switch at run-time. Common data structure (struct > >>>> rte_eth_dev_data) has no device-specific data member, if > >>>> implementing enable/disable of vhost_update _packet_xstats at > >>>> run-time, must define a > >>>> flag(device-specific) in it, because the definition of struct > >>>> vhost_queue in the driver code (eth_vhost_rx/eth_vhost_tx > processing)is not visible from device perspective. > >>>> > >>>> 3. I tested RX/TX with v1 patch (y) as reference based on > >>>> Intel(R) > >>>> Xeon(R) CPU E5-2699 v3 @ 2.30GHz, for 64byts packets in burst > >>>> mode, > >>>> 32 packets in one RX/TX processing. Overhead of > >>>> vhost_update_packet_xstats is less than 3% for the rx/tx > >>>> processing. It looks that vhost_update_packet_xstats has a > >>>> limited > effect on performance drop. > >>> > >>> Well, either the performance overhead is acceptable and it should > >>> always be on (like with physical NICs I think). Or it is not. In > >>> which case it needs to be turnable on and off, at run-time. > >>> Rebuilding is not an option in the world of distros. > >> > >> I think the less than 3% overhead is acceptable here, that I agree > >> with Panu we should always keep it on. If someone compains it later > >> that even 3% is too big for them, let's consider to make it be > >> switchable at run-time. Either we could introduce a generic eth API > >> for that, Or just introduce a vhost one if that doesn't make too > >> much sense to other eth drivers. 
> > > > +1 > > It may have sense to introduce a generic run-time option for stats. > > > > Yup, sounds good. > It sounds better , if DPDK can add generic API and structure to the switch of xstats update. So, any device can use it at run time if necessary. Can we define one bit data member (xstats_update) in the data structure struct rte_eth_dev_data? such as: uint8_t promiscuous : 1, /**< RX promiscuous mode ON(1) / OFF(0). */ scattered_rx : 1, /**< RX of scattered packets is ON(1) / OFF(0) */ all_multicast : 1, /**< RX all multicast mode ON(1) /
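In sketch form, the proposal above amounts to one more bit in the existing bitfield of struct rte_eth_dev_data (the xstats_update member is hypothetical; it was never merged in this form):

uint8_t promiscuous   : 1, /**< RX promiscuous mode ON(1) / OFF(0). */
	scattered_rx  : 1, /**< RX of scattered packets is ON(1) / OFF(0). */
	all_multicast : 1, /**< RX all multicast mode ON(1) / OFF(0). */
	dev_started   : 1, /**< Device state: STARTED(1) / STOPPED(0). */
	xstats_update : 1; /**< Hypothetical: xstats collection ON(1)/OFF(0). */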
[dpdk-dev] [PATCH] vhost: add pmd xstats
Hi, Qian The test setup at my side is Vhost/VirtIO loopback with 64B packets. -Original Message- From: Xu, Qian Q Sent: Tuesday, August 30, 2016 11:03 AM To: Yao, Lei A ; Yang, Zhiyong ; Panu Matilainen ; Thomas Monjalon ; Yuanhan Liu Cc: dev at dpdk.org Subject: RE: [dpdk-dev] [PATCH] vhost: add pmd xstats Lei Could you list the test setup for below findings? I think we need at least to check below tests for mergeable=on/off path: 1. Vhost/virtio loopback 2. PVP test : virtio-pmd IO fwd and virtio-net IPV4 fwd -Original Message- From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Yao, Lei A Sent: Tuesday, August 30, 2016 10:46 AM To: Yang, Zhiyong ; Panu Matilainen ; Thomas Monjalon ; Yuanhan Liu Cc: dev at dpdk.org Subject: Re: [dpdk-dev] [PATCH] vhost: add pmd xstats Hi, Zhiyong I have tested more xstats performance drop data at my side. Vhost Xstats patch with mergeable on : ~3% Vhost Xstats patch with mergeable off : ~9% Because Zhihong also submit patch to improve the performance on for the mergeable on: http://dpdk.org/dev/patchwork/patch/15245/ ~15249. If both patch integrated, the performance drop will be much higher Vhsot Xstats patch + Vhost mergeable on patch with mergeable on : the performance drop is around 6% Best Regards Lei -Original Message- From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Yang, Zhiyong Sent: Thursday, August 25, 2016 5:22 PM To: Panu Matilainen ; Thomas Monjalon ; Yuanhan Liu Cc: dev at dpdk.org Subject: Re: [dpdk-dev] [PATCH] vhost: add pmd xstats > -Original Message- > From: Panu Matilainen [mailto:pmatilai at redhat.com] > Sent: Wednesday, August 24, 2016 8:37 PM > To: Thomas Monjalon ; Yuanhan Liu > > Cc: dev at dpdk.org; Yang, Zhiyong > Subject: Re: [dpdk-dev] [PATCH] vhost: add pmd xstats > > On 08/24/2016 11:44 AM, Thomas Monjalon wrote: > > 2016-08-24 13:46, Yuanhan Liu: > >> On Tue, Aug 23, 2016 at 12:45:54PM +0300, Panu Matilainen wrote: > >>>>>> Since collecting data of vhost_update_packet_xstats will have > >>>>>> some effect on RX/TX performance, so, Setting compiling switch > >>>>>> CONFIG_RTE_LIBRTE_PMD_VHOST_UPDATE_XSTATS=n by default > in the > >>>>> file > >>>>>> config/common_base, if needing xstats data, you can enable it(y). > >>>>> > >>>>> NAK, such things need to be switchable at run-time. > >>>>> > >>>>> - Panu - > >>>> > >>>> Considering the following reasons using the compiler switch, not > >>>> command-line at run-time. > >>>> > >>>> 1.Similar xstats update functions are always collecting stats > >>>> data in the background when rx/tx are running, such as the > >>>> physical NIC or virtio, which have no switch. Compiler switch for > >>>> vhost pmd xstats is added as a option when performance is viewed > >>>> as critical > factor. > >>>> > >>>> 2. No data structure and API in any layer support the xstats > >>>> update switch at run-time. Common data structure (struct > >>>> rte_eth_dev_data) has no device-specific data member, if > >>>> implementing enable/disable of vhost_update _packet_xstats at > >>>> run-time, must define a > >>>> flag(device-specific) in it, because the definition of struct > >>>> vhost_queue in the driver code (eth_vhost_rx/eth_vhost_tx > processing)is not visible from device perspective. > >>>> > >>>> 3. I tested RX/TX with v1 patch (y) as reference based on > >>>> Intel(R) > >>>> Xeon(R) CPU E5-2699 v3 @ 2.30GHz, for 64byts packets in burst > >>>> mode, > >>>> 32 packets in one RX/TX processing. Overhead of > >>>> vhost_update_packet_xstats is less than 3% for the rx/tx > >>>> processing. 
It looks that vhost_update_packet_xstats has a > >>>> limited > effect on performance drop. > >>> > >>> Well, either the performance overhead is acceptable and it should > >>> always be on (like with physical NICs I think). Or it is not. In > >>> which case it needs to be turnable on and off, at run-time. > >>> Rebuilding is not an option in the world of distros. > >> > >> I think the less than 3% overhead is acceptable here, that I agree > >> with Panu we should always keep it on. If someone compains it later > >> that even 3% is too big for them, let's consider to make it be > >> switchable at run-time. Eithe
[dpdk-dev] [PATCH v3 02/12] net/virtio: setup and start cq in configure callback
Hi, Olivier During validation work with v16.11-rc2, I found that this patch causes a VM crash if virtio bonding is enabled in the VM. Could you check on your side? The following are the steps on my side. Thanks a lot 1. bind the PF port to igb_uio. modprobe uio insmod ./x86_64-native-linuxapp-gcc/kmod/igb_uio.ko ./tools/dpdk-devbind.py --bind=igb_uio 84:00.1 2. start vhost switch. ./examples/vhost/build/vhost-switch -c 0x1c -n 4 --socket-mem 4096,4096 -- -p 0x1 --mergeable 0 --vm2vm 0 --socket-file ./vhost-net 3. boot up one VM with four virtio net devices qemu-system-x86_64 \ -name vm0 -enable-kvm -chardev socket,path=/tmp/vm0_qga0.sock,server,nowait,id=vm0_qga0 \ -device virtio-serial -device virtserialport,chardev=vm0_qga0,name=org.qemu.guest_agent.0 \ -daemonize -monitor unix:/tmp/vm0_monitor.sock,server,nowait \ -net nic,vlan=0,macaddr=00:00:00:c7:56:64,addr=1f \ -net user,vlan=0,hostfwd=tcp:10.239.129.127:6107:22 \ -chardev socket,id=char0,path=./vhost-net \ -netdev type=vhost-user,id=netdev0,chardev=char0,vhostforce \ -device virtio-net-pci,netdev=netdev0,mac=52:54:00:00:00:01 \ -chardev socket,id=char1,path=./vhost-net \ -netdev type=vhost-user,id=netdev1,chardev=char1,vhostforce \ -device virtio-net-pci,netdev=netdev1,mac=52:54:00:00:00:02 \ -chardev socket,id=char2,path=./vhost-net \ -netdev type=vhost-user,id=netdev2,chardev=char2,vhostforce \ -device virtio-net-pci,netdev=netdev2,mac=52:54:00:00:00:03 \ -chardev socket,id=char3,path=./vhost-net \ -netdev type=vhost-user,id=netdev3,chardev=char3,vhostforce \ -device virtio-net-pci,netdev=netdev3,mac=52:54:00:00:00:04 \ -cpu host -smp 8 -m 4096 \ -object memory-backend-file,id=mem,size=4096M,mem-path=/mnt/huge,share=on \ -numa node,memdev=mem -mem-prealloc -drive file=/home/osimg/ubuntu16.img -vnc :10 4. on the VM: bind the virtio net devices to igb_uio modprobe uio insmod ./x86_64-native-linuxapp-gcc/kmod/igb_uio.ko tools/dpdk-devbind.py --bind=igb_uio 00:04.0 00:05.0 00:06.0 00:07.0 5. start the testpmd app ./x86_64-native-linuxapp-gcc/app/testpmd -c 0x1f -n 4 -- -i --txqflags=0xf00 --disable-hw-vlan-filter 6. create one bonding device (port 4) create bonded device 0 0 (the first 0: mode, the second: the socket number) show bonding config 4 7. add ports 0, 1, 2 as slaves to port 4 add bonding slave 0 4 add bonding slave 1 4 add bonding slave 2 4 port start 4 Result: just after "port start 4" (port 4 is the bonded port), the VM shuts down immediately. BRs Lei -Original Message- From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Olivier Matz Sent: Thursday, October 13, 2016 10:16 PM To: dev at dpdk.org; yuanhan.liu at linux.intel.com Cc: Ananyev, Konstantin ; Chandran, Sugesh ; Richardson, Bruce ; Tan, Jianfeng ; Zhang, Helin ; adrien.mazarguil at 6wind.com; stephen at networkplumber.org; dprovan at bivio.net; Wang, Xiao W ; maxime.coquelin at redhat.com Subject: [dpdk-dev] [PATCH v3 02/12] net/virtio: setup and start cq in configure callback Move the configuration of control queue in the configure callback. This is needed by next commit, which introduces the reinitialization of the device in the configure callback to change the feature flags. Therefore, the control queue will have to be restarted at the same place. As virtio_dev_cq_queue_setup() is called from a place where config->max_virtqueue_pairs is not available, we need to store this in the private structure. It replaces max_rx_queues and max_tx_queues which have the same value. 
The log showing the value of max_rx_queues and max_tx_queues is also removed since config->max_virtqueue_pairs is already displayed above. Signed-off-by: Olivier Matz Reviewed-by: Maxime Coquelin --- drivers/net/virtio/virtio_ethdev.c | 43 +++--- drivers/net/virtio/virtio_ethdev.h | 4 ++-- drivers/net/virtio/virtio_pci.h| 3 +-- 3 files changed, 24 insertions(+), 26 deletions(-) diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index 77ca569..f3921ac 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -552,6 +552,9 @@ virtio_dev_close(struct rte_eth_dev *dev) if (hw->started == 1) virtio_dev_stop(dev); + if (hw->cvq) + virtio_dev_queue_release(hw->cvq->vq); + /* reset the NIC */ if (dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC) vtpci_irq_config(hw, VIRTIO_MSI_NO_VECTOR); @@ -1191,16 +1194,7 @@ virtio_init_device(struct rte_eth_dev *eth_dev) config->max_virtqueue_pairs = 1; } - hw->max_rx_queues = - (VIRTIO_MAX_RX_QUEUES < config->max_virtqueue_pairs) ? - VIRTIO_MAX_RX_QUEUES : config->max_virtqueue_pairs; - hw->max_tx_queues = - (VIRTIO_MAX_TX_QUEUES < config->max_virtqueue_pairs) ? -
[dpdk-dev] [PATCH v2 00/10] net/virtio: fix queue reconfigure issue
Tested-by: Lei Yao - Apply patch to v16.11-rc2 - Compile: Pass - OS: Ubuntu16.04 4.4.0-45-generic - GCC: 5.4.0 Most of the basic Virtio related test cases were tested with this patch. No functional issue was found and there is no obvious performance drop. The following is the pass case list: TC1: vhost/virtio PVP vector performance TC2: vhost/virtio PVP normal path performance TC3: vhost/virtio PVP mergeable path performance TC7: vhost/virtio-net PVP ipv4 fwd normal path performance TC8: vhost/virtio-net PVP ipv4 fwd mergeable path performance TC9: vhost/virtio-net VM2VM iperf with TSO enabled performance TC11: vhost/virtio-pmd PVP with 2q 2c vector performance TC12: vhost/virtio-pmd PVP with 2q 1c vector performance TC16: vhost/virtio1.0 PVP normal performance TC17: vhost/virtio 1.0 PVP mergeable performance TC18: vhost/virtio 1.0 PVP vector performance(should be same as normal) TC19: dpdk vhost + virtio-pmd PVP vector performance TC20: dpdk vhost + virtio-pmd PVP non-vector performance TC21: dpdk vhost + virtio-pmd PVP mergeable performance TC25: Test Vhost/virtio-pmd PVP vector performance with qemu2.5 TC26: Test Vhost/virtio-pmd PVP vector performance with qemu2.6 TC27: Test Vhost/virtio-pmd PVP vector performance with qemu2.7 test vhost-user reconnect with virtio-pmd test virtio-pmd reconnect with vhost-user test vhost-user reconnect with multi virtio-pmd test multi virtio-pmd reconnect with vhost-user -Original Message- From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Yuanhan Liu Sent: Saturday, November 5, 2016 5:41 PM To: dev at dpdk.org Cc: Thomas Monjalon ; Tan, Jianfeng ; Kevin Traynor ; Ilya Maximets ; Kyle Larose ; Maxime Coquelin ; Yuanhan Liu Subject: [dpdk-dev] [PATCH v2 00/10] net/virtio: fix queue reconfigure issue This patchset fixes few issues related to virtio queue reconfigure: increase or shrink the queue number. The major issue and the reason behind is described with length details in patch 4 "net/virtio: allocate queue at init stage". Those bugs can not be fixed by few lines of code, it's because the current driver init logic is quite wrong, that I need change quite many places to make it right. Meanwhile, I have already done my best to keep the changes being as minimal as possible, so that we could have fewer changes to break something else; also, it's would be easier for review. v2: - fix two more minor issues regarding to queue enabling; see patch 9 and 10. - refined commit log a bit. Thanks. --yliu --- Yuanhan Liu (10): net/virtio: revert fix restart net/virtio: simplify queue memzone name net/virtio: simplify queue allocation net/virtio: allocate queue at init stage net/virtio: initiate vring at init stage net/virtio: move queue configure code to proper place net/virtio: complete init stage at the right place net/virtio: remove started field net/virtio: fix less queues being enabled issue net/virtio: fix multiple queue enabling drivers/net/virtio/virtio_ethdev.c | 248 +-- drivers/net/virtio/virtio_ethdev.h | 16 -- drivers/net/virtio/virtio_pci.h| 3 +- drivers/net/virtio/virtio_rxtx.c | 291 - drivers/net/virtio/virtqueue.h | 7 + 5 files changed, 237 insertions(+), 328 deletions(-) -- 1.9.0
[dpdk-dev] [PATCH] examples/l3fwd: force CRC stripping for i40evf
Tested-by: Lei Yao - Apply patch to v16.11-rc3 - Compile: Pass - Host OS: VMware ESXi 6.0 - VM OS: Fedora 20 - GCC: 4.8.3 Tested with this patch, the l3fwd sample works with the i40e VF in a Fedora VM using VMware as the host. -Original Message- From: Topel, Bjorn Sent: Wednesday, November 9, 2016 4:24 PM To: dev at dpdk.org Cc: Xu, Qian Q ; Yao, Lei A ; Wu, Jingjing ; thomas.monjalon at 6wind.com; Topel, Bjorn Subject: [PATCH] examples/l3fwd: force CRC stripping for i40evf Commit 1bbcc5d21129 ("i40evf: report error for unsupported CRC stripping config") broke l3fwd, since it was forcing that CRC was kept. Now, if i40evf is running, CRC stripping will be enabled. Signed-off-by: Björn Töpel --- examples/l3fwd/main.c | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/examples/l3fwd/main.c b/examples/l3fwd/main.c index 7223e773107e..b60278794135 100644 --- a/examples/l3fwd/main.c +++ b/examples/l3fwd/main.c @@ -906,6 +906,14 @@ main(int argc, char **argv) n_tx_queue = MAX_TX_QUEUE_PER_PORT; printf("Creating queues: nb_rxq=%d nb_txq=%u... ", nb_rx_queue, (unsigned)n_tx_queue ); + rte_eth_dev_info_get(portid, &dev_info); + if (dev_info.driver_name && + strcmp(dev_info.driver_name, "net_i40e_vf") == 0) { + /* i40evf require that CRC stripping is enabled. */ + port_conf.rxmode.hw_strip_crc = 1; + } else { + port_conf.rxmode.hw_strip_crc = 0; + } ret = rte_eth_dev_configure(portid, nb_rx_queue, (uint16_t)n_tx_queue, &port_conf); if (ret < 0) @@ -946,7 +954,6 @@ main(int argc, char **argv) printf("txq=%u,%d,%d ", lcore_id, queueid, socketid); fflush(stdout); - rte_eth_dev_info_get(portid, &dev_info); txconf = &dev_info.default_txconf; if (port_conf.rxmode.jumbo_frame) txconf->txq_flags = 0; -- 2.9.3
[dpdk-dev] [PATCH] examples/l3fwd: force CRC stripping for i40evf
I'm testing some DPDK samples under VMware. During the testing work, I found that l3fwd + ixgbe VF works, but l3fwd + i40e VF does not. So I reported this issue to Björn. From my perspective, adding a new parameter to the l3fwd sample, like the "crc-strip enable" option that already exists in testpmd, would be a better way to resolve this issue. Lei -Original Message- From: Topel, Bjorn Sent: Wednesday, November 9, 2016 9:10 PM To: Zhang, Helin ; Ananyev, Konstantin ; dev at dpdk.org Cc: Xu, Qian Q ; Yao, Lei A ; Wu, Jingjing ; thomas.monjalon at 6wind.com Subject: Re: [dpdk-dev] [PATCH] examples/l3fwd: force CRC stripping for i40evf Björn/Konstantin wrote: >> Finally, why doesn't l3fwd have the CRC stripped? > > I don't know any good reason for that for l3fwd or any other sample > app. I think it is just a 'historical' reason. Ok! Then I'd suggest changing the l3fwd default to actually *strip* CRC instead of not doing it. Lei, any comments? Helin wrote: > Yes, i40e driver changed a little bit on that according to the review > comments during implementation, comparing to igb and ixgbe. > I'd suggest to re-invesitgate if we can do the similar thing in igb > and ixgbe driver. Good. Let's do that! > Any critical issue now? Or just an improvement comments? Not from my perspective. The issue is that Lei needs some kind of work-around for l3fwd with i40evf, so I'll let Lei comment on how critical it is. Björn
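A possible shape for such an option in l3fwd's argument parsing is sketched below (the flag name and wiring are hypothetical; only port_conf.rxmode.hw_strip_crc matches the patch above, and l3fwd's existing getopt.h includes are assumed):

/* Hypothetical "--crc-strip" long option for l3fwd, mirroring testpmd. */
#define CMD_LINE_OPT_CRC_STRIP "crc-strip"

static const struct option lgopts[] = {
	{CMD_LINE_OPT_CRC_STRIP, no_argument, 0, 0},
	{NULL, 0, 0, 0}
};

/* In parse_args(), within the long-option case: */
if (!strncmp(lgopts[option_index].name, CMD_LINE_OPT_CRC_STRIP,
		sizeof(CMD_LINE_OPT_CRC_STRIP)))
	port_conf.rxmode.hw_strip_crc = 1;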