[dpdk-dev] Is VFIO driver's performance better than IGB_UIO?

2014-08-08 Thread Linhaifeng
I have tested the VFIO driver and the IGB_UIO driver with l2fwd many times. I find 
that the VFIO driver's performance is not better than IGB_UIO's.

Is something wrong with my test? My test is as follows:

1.   Bind the two 82599 NICs to VFIO:  ./tools/dpdk_nic_bind.py -b vfio-pci 
03:00.0 03:00.1

2.   Run l2fwd and watch the stats info.

3.   Bind the 82599 NICs to IGB_UIO:  ./tools/dpdk_nic_bind.py -b igb_uio 
03:00.0 03:00.1

4.   Run l2fwd and watch the stats info.




The result of the test is (Mpps):

VFIO-64-PHY-PHY   IGBUIO-64-PHY-PHY   VFIO-512-PHY-PHY   IGBUIO-512-PHY-PHY
2.6235456         2.3467793           1.9432854          1.9432753
2.5724822         2.5405128           1.9432777          1.9432832
2.2418318         2.5154781           1.9395376          1.9432291
2.470847          2.551112            1.9432767          1.9432756
2.5092176         2.4965851           1.9432705          1.9432733
2.51572           2.4703292           1.9432637          1.9432669
2.6293656         2.472452            1.9432719          1.943283
2.480364          2.6295004           1.9432775          1.943286
2.4242182         2.613162            1.943268           1.9432663



I have two questions:

1.   If the result is wrong, how should I test VFIO and IGB_UIO?

2.   If the result is normal, why use VFIO?





[dpdk-dev] Reply: Is VFIO driver's performance better than IGB_UIO?

2014-08-08 Thread Linhaifeng
Thank you very much.

My cpu is "Intel(R) Xeon(R) CPU   E5620  @ 2.40GHz" 


--
From: Vincent JARDIN [mailto:vincent.jardin at 6wind.com] 
Sent: 2014-08-08 15:46
To: Linhaifeng
Cc: dev at dpdk.org; lixiao (H); Guofeng (E)
Subject: Re: [dpdk-dev] Is VFIO driver's performance better than IGB_UIO?

On 08/08/2014 09:41, Linhaifeng wrote:
> I have test the VFIO driver and IGB_UIO driver by l2fwd for many times. I 
> find that the VFIO driver's performance is not better than the IGB_UIO.

You are right: under some conditions UIO is faster, while VFIO provides safety. The 
best solution is a PMD without UIO or VFIO.

which CPU are you using?

Best regards,
  Vincent


[dpdk-dev] [PATCH RFC v2 08/12] lib/librte_vhost: vhost-user support

2014-12-11 Thread Linhaifeng


On 2014/12/11 5:37, Huawei Xie wrote:
> vhost-user support
> 
> 
> Signed-off-by: Huawei Xie 
> ---
>  lib/librte_vhost/Makefile |   5 +-
>  lib/librte_vhost/vhost-net.h  |   4 +
>  lib/librte_vhost/vhost_cuse/virtio-net-cdev.c |   9 +
>  lib/librte_vhost/vhost_user/vhost-net-user.c  | 422 
> ++
>  lib/librte_vhost/vhost_user/vhost-net-user.h  | 108 +++
>  lib/librte_vhost/vhost_user/virtio-net-user.c | 199 
>  lib/librte_vhost/vhost_user/virtio-net-user.h |  48 +++
>  lib/librte_vhost/virtio-net.c |  16 +-
>  lib/librte_vhost/virtio-net.h |  43 +++
>  9 files changed, 842 insertions(+), 12 deletions(-)
>  create mode 100644 lib/librte_vhost/vhost_user/vhost-net-user.c
>  create mode 100644 lib/librte_vhost/vhost_user/vhost-net-user.h
>  create mode 100644 lib/librte_vhost/vhost_user/virtio-net-user.c
>  create mode 100644 lib/librte_vhost/vhost_user/virtio-net-user.h
>  create mode 100644 lib/librte_vhost/virtio-net.h
> 
> diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
> index e0d0ef6..b2f14a0 100644
> --- a/lib/librte_vhost/Makefile
> +++ b/lib/librte_vhost/Makefile
> @@ -34,10 +34,11 @@ include $(RTE_SDK)/mk/rte.vars.mk
>  # library name
>  LIB = librte_vhost.a
>  
> -CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -I vhost_cuse -O3 
> -D_FILE_OFFSET_BITS=64 -lfuse
> +CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -I vhost_cuse -I vhost_user -O3 
> -D_FILE_OFFSET_BITS=64 -lfuse
>  LDFLAGS += -lfuse
>  # all source are stored in SRCS-y
> -SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost_cuse/vhost-net-cdev.c 
> vhost_cuse/virtio-net-cdev.c virtio-net.c vhost_rxtx.c
> +#SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost_cuse/vhost-net-cdev.c 
> vhost_cuse/virtio-net-cdev.c virtio-net.c vhost_rxtx.c
> +SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := vhost_user/vhost-net-user.c 
> vhost_user/virtio-net-user.c vhost_user/fd_man.c virtio-net.c vhost_rxtx.c
>  
>  # install includes
>  SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_virtio_net.h
> diff --git a/lib/librte_vhost/vhost-net.h b/lib/librte_vhost/vhost-net.h
> index f7e96fd..f9ec40b 100644
> --- a/lib/librte_vhost/vhost-net.h
> +++ b/lib/librte_vhost/vhost-net.h
> @@ -41,8 +41,12 @@
>  
>  #include 
>  
> +#include "rte_virtio_net.h"
> +
>  #define VHOST_MEMORY_MAX_NREGIONS 8
>  
> +extern struct vhost_net_device_ops const *ops;
> +
>  /* Macros for printing using RTE_LOG */
>  #define RTE_LOGTYPE_VHOST_CONFIG RTE_LOGTYPE_USER1
>  #define RTE_LOGTYPE_VHOST_DATA   RTE_LOGTYPE_USER1
> diff --git a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c 
> b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
> index edcbc10..8ac3360 100644
> --- a/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
> +++ b/lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
> @@ -268,6 +268,7 @@ cuse_set_mem_table(struct vhost_device_ctx ctx,
>   struct vhost_memory_region *mem_regions = (void *)(uintptr_t)
>   ((uint64_t)(uintptr_t)mem_regions_addr + size);
>   uint64_t base_address = 0, mapped_address, mapped_size;
> + struct virtio_dev *dev;
>  
>   for (idx = 0; idx < nregions; idx++) {
>   regions[idx].guest_phys_address =
> @@ -335,6 +336,14 @@ cuse_set_mem_table(struct vhost_device_ctx ctx,
>   regions[idx].guest_phys_address;
>   }
>  
> + dev = get_device(ctx);
> + if (dev && dev->mem && dev->mmaped_address) {
> + munmap((void *)(uintptr_t)dev->mmaped_address,
> + (size_t)dev->mmaped_size);
> + free(dev->mem);
> + dev->mem = NULL;
> + }
> +
>   ops->set_mem_table(ctx, ®ions[0], valid_regions);
>   return 0;
>  }
> diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.c 
> b/lib/librte_vhost/vhost_user/vhost-net-user.c
> new file mode 100644
> index 000..841d7e6
> --- /dev/null
> +++ b/lib/librte_vhost/vhost_user/vhost-net-user.c
> @@ -0,0 +1,422 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + * * Redistributions of source code must retain the above copyright
> + *   notice, this list of conditions and the following disclaimer.
> + * * Redistributions in binary form must reproduce the above copyright
> + *   notice, this list of conditions and the following disclaimer in
> + *   the documentation and/or other materials provided with the
> + *   distribution.
> + * * Neither the name of Intel Corporation nor the names of its
> + *   contributors may be used to endorse or promote products derived
> + *   from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND

[dpdk-dev] [PATCH RFC v2 08/12] lib/librte_vhost: vhost-user support

2014-12-12 Thread Linhaifeng


On 2014/12/12 1:13, Xie, Huawei wrote:
>>
>> Only support one vhost-user port ?
> 
> Do you mean vhost server by "port"?
> If that is the case, yes, now only one vhost server is supported for multiple 
> virtio devices.
> As stated in the cover letter, we have requirement and plan for multiple 
> server support,
> though I am not sure if it is absolutely necessary.
> 
I think if you also want to support client mode (qemu as the server), it is necessary 
to implement multiple servers.

>>
>> Can you mmap the region if gpa is 0? When i run VM with two numa node (qemu
>> will create two hugepage file) found that always failed to mmap with the 
>> region
>> which gpa is 0.
> 
> Current implementation doesn't assume there is only one huge page file to 
> back the guest memory.
> It maps every region using the fd of that region. 
> Could you please paste your guest VM command line here?
> 
>>
>> BTW can we ensure the memory regions cover with all the memory of hugepage
>> for VM?
> 
> I think so, because virtio devices could use any normal guest memory, but we 
> needn't ensure that.
> We only need to map the region passed to us from qemu vhost, which should be 
> enough to translate
> the GPA in vring from virtio in guest, otherwise it is the bug of qemu vhost.
> 
> 
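For readers following this exchange, here is a minimal self-contained sketch of the
translation being discussed - why mapping only the regions qemu sends in
VHOST_USER_SET_MEM_TABLE is enough to resolve the GPAs found in the vring. The
struct and function names below are illustrative stand-ins, not the actual
librte_vhost code:

#include <stdint.h>
#include <stddef.h>

/* Illustrative region descriptor (not the real DPDK struct). */
struct guest_mem_region {
	uint64_t guest_phys_address;   /* GPA where the region starts */
	uint64_t memory_size;          /* length of the region */
	uint64_t address_offset;       /* host mmap VA minus GPA */
};

/* Translate a guest-physical address taken from the vring into a host
 * virtual address by walking the regions received from qemu. Returns 0
 * when no region covers the GPA (for example, a region that failed to
 * mmap, as discussed above, would leave its addresses unresolvable). */
static uint64_t
translate_gpa(const struct guest_mem_region *reg, size_t nregions, uint64_t gpa)
{
	size_t i;

	for (i = 0; i < nregions; i++) {
		if (gpa >= reg[i].guest_phys_address &&
		    gpa < reg[i].guest_phys_address + reg[i].memory_size)
			return gpa + reg[i].address_offset;
	}
	return 0;
}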

-- 
Regards,
Haifeng



[dpdk-dev] [PATCH v2] Implement memcmp using AVX/SSE instructions.

2015-05-13 Thread Linhaifeng


On 2015/5/13 9:18, Ravi Kerur wrote:
> If you can wait until Thursday I will probably send v3 patch which will
> have full memcmp support.

Ok, I'd like to test it:)

> 
> In your program try with volatile pointer and see if it helps.

like "volatile uint8_t *src, *dst" ?



[dpdk-dev] [PATCH v4 5/5] lib/librte_vhost: add vhost lib support in makefile

2014-09-13 Thread Linhaifeng
Will DPDK develop a vhost-user lib for the vhost-user backend of QEMU?

On 2014/9/12 18:55, Huawei Xie wrote:
> The build of vhost lib requires fuse development package. It is turned off by
> default so as not to break DPDK build.
> 
> Signed-off-by: Huawei Xie 
> Acked-by: Konstantin Ananyev 
> Acked-by: Tommy Long 
> ---
>  config/common_linuxapp | 7 +++
>  lib/Makefile   | 1 +
>  mk/rte.app.mk  | 5 +
>  3 files changed, 13 insertions(+)
> 
> diff --git a/config/common_linuxapp b/config/common_linuxapp
> index 9047975..c7c1c83 100644
> --- a/config/common_linuxapp
> +++ b/config/common_linuxapp
> @@ -390,6 +390,13 @@ CONFIG_RTE_KNI_VHOST_DEBUG_RX=n
>  CONFIG_RTE_KNI_VHOST_DEBUG_TX=n
>  
>  #
> +# Compile vhost library
> +# fuse, fuse-devel, kernel-modules-extra packages are needed
> +#
> +CONFIG_RTE_LIBRTE_VHOST=n
> +CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
> +
> +#
>  #Compile Xen domain0 support
>  #
>  CONFIG_RTE_LIBRTE_XEN_DOM0=n
> diff --git a/lib/Makefile b/lib/Makefile
> index 10c5bb3..007c174 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -60,6 +60,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_METER) += librte_meter
>  DIRS-$(CONFIG_RTE_LIBRTE_SCHED) += librte_sched
>  DIRS-$(CONFIG_RTE_LIBRTE_KVARGS) += librte_kvargs
>  DIRS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += librte_distributor
> +DIRS-$(CONFIG_RTE_LIBRTE_VHOST) += librte_vhost
>  DIRS-$(CONFIG_RTE_LIBRTE_PORT) += librte_port
>  DIRS-$(CONFIG_RTE_LIBRTE_TABLE) += librte_table
>  DIRS-$(CONFIG_RTE_LIBRTE_PIPELINE) += librte_pipeline
> diff --git a/mk/rte.app.mk b/mk/rte.app.mk
> index 34dff2a..285b65c 100644
> --- a/mk/rte.app.mk
> +++ b/mk/rte.app.mk
> @@ -190,6 +190,11 @@ ifeq ($(CONFIG_RTE_LIBRTE_VIRTIO_PMD),y)
>  LDLIBS += -lrte_pmd_virtio_uio
>  endif
>  
> +ifeq ($(CONFIG_RTE_LIBRTE_VHOST), y)
> +LDLIBS += -lrte_vhost
> +LDLIBS += -lfuse
> +endif
> +
>  ifeq ($(CONFIG_RTE_LIBRTE_I40E_PMD),y)
>  LDLIBS += -lrte_pmd_i40e
>  endif
> 



[dpdk-dev] [RFC] lib/librte_vhost: qemu vhost-user support into DPDK vhost library

2014-09-13 Thread Linhaifeng
When will it be published?

On 2014/8/26 19:05, Xie, Huawei wrote:
> Hi all:
> We are implementing qemu official vhost-user interface into DPDK vhost 
> library, so there would be two coexisting implementations for user space 
> vhost backend.
> Pro and cons in my mind:
> Existing solution:
> Pros:  works with qemu version before 2.1;  Cons: depends on eventfd proxy 
> kernel module and extra maintenance effort
> Qemu vhost-user:
>Pros:  qemu official us-vhost interface; Cons: only 
> available after qemu 2.1
> 
> BR.
> huawei
> 
> 



[dpdk-dev] How to check memory leak with dpdk application

2015-04-10 Thread Linhaifeng
Hi, all

I'm trying to use valgrind to check for memory leaks in my DPDK application, but 
DPDK always fails to mmap hugepages.

Without valgrind it works well. How do I run DPDK applications with valgrind? Is 
there any other way to check for memory leaks
with DPDK applications?



[dpdk-dev] How to check memory leak with dpdk application

2015-04-14 Thread Linhaifeng


On 2015/4/14 4:25, Marc Sune wrote:
> 
> 
> On 10/04/15 07:53, Linhaifeng wrote:
>> Hi, all
>>
>> I'am trying to use valgrind to check memory leak with my dpdk application 
>> but dpdk always failed to mmap hugepages.
>>
>> Without valgrind it works well.How to run dpdk applications with valgrind?Is 
>> there any other way to check memory leak
>> with dpdk applications?
>>
> 
> Yes it can be used, just that 3.10 has issues with hugepages. Check this out:
> 
> http://article.gmane.org/gmane.comp.networking.dpdk.devel/8058/match=valgrind+hugepages
> 
> Marc
> 
> 

Hi, Marc

Thank you very much!



[dpdk-dev] [PATCH] vhost: flush used->idx update before reading avail->flags

2015-04-24 Thread Linhaifeng


On 2015/4/23 0:33, Huawei Xie wrote:
> update of used->idx and read of avail->flags could be reordered.
> memory fence should be used to ensure the order, otherwise guest could see a 
> stale used->idx value after it toggles the interrupt suppression flag.
> 
> Signed-off-by: Huawei Xie 
> ---
>  lib/librte_vhost/vhost_rxtx.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
> index 510ffe8..6afba35 100644
> --- a/lib/librte_vhost/vhost_rxtx.c
> +++ b/lib/librte_vhost/vhost_rxtx.c
> @@ -178,6 +178,9 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
>   *(volatile uint16_t *)&vq->used->idx += count;
>   vq->last_used_idx = res_end_idx;
>  
> + /* flush used->idx update before we read avail->flags. */
> + rte_mb();
> +
>   /* Kick the guest if necessary. */
>   if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
>   eventfd_write((int)vq->callfd, 1);
> 

If we do not add the memory fence, what would happen? Packet loss or interrupt loss? 
How can we test it?



Re: [dpdk-dev] [PATCH v2] vhost: fix add_guest_pages bug

2016-12-05 Thread linhaifeng
On 2016/12/6 10:28, Yuanhan Liu wrote:
> On Thu, Dec 01, 2016 at 07:42:02PM +0800, Haifeng Lin wrote:
>> When reg_size < page_size the function read in
>> rte_mem_virt2phy would not return, becausue
>> host_user_addr is invalid.
>>
>> Signed-off-by: Haifeng Lin 
>> ---
>> v2:
>> fix TYPO_SPELLING warning
>> ---
>>  lib/librte_vhost/vhost_user.c | 10 +-
>>  1 file changed, 5 insertions(+), 5 deletions(-)
>>
>> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
>> index 6b83c15..ce55e85 100644
>> --- a/lib/librte_vhost/vhost_user.c
>> +++ b/lib/librte_vhost/vhost_user.c
>> @@ -447,14 +447,14 @@ add_guest_pages(struct virtio_net *dev, struct 
>> virtio_memory_region *reg,
>>  reg_size -= size;
>>  
>>  while (reg_size > 0) {
>> +size = reg_size >= page_size ? page_size : reg_size;
> 
> I'd use RTE_MIN(reg_size, page_size) here. Also, this patch miss a
> fixline (http://dpdk.org/dev):
> 
> Fixes: e246896178e6 ("vhost: get guest/host physical address mappings")
> 
> Applied to dpdk-next-virtio, with above fixed.
> 
> Thanks for the fix!
> 
>   --yliu
> 
>>  host_phys_addr = rte_mem_virt2phy((void *)(uintptr_t)
>>host_user_addr);
>> -add_one_guest_page(dev, guest_phys_addr, host_phys_addr,
>> -   page_size);
>> +add_one_guest_page(dev, guest_phys_addr, host_phys_addr, size);
>>  
>> -host_user_addr  += page_size;
>> -guest_phys_addr += page_size;
>> -reg_size -= page_size;
>> +host_user_addr  += size;
>> +guest_phys_addr += size;
>> +reg_size -= size;
>>  }
>>  }
>>  
>> -- 
>> 1.8.3.1
>>
> 
> .
> 

Hi, yliu
The bug would happen like this:

 ----------------------------------------
 |                region                |
 ----------------------------------------
 :                             : remain :
 ---------------------------------------------
 |  hugepage  |     ...      |   hugepage     |
 ---------------------------------------------
so the remaining reg_size may be smaller than a hugepage, and "reg_size -= 
page_size" is not correct.

Did you apply the whole patch?
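To make the remainder arithmetic concrete, here is a small self-contained sketch of
the corrected chunking (plain-C stand-ins mirroring the patch above with the RTE_MIN
suggestion, not the actual DPDK code):

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

int main(void)
{
	uint64_t page_size = 2ULL * 1024 * 1024;   /* 2 MB huge page */
	uint64_t reg_size  = 5ULL * 1024 * 1024;   /* region end not page aligned */
	uint64_t offset    = 0;

	while (reg_size > 0) {
		/* the last chunk may be smaller than a huge page */
		uint64_t size = MIN(reg_size, page_size);

		printf("add page: offset=%" PRIu64 " size=%" PRIu64 "\n",
		       offset, size);
		offset   += size;
		reg_size -= size;
	}
	return 0;
}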



[dpdk-dev] [PATCH] net/bonding: not handle vlan slow packet

2016-10-28 Thread linhaifeng
If rx vlan offload is enabled we should not handle vlan slow
packets either.

Signed-off-by: Haifeng Lin  
---
 drivers/net/bonding/rte_eth_bond_pmd.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c 
b/drivers/net/bonding/rte_eth_bond_pmd.c
index 09ce7bf..7765017 100644
--- a/drivers/net/bonding/rte_eth_bond_pmd.c
+++ b/drivers/net/bonding/rte_eth_bond_pmd.c
@@ -169,10 +169,11 @@ bond_ethdev_rx_burst_8023ad(void *queue, struct rte_mbuf 
**bufs,
/* Remove packet from array if it is slow packet or slave is not
 * in collecting state or bondign interface is not in promiscus
 * mode and packet address does not match. */
-   if (unlikely(hdr->ether_type == ether_type_slow_be ||
+   if (unlikely(!bufs[j]->vlan_tci &&
+(hdr->ether_type == ether_type_slow_be ||
!collecting || (!promisc &&
!is_multicast_ether_addr(&hdr->d_addr) &&
-   !is_same_ether_addr(&bond_mac, &hdr->d_addr {
+   !is_same_ether_addr(&bond_mac, &hdr->d_addr) {

if (hdr->ether_type == ether_type_slow_be) {
bond_mode_8023ad_handle_slow_pkt(internals, slaves[i],
--
1.8.3.1



[dpdk-dev] [PATCH] net/bonding: not handle vlan slow packet

2016-10-31 Thread linhaifeng
From: ZengGanghui 

If rx vlan offload is enabled we should not handle vlan slow
packets either.

Signed-off-by: Haifeng Lin 
---
 drivers/net/bonding/rte_eth_bond_pmd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c 
b/drivers/net/bonding/rte_eth_bond_pmd.c
index 43334f7..6c74bba 100644
--- a/drivers/net/bonding/rte_eth_bond_pmd.c
+++ b/drivers/net/bonding/rte_eth_bond_pmd.c
@@ -169,7 +169,7 @@ bond_ethdev_rx_burst_8023ad(void *queue, struct rte_mbuf 
**bufs,
/* Remove packet from array if it is slow packet or 
slave is not
 * in collecting state or bondign interface is not in 
promiscus
 * mode and packet address does not match. */
-   if (unlikely((hdr->ether_type == ether_type_slow_be ||
+   if (unlikely(((hdr->ether_type == ether_type_slow_be && 
!bufs[j]->vlan_tci) ||
!collecting || (!promisc &&
!is_multicast_ether_addr(&hdr->d_addr) 
&&
!is_same_ether_addr(&bond_mac, 
&hdr->d_addr))) &&
-- 
1.8.3.1




[dpdk-dev] net/bonding: not handle vlan slow packet

2016-10-31 Thread linhaifeng
If rx vlan offload is enabled we should not handle vlan slow
packets either.

Signed-off-by: Haifeng Lin 

diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c 
b/drivers/net/bonding/rte_eth_bond_pmd.c
index 43334f7..6c74bba 100644
--- a/drivers/net/bonding/rte_eth_bond_pmd.c
+++ b/drivers/net/bonding/rte_eth_bond_pmd.c
@@ -169,7 +169,7 @@  bond_ethdev_rx_burst_8023ad(void *queue, struct rte_mbuf 
**bufs,
/* Remove packet from array if it is slow packet or 
slave is not
 * in collecting state or bondign interface is not in 
promiscus
 * mode and packet address does not match. */
-   if (unlikely((hdr->ether_type == ether_type_slow_be ||
+   if (unlikely(((hdr->ether_type == ether_type_slow_be && 
!bufs[j]->vlan_tci) ||
!collecting || (!promisc &&
!is_multicast_ether_addr(&hdr->d_addr) 
&&
!is_same_ether_addr(&bond_mac, 
&hdr->d_addr))) &&



[dpdk-dev] [PATCH] net/bonding: not handle vlan slow packet

2016-10-31 Thread linhaifeng
From: Haifeng Lin 

If rx vlan offload is enabled we should not handle vlan slow
packets either.

Signed-off-by: Haifeng Lin 
---
 drivers/net/bonding/rte_eth_bond_pmd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c 
b/drivers/net/bonding/rte_eth_bond_pmd.c
index 09ce7bf..9e99442 100644
--- a/drivers/net/bonding/rte_eth_bond_pmd.c
+++ b/drivers/net/bonding/rte_eth_bond_pmd.c
@@ -169,7 +169,7 @@ bond_ethdev_rx_burst_8023ad(void *queue, struct rte_mbuf 
**bufs,
/* Remove packet from array if it is slow packet or 
slave is not
 * in collecting state or bondign interface is not in 
promiscus
 * mode and packet address does not match. */
-   if (unlikely(hdr->ether_type == ether_type_slow_be ||
+   if (unlikely((hdr->ether_type == ether_type_slow_be && 
&& !bufs[j]->vlan_tci) ||
!collecting || (!promisc &&
!is_multicast_ether_addr(&hdr->d_addr) 
&&
!is_same_ether_addr(&bond_mac, 
&hdr->d_addr {
-- 
1.8.3.1




[dpdk-dev] [PATCH] net/bonding: not handle vlan slow packet

2016-10-31 Thread linhaifeng
From: Haifeng Lin 

If rx vlan offload is enabled we should not handle vlan slow
packets either.

Signed-off-by: Haifeng Lin 
---
 drivers/net/bonding/rte_eth_bond_pmd.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c 
b/drivers/net/bonding/rte_eth_bond_pmd.c
index 09ce7bf..ca17898 100644
--- a/drivers/net/bonding/rte_eth_bond_pmd.c
+++ b/drivers/net/bonding/rte_eth_bond_pmd.c
@@ -169,7 +169,8 @@ bond_ethdev_rx_burst_8023ad(void *queue, struct rte_mbuf 
**bufs,
/* Remove packet from array if it is slow packet or 
slave is not
 * in collecting state or bondign interface is not in 
promiscus
 * mode and packet address does not match. */
-   if (unlikely(hdr->ether_type == ether_type_slow_be ||
+   if (unlikely((hdr->ether_type == ether_type_slow_be &&
+   !bufs[j]->vlan_tci) ||
!collecting || (!promisc &&
!is_multicast_ether_addr(&hdr->d_addr) 
&&
!is_same_ether_addr(&bond_mac, 
&hdr->d_addr {
-- 
1.8.3.1




[dpdk-dev] [PATCH v3 2/2] vhost: call rte_pktmbuf_alloc_bulk in vhost dequeue

2015-12-23 Thread linhaifeng


>  
> + if (unlikely(alloc_err)) {
> + uint16_t i = entry_success;
> +
> + m->nb_segs = seg_num;
> + for (; i < free_entries; i++)
> + rte_pktmbuf_free(pkts[entry_success]); -> 
> rte_pktmbuf_free(pkts[i]);
> + }
> +
>   rte_compiler_barrier();
>   vq->used->idx += entry_success;
>   /* Kick guest if required. */
> 




[dpdk-dev] [PATCH] vhost: notify guest to fill buffer when there is no buffer

2015-02-01 Thread Linhaifeng
Hi, Xie & Xu

I found that the new code already tries to notify the guest after sending each 
packet, since commit 2bbb811.
So this bug does not exist now.

static inline uint32_t __attribute__((always_inline))
virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
struct rte_mbuf **pkts, uint32_t count)
{
... ...

for (pkt_idx = 0; pkt_idx < count; pkt_idx++) {

... ...

/* Kick the guest if necessary. */
if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
eventfd_write((int)vq->kickfd, 1);
}

return count;
}

thank you very much!

On 2015/1/30 16:20, Xu, Qian Q wrote:
> Haifeng
> Could you give more information so that we can reproduce your issue? Thanks. 
> 1. What's your dpdk package, based on which branch, with Huawei's 
> vhost-user's patches? 
> 2. What's your step and command to launch vhost sample? 
> 3. What is mz? Your internal tool? I can't yum install mz or download mz 
> tool. 
> 4. As to your test scenario, I understand it in this way: virtio1 in VM1, 
> virtio2 in VM2, then let virtio1 send packages to virtio2, the problem is 
> that after 3 hours, virtio2 can't receive packets, but virtio1 is still 
> sending packets, am I right? So mz is like a packet generator to send 
> packets, right? 
> 
> 
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Linhaifeng
> Sent: Thursday, January 29, 2015 9:51 PM
> To: Xie, Huawei; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] vhost: notify guest to fill buffer when there 
> is no buffer
> 
> 
> 
> On 2015/1/29 21:00, Xie, Huawei wrote:
>>
>>
>>> -Original Message-
>>> From: Linhaifeng [mailto:haifeng.lin at huawei.com]
>>> Sent: Thursday, January 29, 2015 8:39 PM
>>> To: Xie, Huawei; dev at dpdk.org
>>> Subject: Re: [dpdk-dev] [PATCH] vhost: notify guest to fill buffer 
>>> when there is no buffer
>>>
>>>
>>>
>>> On 2015/1/29 18:39, Xie, Huawei wrote:
>>>
>>>>> - if (count == 0)
>>>>> + /* If there is no buffers we should notify guest to fill.
>>>>> + * This is need when guest use virtio_net driver(not pmd).
>>>>> + */
>>>>> + if (count == 0) {
>>>>> + if (!(vq->avail->flags &
>>>>> VRING_AVAIL_F_NO_INTERRUPT))
>>>>> + eventfd_write((int)vq->kickfd, 1);
>>>>>   return 0;
>>>>> + }
>>>>
>>>> Haifeng:
>>>> Is it the root cause and is it protocol required?
>>>> Could you give a detailed description for that scenario?
>>>>
>>>
>>> I use mz to send data from one VM1 to VM2.The two VM use virtio-net driver.
>>> VM1 execute follow script:
>>> for((i=0;i<9;i++));
>>> do
>>> mz eth0 -t udp -A 1.1.1.1 -B 1.1.1.2 -a 00:00:00:00:00:01 -b 
>>> 00:00:00:00:00:02 -c
>>> 1000 -p 512
>>> sleep 4
>>> done
>>>
>>> VM2 execute follow command to watch:
>>> watch -d ifconfig
>>>
>>> After many hours VM2 stop to receive data.
>>>
>>> Could you test it ?
>>
>>
>> We could try next week after I send the whole patch. 
>> How many hours? Is it reproducible at your side? I inject packets through 
>> packet generator to guest for more than ten hours, haven't met issues. 
> 
> About three hours.
> What kind of driver you used in guest?virtio-net-pmd or virtio-net?
> 
> 
>> As I said in another mail sent  to you, could you dump the status of vring 
>> if you still have the spot?
> 
> How to dump the status of vring in guest?
> 
>> Could you please also reply to that mail?
>>
> 
> Which mail?
> 
> 
>> For the patch, if we have no root cause, I prefer not to apply it, so that 
>> we don't send more interrupts than needed to guest to affect performance.
> 
> I found that if we add this notify the performance is better(growth of 
> 100kpps when use 64byte UDP packets)
> 
>> People could temporarily apply this patch as a work around.
>>
>> Or anyone
>>
> 
> OK.I'm also not sure about this bug.I think i should do something to found 
> the real reason.
> 
>>
>>> --
>>> Regards,
>>> Haifeng
>>
>>
>>
> 

-- 
Regards,
Haifeng



[dpdk-dev] [PATCH v1] librte_vhost: Add an abstraction to hide vhost-user and cuse devices.

2015-02-02 Thread Linhaifeng

On 2015/2/1 18:36, Tetsuya Mukawa wrote:
> This patch should be put on "lib/librte_vhost: vhost-user support"
> patch series written by Xie, Huawei.
> 
> There are 2 type of vhost devices. One is cuse, the other is vhost-user.
> So far, one of them we can use. To use the other, DPDK is needed to be
> recompiled.

If we use vhost-user, should we also install the cuse and fuse modules?
I think that is not a good idea.


> The patch introduces rte_vhost_dev_type parameter. Using type parameter,
> the DPDK application can use both vhost devices without recompile.
> 
> The type parameter should be specified when following vhost APIs are called.
> - int rte_vhost_driver_register();
> - int rte_vhost_driver_session_start();
> 
> Signed-off-by: Tetsuya Mukawa 
> ---
>  examples/vhost/main.c|  4 +-
>  lib/librte_vhost/Makefile|  4 +-
>  lib/librte_vhost/rte_virtio_net.h| 15 +-
>  lib/librte_vhost/vhost-net.c | 74 
> 
>  lib/librte_vhost/vhost_cuse/vhost-net-cdev.c |  5 +-
>  lib/librte_vhost/vhost_cuse/vhost-net-cdev.h | 42 
>  lib/librte_vhost/vhost_user/vhost-net-user.c |  4 +-
>  lib/librte_vhost/vhost_user/vhost-net-user.h |  7 +++
>  8 files changed, 145 insertions(+), 10 deletions(-)
>  create mode 100644 lib/librte_vhost/vhost-net.c
>  create mode 100644 lib/librte_vhost/vhost_cuse/vhost-net-cdev.h
> 
> diff --git a/examples/vhost/main.c b/examples/vhost/main.c
> index 04f0118..545df72 100644
> --- a/examples/vhost/main.c
> +++ b/examples/vhost/main.c
> @@ -3040,14 +3040,14 @@ main(int argc, char *argv[])
>   rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_MRG_RXBUF);
>  
>   /* Register CUSE device to handle IOCTLs. */
> - ret = rte_vhost_driver_register((char *)&dev_basename);
> + ret = rte_vhost_driver_register((char *)&dev_basename, VHOST_DEV_CUSE);
>   if (ret != 0)
>   rte_exit(EXIT_FAILURE,"CUSE device setup failure.\n");
>  
>   rte_vhost_driver_callback_register(&virtio_net_device_ops);
>  
>   /* Start CUSE session. */
> - rte_vhost_driver_session_start();
> + rte_vhost_driver_session_start(VHOST_DEV_CUSE);
>   return 0;
>  
>  }
> diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
> index 22319b8..cc95415 100644
> --- a/lib/librte_vhost/Makefile
> +++ b/lib/librte_vhost/Makefile
> @@ -39,8 +39,8 @@ CFLAGS += -I vhost_cuse -lfuse
>  CFLAGS += -I vhost_user
>  LDFLAGS += -lfuse
>  # all source are stored in SRCS-y
> -SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := virtio-net.c vhost_rxtx.c
> -#SRCS-$(CONFIG_RTE_LIBRTE_VHOST) += vhost_cuse/vhost-net-cdev.c 
> vhost_cuse/virtio-net-cdev.c vhost_cuse/eventfd_copy.c
> +SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := virtio-net.c vhost-net.c vhost_rxtx.c
> +SRCS-$(CONFIG_RTE_LIBRTE_VHOST) += vhost_cuse/vhost-net-cdev.c 
> vhost_cuse/virtio-net-cdev.c vhost_cuse/eventfd_copy.c
>  SRCS-$(CONFIG_RTE_LIBRTE_VHOST) += vhost_user/vhost-net-user.c 
> vhost_user/virtio-net-user.c vhost_user/fd_man.c
>  
>  # install includes
> diff --git a/lib/librte_vhost/rte_virtio_net.h 
> b/lib/librte_vhost/rte_virtio_net.h
> index 611a3d4..7b3952c 100644
> --- a/lib/librte_vhost/rte_virtio_net.h
> +++ b/lib/librte_vhost/rte_virtio_net.h
> @@ -166,6 +166,15 @@ gpa_to_vva(struct virtio_net *dev, uint64_t guest_pa)
>  }
>  
>  /**
> + * Enum for vhost device types.
> + */
> +enum rte_vhost_dev_type {
> + VHOST_DEV_CUSE, /* cuse driver */
> + VHOST_DEV_USER, /* vhost-user driver */
> + VHOST_DEV_MAX   /* the number of vhost driver types */
> +};
> +
> +/**
>   *  Disable features in feature_mask. Returns 0 on success.
>   */
>  int rte_vhost_feature_disable(uint64_t feature_mask);
> @@ -181,12 +190,14 @@ uint64_t rte_vhost_feature_get(void);
>  int rte_vhost_enable_guest_notification(struct virtio_net *dev, uint16_t 
> queue_id, int enable);
>  
>  /* Register vhost driver. dev_name could be different for multiple instance 
> support. */
> -int rte_vhost_driver_register(const char *dev_name);
> +int rte_vhost_driver_register(const char *dev_name,
> + enum rte_vhost_dev_type dev_type);
>  
>  /* Register callbacks. */
>  int rte_vhost_driver_callback_register(struct virtio_net_device_ops const * 
> const);
> +
>  /* Start vhost driver session blocking loop. */
> -int rte_vhost_driver_session_start(void);
> +int rte_vhost_driver_session_start(enum rte_vhost_dev_type dev_type);
>  
>  /**
>   * This function adds buffers to the virtio devices RX virtqueue. Buffers can
> diff --git a/lib/librte_vhost/vhost-net.c b/lib/librte_vhost/vhost-net.c
> new file mode 100644
> index 000..d0316d7
> --- /dev/null
> +++ b/lib/librte_vhost/vhost-net.c
> @@ -0,0 +1,74 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2015 IGEL Co.,Ltd. All rights reserved.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provi

[dpdk-dev] Reply: [PATCH] net/bonding: fix double fetch for active_slave_count

2018-11-29 Thread Linhaifeng
Hi, Chas

Thank you.

I use it to send packets to the dedicated queue of the slaves.

Maybe I should not use it. I will think of another way.

-Original Message-
From: Chas Williams [mailto:3ch...@gmail.com] 
Sent: 2018-11-30 11:27
To: Linhaifeng ; dev@dpdk.org
Cc: ch...@att.com
Subject: Re: [dpdk-dev] [PATCH] net/bonding: fix double fetch for active_slave_count

I guess this is slightly more correct. There is still a race here though.
After you make your copy of active_slave_count, the number of active slaves 
could go to 0 and the memcpy() would copy an invalid element, active_slaves[0]. 
 There is no simple fix to this problem.  Your patch reduces the opportunity 
for a race but doesn't eliminate it.

What you are using this API for?

On 11/29/18 12:32 AM, Haifeng Lin wrote:
> 1. when memcpy slaves the internals->active_slave_count 1 2. return 
> internals->active_slave_count is 2 3. the slaves[1] would be a random 
> invalid value
> 
> Signed-off-by: Haifeng Lin 
> ---
>   drivers/net/bonding/rte_eth_bond_api.c | 8 +---
>   1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/bonding/rte_eth_bond_api.c 
> b/drivers/net/bonding/rte_eth_bond_api.c
> index 21bcd50..ed7b02e 100644
> --- a/drivers/net/bonding/rte_eth_bond_api.c
> +++ b/drivers/net/bonding/rte_eth_bond_api.c
> @@ -815,6 +815,7 @@
>   uint16_t len)
>   {
>   struct bond_dev_private *internals;
> + uint16_t active_slave_count;
>   
>   if (valid_bonded_port_id(bonded_port_id) != 0)
>   return -1;
> @@ -824,13 +825,14 @@
>   
>   internals = rte_eth_devices[bonded_port_id].data->dev_private;
>   
> - if (internals->active_slave_count > len)
> + active_slave_count = internals->active_slave_count;
> + if (active_slave_count > len)
>   return -1;
>   
>   memcpy(slaves, internals->active_slaves,
> - internals->active_slave_count * sizeof(internals->active_slaves[0]));
> + active_slave_count * 
> sizeof(internals->active_slaves[0]));
>   
> - return internals->active_slave_count;
> + return active_slave_count;
>   }
>   
>   int
> 


[dpdk-dev] some questions about rte_memcpy

2015-01-22 Thread Linhaifeng
#define rte_memcpy(dst, src, n)  \
((__builtin_constant_p(n)) ?  \
memcpy((dst), (src), (n)) :  \
rte_memcpy_func((dst), (src), (n)))


Why call memcpy when n is a compile-time constant?
Can I change it to the following code?

#define rte_memcpy(dst, src, n)  \
{   \
int num = n;\
rte_memcpy_func((dst), (src), (num));   \
}
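For context, a minimal sketch of what the __builtin_constant_p(n) dispatch in the
existing macro buys (illustrative code, not the DPDK sources): when the size is a
compile-time constant the compiler can expand memcpy() inline into a few moves,
which is why that branch is kept.

#include <string.h>
#include <stddef.h>

struct hdr { char bytes[16]; };

/* n == sizeof(*dst) is known at compile time: the compiler can turn this
 * memcpy() into a couple of register moves. */
static void copy_constant(struct hdr *dst, const struct hdr *src)
{
	memcpy(dst, src, sizeof(*dst));
}

/* n is only known at run time: no such expansion is possible, so a
 * hand-tuned routine like rte_memcpy_func() can win here. */
static void copy_variable(void *dst, const void *src, size_t n)
{
	memcpy(dst, src, n);
}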


-- 
Regards,
Haifeng



[dpdk-dev] some questions about rte_memcpy

2015-01-22 Thread Linhaifeng


On 2015/1/22 12:45, Matthew Hall wrote:
> One theory. Many DPDK functions crash if they are called before 
> rte_eal_init() 
> is called. So perhaps this could be a cause, since that won't have been 
> called 
> when working on a constant

Hi, Matthew

Thank you for your response.

Do you mean that calling rte_memcpy before rte_eal_init() would crash? Why?
-- 
Regards,
Haifeng



[dpdk-dev] some questions about rte_memcpy

2015-01-22 Thread Linhaifeng


On 2015/1/22 19:34, Bruce Richardson wrote:
> On Thu, Jan 22, 2015 at 07:23:49PM +0900, Tetsuya Mukawa wrote:
>> On 2015/01/22 16:35, Matthew Hall wrote:
>>> On Thu, Jan 22, 2015 at 01:32:04PM +0800, Linhaifeng wrote:
>>>> Do you mean if call rte_memcpy before rte_eal_init() would crash?why?
>>> No guarantee. But a theory. It might use some things from the EAL init to 
>>> figure out which version of the accelerated algorithm to use.
>>
>> This selection is done at compile-time.
>> And if the size is constant, I guess DPDK assumes memcpy is replaced by
>> inline __builtin_memcpy.
>> I haven't checked the performance of builtin memcpy, but probably much
>> faster.
>>
> 
> Yes, that assumption is correct. A couple of years ago we discovered that for
> constant size values, the compiler would generate much faster code for us
> using a regular memcpy than rte_memcpy, hence the macro.
> 
> /Bruce
> 
>> Tetsuya
>>
>>> Matthew.
>>
>>
> 
> 

Hi,Bruce

I tested it. As you said, most results show that the constant version is faster, 
but sometimes it is not.

linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 16 999
rte_memcpy(constant) used:279893712 @@ not faster
rte_memcpy(variable) used:277818600
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 16 999
rte_memcpy(constant) used:279264328 @@ not faster
rte_memcpy(variable) used:277667116
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 16 999
rte_memcpy(constant) used:279491832 @@ not faster
rte_memcpy(variable) used:277622772
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 32 999
rte_memcpy(constant) used:279402156 @@ not faster
rte_memcpy(variable) used:277738464
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 32 999
rte_memcpy(constant) used:279305172 @@ not faster
rte_memcpy(variable) used:277483004
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 32 999
rte_memcpy(constant) used:279784124 @@ not faster
rte_memcpy(variable) used:277605332
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 48 999
rte_memcpy(constant) used:322817260
rte_memcpy(variable) used:350333864
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 48 999
rte_memcpy(constant) used:322840748
rte_memcpy(variable) used:350297868
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 48 999
rte_memcpy(constant) used:322488240
rte_memcpy(variable) used:350348652
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 64 999
rte_memcpy(constant) used:322021428
rte_memcpy(variable) used:350416440
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 64 999
rte_memcpy(constant) used:321370900
rte_memcpy(variable) used:350355796
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 64 999
rte_memcpy(constant) used:322704552
rte_memcpy(variable) used:349900832
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 128 999
rte_memcpy(constant) used:422705828
rte_memcpy(variable) used:425493328
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 128 999
rte_memcpy(constant) used:422421840 @@ not faster
rte_memcpy(variable) used:413691412
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 128 999
rte_memcpy(constant) used:425233088 @@ not faster
rte_memcpy(variable) used:421136724
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 256 999
rte_memcpy(constant) used:901014608 @@ not faster
rte_memcpy(variable) used:900997388
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 256 999
rte_memcpy(constant) used:900803308 @@ not faster
rte_memcpy(variable) used:900794076
linux-mnSyvH:/mnt/sdb/linhf/test # ./rte_memcpy_test 256 999
rte_memcpy(constant) used:901842436 @@ not faster
rte_memcpy(variable) used:901218984
linux-mnSyvH:/mnt/sdb/linhf/test #



Here is my test code:

/* include names below were lost in the list archive; these are the headers
 * this program needs */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <rte_memcpy.h>
#include <rte_cycles.h>


int main(int narg, char** args)
{
int i;
char buf[1024];
uint64_t start, end;

if (narg < 3) {
printf("usage:./rte_memcpy_test size times\n");
return 0;
}

size_t size_v = atoi(args[1]);
const size_t size_c = atoi(args[1]);
int times = atoi(args[2]);

start = rte_rdtsc();
for(i = 0; i < times; i++) {
rte_memcpy(buf, buf, size_c);
}
end = rte_rdtsc();
printf("rte_memcpy(constant) used:%llu\n", end - start);

start = rte_rdtsc();
for (i = 0; i < times; i++) {
rte_memcpy(buf, buf, size_v);
}
end = rte_rdtsc();
printf("rte_memcpy(variable) used:%llu\n", end - start);

return 0;
}





-- 
Regards,
Haifeng



[dpdk-dev] some questions about rte_memcpy

2015-01-23 Thread Linhaifeng


On 2015/1/22 23:21, Bruce Richardson wrote:
> This (size_c) is a run-time constant, not a compile-time constant. To trigger 
> the
> memcpy optimizations inside the compiler, the size value must be constant at
> compile time.


Hi, Bruce

You are right. When using a compile-time constant, memcpy is faster. Thank you 
all.

Here is my test result:

rte_memcpy(constant) size:8 time:876
rte_memcpy(variable) size:8 time:2824
rte_memcpy(constant) size:16 time:868
rte_memcpy(variable) size:16 time:4436
rte_memcpy(constant) size:32 time:856
rte_memcpy(variable) size:32 time:3264
rte_memcpy(constant) size:48 time:872
rte_memcpy(variable) size:48 time:3972
rte_memcpy(constant) size:64 time:856
rte_memcpy(variable) size:64 time:3644
rte_memcpy(constant) size:128 time:868
rte_memcpy(variable) size:128 time:4720
rte_memcpy(constant) size:256 time:868
rte_memcpy(variable) size:256 time:9624

Here is my test program (does anyone know how to use a loop to test 'constant 
memcpy'? one possible approach is sketched after the program):

/* include names below were lost in the list archive; these are the headers
 * this program needs */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <rte_memcpy.h>
#include <rte_cycles.h>


int main(int narg, char** args)
{
int i,t;
char buf[256];
int tests[7] = {8,16,32,48,64,128,256};
char 
buf8[8],buf16[16],buf32[32],buf48[48],buf64[64],buf128[128],buf256[256];
uint64_t start, end;
int times = 999;
uint64_t result_c[7];

if (narg < 2) {
printf("usage:./rte_memcpy_test times\n");
return -1;
}

times = atoi(args[1]);

start = rte_rdtsc();
for(t = 0; t < times; t++) {
rte_memcpy(buf8, buf8, sizeof buf8);
}
end = rte_rdtsc();
result_c[0] = end - start;

start = rte_rdtsc();
for(t = 0; t < times; t++) {
rte_memcpy(buf16, buf16, sizeof buf16);
}
end = rte_rdtsc();
result_c[1] = end - start;

start = rte_rdtsc();
for(t = 0; t < times; t++) {
rte_memcpy(buf32, buf32, sizeof buf32);
}
end = rte_rdtsc();
result_c[2] = end - start;

start = rte_rdtsc();
for(t = 0; t < times; t++) {
rte_memcpy(buf48, buf48, sizeof buf48);
}
end = rte_rdtsc();
result_c[3] = end - start;

start = rte_rdtsc();
for(t = 0; t < times; t++) {
rte_memcpy(buf64, buf64, sizeof buf64);
}
end = rte_rdtsc();
result_c[4] = end - start;

start = rte_rdtsc();
for(t = 0; t < times; t++) {
rte_memcpy(buf128, buf128, sizeof buf128);
}
end = rte_rdtsc();
result_c[5] = end - start;

start = rte_rdtsc();
for(t = 0; t < times; t++) {
rte_memcpy(buf256, buf256, sizeof buf256);
}
end = rte_rdtsc();
result_c[6] = end - start;

for (i = 0; i < (sizeof tests / sizeof tests[0]); i++) {
start = rte_rdtsc();
for(t = 0; t < times; t++) {
rte_memcpy(buf, buf, tests[i]);
}
end = rte_rdtsc();
printf("rte_memcpy(constant) size:%d time:%llu\n", tests[i], 
result_c[i]);
printf("rte_memcpy(variable) size:%d time:%llu\n", tests[i], 
end - start);
}

return 0;
}
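One hedged answer to the question above ("how to use a loop to test 'constant
memcpy'"): keep each size a compile-time literal by expanding the loop body with a
macro. This is only a sketch; the macro name and structure are made up, and it
assumes rte_memcpy.h / rte_cycles.h provide rte_memcpy() and rte_rdtsc() as in the
program above.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <rte_memcpy.h>
#include <rte_cycles.h>

/* Each expansion passes a literal size, so the compile-time-constant
 * path of rte_memcpy() is still taken. */
#define TIME_CONSTANT_COPY(sz, times)                                    \
do {                                                                     \
	static char cbuf[sz];                                            \
	uint64_t s = rte_rdtsc();                                        \
	int t;                                                           \
	for (t = 0; t < (times); t++)                                    \
		rte_memcpy(cbuf, cbuf, (sz));                            \
	printf("rte_memcpy(constant) size:%d time:%llu\n",               \
	       (int)(sz), (unsigned long long)(rte_rdtsc() - s));        \
} while (0)

int main(int argc, char **argv)
{
	int times = (argc > 1) ? atoi(argv[1]) : 999;

	TIME_CONSTANT_COPY(8, times);
	TIME_CONSTANT_COPY(16, times);
	TIME_CONSTANT_COPY(32, times);
	TIME_CONSTANT_COPY(48, times);
	TIME_CONSTANT_COPY(64, times);
	TIME_CONSTANT_COPY(128, times);
	TIME_CONSTANT_COPY(256, times);
	return 0;
}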

-- 
Regards,
Haifeng



[dpdk-dev] [PATCH RFC v2 08/12] lib/librte_vhost: vhost-user support

2015-01-23 Thread Linhaifeng


On 2015/1/23 11:40, Xie, Huawei wrote:
> 
> 
>> -Original Message-----
>> From: Linhaifeng [mailto:haifeng.lin at huawei.com]
>> Sent: Thursday, December 11, 2014 1:36 PM
>> To: Xie, Huawei; dev at dpdk.org
>> Cc: haifeng.lin at intel.com
>> Subject: Re: [dpdk-dev] [PATCH RFC v2 08/12] lib/librte_vhost: vhost-user
>> support
>>
>>
>>
>> On 2014/12/11 5:37, Huawei Xie wrote:
>>> vhost-user support
>>>
>>>
>>> Signed-off-by: Huawei Xie 
>>> ---
>>>  lib/librte_vhost/Makefile |   5 +-
>>> +   case VHOST_USER_SET_LOG_FD:
>>
>> should close fd for fd leak when receive VHOST_USER_SET_LOG_FD msg?
> Ok, would at least close the fd though we don't support.
> Do you know how to test this?

Reboot the VM many times, then watch the number of fds with the command 
'ls /proc/$(pidof vhost)/fd | wc -l'.

>>
>>> +   RTE_LOG(INFO, VHOST_CONFIG, "not implemented.\n");
>>> +
>>> +   case VHOST_USER_SET_VRING_ERR:
>>
>> should close fd for fd leak?
> Would do.
>>
>>> +   RTE_LOG(INFO, VHOST_CONFIG, "not implemented\n");
>>> +   break;
>>> +#endif
>>>
>>
>> --
>> Regards,
>> Haifeng
> 
> 
> 

-- 
Regards,
Haifeng



[dpdk-dev] [PATCH RFC v2 00/12] lib/librte_vhost: vhost-user support

2015-01-23 Thread Linhaifeng
Hi, Xie

Could you test vhost-user with the following NUMA node XML:
2097152
  

  
  

  


I can't receive data from the VM with the above XML.

On 2014/12/11 5:37, Huawei Xie wrote:
> This patchset refines vhost library to support both vhost-cuse and vhost-user.
> 
> 
> Huawei Xie (12):
>   create vhost_cuse directory and move vhost-net-cdev.c to vhost_cuse 
> directory
>   rename vhost-net-cdev.h as vhost-net.h
>   move eventfd_copy logic out from virtio-net.c to vhost-net-cdev.c
>   exact copy of host_memory_map from virtio-net.c to new file
>   virtio-net-cdev.c
>   host_memory_map refine: map partial memory of target process into current 
> process
>   cuse_set_memory_table is the VHOST_SET_MEMORY_TABLE message handler for cuse
>   fd management for vhost user
>   vhost-user support
>   minor fix
>   vhost-user memory region map/unmap
>   kick/callfd fix
>   cleanup when vhost user connection is closed
> 
>  lib/librte_vhost/Makefile |   5 +-
>  lib/librte_vhost/rte_virtio_net.h |   2 +
>  lib/librte_vhost/vhost-net-cdev.c | 389 --
>  lib/librte_vhost/vhost-net-cdev.h | 113 ---
>  lib/librte_vhost/vhost-net.h  | 117 +++
>  lib/librte_vhost/vhost_cuse/vhost-net-cdev.c  | 452 
> ++
>  lib/librte_vhost/vhost_cuse/virtio-net-cdev.c | 349 
>  lib/librte_vhost/vhost_cuse/virtio-net-cdev.h |  45 +++
>  lib/librte_vhost/vhost_rxtx.c |   2 +-
>  lib/librte_vhost/vhost_user/fd_man.c  | 205 
>  lib/librte_vhost/vhost_user/fd_man.h  |  64 
>  lib/librte_vhost/vhost_user/vhost-net-user.c  | 423 
>  lib/librte_vhost/vhost_user/vhost-net-user.h  | 107 ++
>  lib/librte_vhost/vhost_user/virtio-net-user.c | 313 ++
>  lib/librte_vhost/vhost_user/virtio-net-user.h |  49 +++
>  lib/librte_vhost/virtio-net.c | 394 ++
>  lib/librte_vhost/virtio-net.h |  43 +++
>  17 files changed, 2199 insertions(+), 873 deletions(-)
>  delete mode 100644 lib/librte_vhost/vhost-net-cdev.c
>  delete mode 100644 lib/librte_vhost/vhost-net-cdev.h
>  create mode 100644 lib/librte_vhost/vhost-net.h
>  create mode 100644 lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
>  create mode 100644 lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
>  create mode 100644 lib/librte_vhost/vhost_cuse/virtio-net-cdev.h
>  create mode 100644 lib/librte_vhost/vhost_user/fd_man.c
>  create mode 100644 lib/librte_vhost/vhost_user/fd_man.h
>  create mode 100644 lib/librte_vhost/vhost_user/vhost-net-user.c
>  create mode 100644 lib/librte_vhost/vhost_user/vhost-net-user.h
>  create mode 100644 lib/librte_vhost/vhost_user/virtio-net-user.c
>  create mode 100644 lib/librte_vhost/vhost_user/virtio-net-user.h
>  create mode 100644 lib/librte_vhost/virtio-net.h
> 

-- 
Regards,
Haifeng



[dpdk-dev] [PATCH RFC v2 08/12] lib/librte_vhost: vhost-user support

2015-01-23 Thread Linhaifeng



>>
>> Can you mmap the region if gpa is 0? When i run VM with two numa node (qemu
>> will create two hugepage file) found that always failed to mmap with the 
>> region
>> which gpa is 0.
>>
>> BTW can we ensure the memory regions cover with all the memory of hugepage
>> for VM?
>>
> We had discussed this. Seemed a qemu bug.
> Do you have update on this?
> 

Yes, because qemu does not send all the hugepage file fds to vhost-user. I have 
sent the patch but nobody replied.


> 
> Thanks Haifeng for your comments.
> Next time, could you help cut the code not commented?
> They are too long. It is really hard for me to find your comment.
> 
> 
> .
> 

OK:)

-- 
Regards,
Haifeng



[dpdk-dev] [PATCH v2] vhost: add interface name to virtio-net struct

2015-01-25 Thread Linhaifeng


On 2014/12/19 2:07, ciara.loftus at intel.com wrote:
> From: Ciara Loftus 
> 
> This patch fixes the issue whereby when using userspace vhost ports
> in the context of vSwitching, the name provided to the hypervisor/QEMU
> of the vhost tap device needs to be exposed in the library, in order

Who creates the tap device? When using vhost-user, QEMU will not create it.

> for the vSwitch to be able to direct packets to the correct device.

Why not use unix domain socket path?

> This patch introduces an 'ifname' member to the virtio-net structure
> which is populated with the tap device name when QEMU is brought up
> with a vhost device.
> 
> Signed-off-by: Ciara Loftus 
> Signed-off-by: Anthony Fee 
> Acked-by: Huawei Xie 
> ---
>  lib/librte_vhost/rte_virtio_net.h |3 ++
>  lib/librte_vhost/virtio-net.c |   48 
> -
>  2 files changed, 50 insertions(+), 1 deletions(-)
> 
> diff --git a/lib/librte_vhost/rte_virtio_net.h 
> b/lib/librte_vhost/rte_virtio_net.h
> index 00b1328..0bf07c7 100644
> --- a/lib/librte_vhost/rte_virtio_net.h
> +++ b/lib/librte_vhost/rte_virtio_net.h
> @@ -43,6 +43,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  
>  #include 
>  #include 
> @@ -96,6 +98,7 @@ struct virtio_net {
>   uint64_tfeatures;   /**< Negotiated feature set. */
>   uint64_tdevice_fh;  /**< device identifier. */
>   uint32_tflags;  /**< Device flags. Only used to 
> check if device is running on data core. */
> + charifname[IFNAMSIZ];   /**< Name of the tap 
> device. */
>   void*priv;  /**< private context */
>  } __rte_cache_aligned;
>  
> diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
> index 852b6d1..7eae5ee 100644
> --- a/lib/librte_vhost/virtio-net.c
> +++ b/lib/librte_vhost/virtio-net.c
> @@ -43,6 +43,10 @@
>  #include 
>  #include 
>  
> +#include 
> +#include 
> +#include 
> +
>  #include 
>  #include 
>  #include 
> @@ -1000,6 +1004,46 @@ set_vring_kick(struct vhost_device_ctx ctx, struct 
> vhost_vring_file *file)
>  }
>  
>  /*
> + * Function to get the tap device name from the provided file descriptor and
> + * save it in the device structure.
> + */
> +static int
> +get_ifname(struct virtio_net *dev, int tap_fd, int pid)
> +{
> + struct eventfd_copy fd_tap;
> + struct ifreq ifr;
> + uint32_t size, ifr_size;
> + int ret;
> +
> + fd_tap.source_fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
> + fd_tap.target_fd = tap_fd;
> + fd_tap.target_pid = pid;
> +
> + if (eventfd_copy(dev, &fd_tap))
> + return -1;
> +
> + ret = ioctl(fd_tap.source_fd, TUNGETIFF, &ifr);
> +
> + if (close(fd_tap.source_fd) < 0)
> + RTE_LOG(ERR, VHOST_CONFIG,
> + "(%"PRIu64") fd close failed\n",
> + dev->device_fh);
> +
> + if (ret >= 0) {
> + ifr_size = strnlen(ifr.ifr_name, sizeof(ifr.ifr_name));
> + size = ifr_size > sizeof(dev->ifname)?
> + sizeof(dev->ifname): ifr_size;
> +
> + strncpy(dev->ifname, ifr.ifr_name, size);
> + } else
> + RTE_LOG(ERR, VHOST_CONFIG,
> + "(%"PRIu64") TUNGETIFF ioctl failed\n",
> + dev->device_fh);
> +
> + return 0;
> +}
> +
> +/*
>   * Called from CUSE IOCTL: VHOST_NET_SET_BACKEND
>   * To complete device initialisation when the virtio driver is loaded,
>   * we are provided with a valid fd for a tap device (not used by us).
> @@ -1026,8 +1070,10 @@ set_backend(struct vhost_device_ctx ctx, struct 
> vhost_vring_file *file)
>*/
>   if (!(dev->flags & VIRTIO_DEV_RUNNING)) {
>   if (((int)dev->virtqueue[VIRTIO_TXQ]->backend != 
> VIRTIO_DEV_STOPPED) &&
> - ((int)dev->virtqueue[VIRTIO_RXQ]->backend != 
> VIRTIO_DEV_STOPPED))
> + ((int)dev->virtqueue[VIRTIO_RXQ]->backend != 
> VIRTIO_DEV_STOPPED)) {
> + get_ifname(dev, file->fd, ctx.pid);
>   return notify_ops->new_device(dev);
> + }
>   /* Otherwise we remove it. */
>   } else
>   if (file->fd == VIRTIO_DEV_STOPPED)
> 

-- 
Regards,
Haifeng



[dpdk-dev] vhost: virtio-net rx-ring stop work after work many hours, bug?

2015-01-27 Thread Linhaifeng
Hi, all

I use vhost-user to send data to a VM. At first it works well, but after many 
hours the VM cannot receive data, though it can still send data.

(gdb)p avail_idx
$4 = 2668
(gdb)p free_entries
$5 = 0
(gdb)l
	/* check that we have enough buffers */
	if (unlikely(count > free_entries))
		count = free_entries;

	if (count == 0) {
		int b = 0;
		if (b) { /* when I set b=1 to notify the guest, rx_ring restarts working */
			if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
				eventfd_write(vq->callfd, 1);
		}
		return 0;
	}

Some info I printed in the guest:

net eth3:vi->num=199
net eth3:rvq info: num_free=57, used->idx=2668, avail->idx=2668
net eth3:svq info: num_free=254, used->idx=1644, avail->idx=1644

net eth3:vi->num=199
net eth3:rvq info: num_free=57, used->idx=2668, avail->idx=2668
net eth3:svq info: num_free=254, used->idx=1645, avail->idx=1645

net eth3:vi->num=199
net eth3:rvq info: num_free=57, used->idx=2668, avail->idx=2668
net eth3:svq info: num_free=254, used->idx=1646, avail->idx=1646

# free
 total   used   free sharedbuffers cached
Mem:  3924100  3372523586848  0  95984 138060
-/+ buffers/cache: 1032083820892
Swap:   970748  0 970748

I have two questions:
1. Should we notify the guest when there is no buffer in vq->avail?
2. Why does virtio_net stop filling avail?






-- 
Regards,
Haifeng



[dpdk-dev] [PATCH] vhost: notify guest to fill buffer when there is no buffer

2015-01-29 Thread linhaifeng
From: Linhaifeng 

If we find there is no buffer we should notify virtio_net to
fill buffers.

We use mz to send buffers from VM to VM, and found that the other VM
stops receiving data after many hours.

Signed-off-by: Linhaifeng 
---
 lib/librte_vhost/vhost_rxtx.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index ccfd82f..013c526 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -87,9 +87,14 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
/*check that we have enough buffers*/
if (unlikely(count > free_entries))
count = free_entries;
-
-   if (count == 0)
+   /* If there is no buffers we should notify guest to fill. 
+   * This is need when guest use virtio_net driver(not pmd).
+   */
+   if (count == 0) {
+   if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
+   eventfd_write((int)vq->kickfd, 1);  
return 0;
+   }

res_end_idx = res_base_idx + count;
/* vq->last_used_idx_res is atomically updated. */
-- 
1.9.0




[dpdk-dev] [PATCH] vhost: notify guest to fill buffer when there is no buffer

2015-01-29 Thread Linhaifeng


On 2015/1/29 18:39, Xie, Huawei wrote:

>> -if (count == 0)
>> +/* If there is no buffers we should notify guest to fill.
>> +* This is need when guest use virtio_net driver(not pmd).
>> +*/
>> +if (count == 0) {
>> +if (!(vq->avail->flags &
>> VRING_AVAIL_F_NO_INTERRUPT))
>> +eventfd_write((int)vq->kickfd, 1);
>>  return 0;
>> +}
> 
> Haifeng:
> Is it the root cause and is it protocol required?
> Could you give a detailed description for that scenario?
> 

I use mz to send data from VM1 to VM2. The two VMs use the virtio-net driver.
VM1 executes the following script:
for((i=0;i<9;i++));
do
mz eth0 -t udp -A 1.1.1.1 -B 1.1.1.2 -a 00:00:00:00:00:01 -b 00:00:00:00:00:02 
-c 1000 -p 512
sleep 4
done

VM2 executes the following command to watch:
watch -d ifconfig

After many hours VM2 stops receiving data.

Could you test it?
-- 
Regards,
Haifeng



[dpdk-dev] [PATCH] vhost: notify guest to fill buffer when there is no buffer

2015-01-29 Thread Linhaifeng


On 2015/1/29 21:00, Xie, Huawei wrote:
> 
> 
>> -Original Message-----
>> From: Linhaifeng [mailto:haifeng.lin at huawei.com]
>> Sent: Thursday, January 29, 2015 8:39 PM
>> To: Xie, Huawei; dev at dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH] vhost: notify guest to fill buffer when 
>> there is
>> no buffer
>>
>>
>>
>> On 2015/1/29 18:39, Xie, Huawei wrote:
>>
>>>> -  if (count == 0)
>>>> +  /* If there is no buffers we should notify guest to fill.
>>>> +  * This is need when guest use virtio_net driver(not pmd).
>>>> +  */
>>>> +  if (count == 0) {
>>>> +  if (!(vq->avail->flags &
>>>> VRING_AVAIL_F_NO_INTERRUPT))
>>>> +  eventfd_write((int)vq->kickfd, 1);
>>>>return 0;
>>>> +  }
>>>
>>> Haifeng:
>>> Is it the root cause and is it protocol required?
>>> Could you give a detailed description for that scenario?
>>>
>>
>> I use mz to send data from one VM1 to VM2.The two VM use virtio-net driver.
>> VM1 execute follow script:
>> for((i=0;i<9;i++));
>> do
>> mz eth0 -t udp -A 1.1.1.1 -B 1.1.1.2 -a 00:00:00:00:00:01 -b 
>> 00:00:00:00:00:02 -c
>> 1000 -p 512
>> sleep 4
>> done
>>
>> VM2 execute follow command to watch:
>> watch -d ifconfig
>>
>> After many hours VM2 stop to receive data.
>>
>> Could you test it ?
> 
> 
> We could try next week after I send the whole patch. 
> How many hours? Is it reproducible at your side? I inject packets through 
> packet generator to guest for more than ten hours, haven't met issues. 

About three hours.
Which driver did you use in the guest, virtio-net-pmd or virtio-net?


> As I said in another mail sent  to you, could you dump the status of vring if 
> you still have the spot?

How do I dump the status of the vring in the guest?

> Could you please also reply to that mail?
>

Which mail?


> For the patch, if we have no root cause, I prefer not to apply it, so that we 
> don't send more interrupts than needed to guest to affect performance.

I found that if we add this notify the performance is actually better (a gain of
about 100 kpps with 64-byte UDP packets).

> People could temporarily apply this patch as a work around.
> 
> Or anyone 
> 

OK. I'm also not sure about this bug. I think I should dig further to find the
real reason.

> 
>> --
>> Regards,
>> Haifeng
> 
> 
> 

-- 
Regards,
Haifeng



[dpdk-dev] [DISCUSSION] : ERROR while running vhost example in dpdk-1.8

2015-01-30 Thread Linhaifeng


On 2015/1/30 0:48, Srinivasreddy R wrote:
> EAL: 512 hugepages of size 2097152 reserved, but no mounted hugetlbfs found
> for that size

Maybe you haven't mounted hugetlbfs.
-- 
Regards,
Haifeng



[dpdk-dev] [PATCH] vhost: notify guest to fill buffer when there is no buffer

2015-01-30 Thread Linhaifeng


On 2015/1/30 16:20, Xu, Qian Q wrote:
> Haifeng
> Could you give more information so that we can reproduce your issue? Thanks. 
> 1. What's your dpdk package, based on which branch, with Huawei's 
> vhost-user's patches? 
Not with Huawei's patches. I implemented a demo before Huawei's patches using
OVDK's vhost_dequeue_burst and vhost_enqueue_burst.

Now I'm trying to run vhost-user with the dpdk vhost example (master branch).

> 2. What's your step and command to launch vhost sample? 
BTW, how do I run the vhost example in vm2vm mode?
Does VM2VM mean I can send packets from vm1 to vm2?

I set up with the following steps but can't send packets in the VM:
mount -t hugetlbfs nodev /mnt/huge -o pagesize=1G
mount -t hugetlbfs nodev /dev/hugepages -o pagesize=2M
echo 8192 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

modprobe uio
insmod ${RTE_SDK}/x86_64-native-linuxapp-gcc/kmod/igb_uio.ko
dpdk_nic_bind.py -b igb_uio 82:00.0 82:00.1

rmmod vhost_net
modprobe cuse
insmod ${RTE_SDK}/lib/librte_vhost/eventfd_link/eventfd_link.ko

${RTE_SDK}/examples/vhost/build/app/vhost-switch -c 0x300 -n 4 --huge-dir 
/mnt/huge -m 2048 -- -p 0x1 --vm2vm 1


qemu-wrap.py  -enable-kvm -mem-path /mnt/huge/  -mem-prealloc -smp 2 \
-netdev tap,id=hostnet1,vhost=on,ifname=port0 -device 
virtio-net-pci,netdev=hostnet1,id=net1,mac=00:00:00:00:00:01 -hda 
/mnt/sdb/linhf/vm1.img -m 2048 -vnc :0

qemu-wrap.py  -enable-kvm -mem-path /mnt/huge/  -mem-prealloc -smp 2 \
-netdev tap,id=hostnet1,vhost=on,ifname=port0 -device 
virtio-net-pci,netdev=hostnet1,id=net1,mac=00:00:00:00:00:02 -hda 
/mnt/sdb/linhf/vm2.img -m 2048 -vnc :1




> 3. What is mz? Your internal tool? I can't yum install mz or download mz 
> tool. 
http://www.perihel.at/sec/mz/

> 4. As to your test scenario, I understand it in this way: virtio1 in VM1, 
> virtio2 in VM2, then let virtio1 send packages to virtio2, the problem is 
> that after 3 hours, virtio2 can't receive packets, but virtio1 is still 
> sending packets, am I right? So mz is like a packet generator to send 
> packets, right? 

Yes, you are right.

> 
> 
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Linhaifeng
> Sent: Thursday, January 29, 2015 9:51 PM
> To: Xie, Huawei; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] vhost: notify guest to fill buffer when there 
> is no buffer
> 
> 
> 
> On 2015/1/29 21:00, Xie, Huawei wrote:
>>
>>
>>> -Original Message-
>>> From: Linhaifeng [mailto:haifeng.lin at huawei.com]
>>> Sent: Thursday, January 29, 2015 8:39 PM
>>> To: Xie, Huawei; dev at dpdk.org
>>> Subject: Re: [dpdk-dev] [PATCH] vhost: notify guest to fill buffer 
>>> when there is no buffer
>>>
>>>
>>>
>>> On 2015/1/29 18:39, Xie, Huawei wrote:
>>>
>>>>> - if (count == 0)
>>>>> + /* If there is no buffers we should notify guest to fill.
>>>>> + * This is need when guest use virtio_net driver(not pmd).
>>>>> + */
>>>>> + if (count == 0) {
>>>>> + if (!(vq->avail->flags &
>>>>> VRING_AVAIL_F_NO_INTERRUPT))
>>>>> + eventfd_write((int)vq->kickfd, 1);
>>>>>   return 0;
>>>>> + }
>>>>
>>>> Haifeng:
>>>> Is it the root cause and is it protocol required?
>>>> Could you give a detailed description for that scenario?
>>>>
>>>
>>> I use mz to send data from one VM1 to VM2.The two VM use virtio-net driver.
>>> VM1 execute follow script:
>>> for((i=0;i<9;i++));
>>> do
>>> mz eth0 -t udp -A 1.1.1.1 -B 1.1.1.2 -a 00:00:00:00:00:01 -b 
>>> 00:00:00:00:00:02 -c
>>> 1000 -p 512
>>> sleep 4
>>> done
>>>
>>> VM2 execute follow command to watch:
>>> watch -d ifconfig
>>>
>>> After many hours VM2 stop to receive data.
>>>
>>> Could you test it ?
>>
>>
>> We could try next week after I send the whole patch. 
>> How many hours? Is it reproducible at your side? I inject packets through 
>> packet generator to guest for more than ten hours, haven't met issues. 
> 
> About three hours.
> What kind of driver you used in guest?virtio-net-pmd or virtio-net?
> 
> 
>> As I said in another mail sent  to you, could you dump the status of vring 
>> if you still have the spot?
> 
> How to dump the status of vring in guest?
> 
>> Could you please also reply to that mail?
>>
> 
> Which mail?
> 
> 
>> For the patch, if we have no root cause, I prefer not to apply it, so that 
>> we don't send more interrupts than needed to guest to affect performance.
> 
> I found that if we add this notify the performance is better(growth of 
> 100kpps when use 64byte UDP packets)
> 
>> People could temporarily apply this patch as a work around.
>>
>> Or anyone
>>
> 
> OK.I'm also not sure about this bug.I think i should do something to found 
> the real reason.
> 
>>
>>> --
>>> Regards,
>>> Haifeng
>>
>>
>>
> 

-- 
Regards,
Haifeng



[dpdk-dev] [RFC PATCH v2 01/14] enable VIRTIO_NET_F_CTRL_RX VIRTIO_NET_F_CTRL_RX is dependent on VIRTIO_NET_F_CTRL_VQ. Observed that virtio-net driver in guest would crash with only CTRL_RX enabled.

2015-01-30 Thread Linhaifeng


On 2015/1/26 11:20, Huawei Xie wrote:
> In virtnet_send_command:
> 
>   /* Caller should know better */
>   BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ) ||
>   (out + in > VIRTNET_SEND_COMMAND_SG_MAX));
> 
> Signed-off-by: Huawei Xie 
> ---
>  lib/librte_vhost/virtio-net.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
> index b041849..52b4957 100644
> --- a/lib/librte_vhost/virtio-net.c
> +++ b/lib/librte_vhost/virtio-net.c
> @@ -73,7 +73,8 @@ static struct virtio_net_config_ll *ll_root;
>  
>  /* Features supported by this lib. */
>  #define VHOST_SUPPORTED_FEATURES ((1ULL << VIRTIO_NET_F_MRG_RXBUF) | \
> -   (1ULL << VIRTIO_NET_F_CTRL_RX))
> + (1ULL << VIRTIO_NET_F_CTRL_VQ) | \
> + (1ULL << VIRTIO_NET_F_CTRL_RX))
>  static uint64_t VHOST_FEATURES = VHOST_SUPPORTED_FEATURES;
>  
>  /* Line size for reading maps file. */
> 

Hi,Xie

If we don't have the VIRTIO_NET_F_CTRL_VQ and VIRTIO_NET_F_CTRL_RX features, what
would happen?
Why add the two features?
-- 
Regards,
Haifeng



[dpdk-dev] mmap failed: Cannot allocate memory when init dpdk eal

2015-01-30 Thread Linhaifeng


On 2015/1/30 19:40, zhangsha (A) wrote:
> Hi, all
> 
> I am suffering from the problem mmap failed as followed when init dpdk eal.
> 
> Fri Jan 30 09:03:29 2015:EAL: Setting up memory...
> Fri Jan 30 09:03:34 2015:EAL: map_all_hugepages(): mmap failed: Cannot 
> allocate memory
> Fri Jan 30 09:03:34 2015:EAL: Failed to mmap 2 MB hugepages
> Fri Jan 30 09:03:34 2015:EAL: Cannot init memory
> 
> Before I run the demo, the free hugepages of my host is :
> 
> cat /proc/meminfo
> MemTotal:   132117056 kB
> MemFree:122040292 kB
> Buffers:   10984 kB
> Cached:   123056 kB
> SwapCached:0 kB
> Active:   120812 kB
> Inactive:  85860 kB
> Active(anon):  79488 kB
> Inactive(anon):  364 kB
> Active(file):  41324 kB
> Inactive(file):85496 kB
> Unevictable:   23576 kB
> Mlocked:   23576 kB
> SwapTotal: 0 kB
> SwapFree:  0 kB
> Dirty:  2576 kB
> Writeback: 0 kB
> AnonPages: 96236 kB
> Mapped:19936 kB
> Shmem:   552 kB
> Slab: 101344 kB
> SReclaimable:  24164 kB
> SUnreclaim:77180 kB
> KernelStack:2544 kB
> PageTables: 4180 kB
> NFS_Unstable:  0 kB
> Bounce:0 kB
> WritebackTmp:  0 kB
> CommitLimit:61864224 kB
> Committed_AS: 585844 kB
> VmallocTotal:   34359738367 kB
> VmallocUsed:  518656 kB
> VmallocChunk:   34292133264 kB
> HardwareCorrupted: 0 kB
> AnonHugePages:  4096 kB
> HugePages_Total:4096
> HugePages_Free: 4096
> HugePages_Rsvd:0
> HugePages_Surp:0
> Hugepagesize:   2048 kB
> DirectMap4k:   96256 kB
> DirectMap2M: 6178816 kB
> DirectMap1G:127926272 kB
> 
> And after the demo executed, I got the hugepages like this:
> 
> cat /proc/meminfo
> MemTotal:   132117056 kB
> MemFree:117325180 kB
> Buffers:   33508 kB
> Cached:   721912 kB
> SwapCached:0 kB
> Active:  4217712 kB
> Inactive: 540956 kB
> Active(anon):4019068 kB
> Inactive(anon):   121136 kB
> Active(file): 198644 kB
> Inactive(file):   419820 kB
> Unevictable:   23908 kB
> Mlocked:   23908 kB
> SwapTotal: 0 kB
> SwapFree:  0 kB
> Dirty:  2856 kB
> Writeback: 0 kB
> AnonPages:   4035184 kB
> Mapped:   160292 kB
> Shmem:122100 kB
> Slab: 177908 kB
> SReclaimable:  64808 kB
> SUnreclaim:   113100 kB
> KernelStack:7560 kB
> PageTables:62128 kB
> NFS_Unstable:  0 kB
> Bounce:0 kB
> WritebackTmp:  0 kB
> CommitLimit:61864224 kB
> Committed_AS:8789664 kB
> VmallocTotal:   34359738367 kB
> VmallocUsed:  527296 kB
> VmallocChunk:   34292122604 kB
> HardwareCorrupted: 0 kB
> AnonHugePages:262144 kB
> HugePages_Total:4096
> HugePages_Free: 2048
> HugePages_Rsvd:0
> HugePages_Surp:0
> Hugepagesize:   2048 kB
> DirectMap4k:  141312 kB
> DirectMap2M: 9279488 kB
> DirectMap1G:124780544 kB
> 
> Only the hugepages beyond to node1 was mapped. I was told host(having 64bit 
> OS) cannot allocate memory while node0 has 2048 free hugepages,why?
> Dose anyone encountered the similar problem ever?
> Any response will be appreciated!
> Thanks!
> 
> 
> 
> 

How did you tell the kernel not to allocate memory on node0?

I guess node0 and node1 each have 2048 hugepages and you want to mmap 4096
hugepages.
In that case you can only mmap the 2048 hugepages on node1; after that you cannot
mmap any more hugepage files, because the kernel has been told not to allocate
memory on node0.


-- 
Regards,
Haifeng



[dpdk-dev] [DISCUSSION] : ERROR while running vhost example in dpdk-1.8

2015-01-31 Thread Linhaifeng
ring=0x7fb8f60f5d00
>> hw_ring=0x7fb8f5238580 dma_addr=0x36638580
>> PMD: eth_igb_rx_queue_setup(): sw_ring=0x7fb8f60f3c00
>> hw_ring=0x7fb8f5248580 dma_addr=0x36648580
>> PMD: eth_igb_rx_queue_setup(): sw_ring=0x7fb8f60f1b00
>> hw_ring=0x7fb8f5258580 dma_addr=0x36658580
>> PMD: eth_igb_rx_queue_setup(): sw_ring=0x7fb8f60efa00
>> hw_ring=0x7fb8f5268580 dma_addr=0x36668580
>> PMD: eth_igb_rx_queue_setup(): sw_ring=0x7fb8f60ed900
>> hw_ring=0x7fb8f5278580 dma_addr=0x36678580
>> PMD: eth_igb_rx_queue_setup(): sw_ring=0x7fb8f60eb800
>> hw_ring=0x7fb8f5288580 dma_addr=0x36688580
>> PMD: eth_igb_rx_queue_setup(): sw_ring=0x7fb8f60e9700
>> hw_ring=0x7fb8f5298580 dma_addr=0x36698580
>> PMD: eth_igb_tx_queue_setup(): To improve 1G driver performance, consider
>> setting the TX WTHRESH value to 4, 8, or 16.
>> PMD: eth_igb_tx_queue_setup(): sw_ring=0x7fb8f60e7600
>> hw_ring=0x7fb8f52a8580 dma_addr=0x366a8580
>> PMD: eth_igb_tx_queue_setup(): To improve 1G driver performance, consider
>> setting the TX WTHRESH value to 4, 8, or 16.
>> PMD: eth_igb_tx_queue_setup(): sw_ring=0x7fb8f60e5500
>> hw_ring=0x7fb8f52b8580 dma_addr=0x366b8580
>> PMD: eth_igb_tx_queue_setup(): To improve 1G driver performance, consider
>> setting the TX WTHRESH value to 4, 8, or 16.
>> PMD: eth_igb_tx_queue_setup(): sw_ring=0x7fb8f60e3400
>> hw_ring=0x7fb8f52c8580 dma_addr=0x366c8580
>> PMD: eth_igb_tx_queue_setup(): To improve 1G driver performance, consider
>> setting the TX WTHRESH value to 4, 8, or 16.
>> PMD: eth_igb_tx_queue_setup(): sw_ring=0x7fb8f60e1300
>> hw_ring=0x7fb8f52d8580 dma_addr=0x366d8580
>> PMD: eth_igb_start(): <<
>> VHOST_PORT: Max virtio devices supported: 8
>> VHOST_PORT: Port 0 MAC: 2c 53 4a 00 28 68
>> VHOST_DATA: Procesing on Core 1 started
>> VHOST_DATA: Procesing on Core 2 started
>> VHOST_DATA: Procesing on Core 3 started
>> Device statistics 
>> ==
>> VHOST_CONFIG: (0) Device configuration started
>> VHOST_CONFIG: (0) Failed to find memory file for pid 845
>> Device statistics 
>>
>>
>>
>> ./qemu-wrap.py -machine pc-i440fx-1.4,accel=kvm,usb=off -cpu host -smp
>> 2,sockets=2,cores=1,threads=1  -netdev tap,id=hostnet1,vhost=on -device
>> virtio-net-pci,netdev=hostnet1,id=net1  -hda /home/utils/images/vm1.img  -m
>> 2048  -vnc 0.0.0.0:2   -net nic -net tap,ifname=tap3,script=no -mem-path
>> /dev/hugepages -mem-prealloc
>> W: /etc/qemu-ifup: no bridge for guest interface found
>> file_ram_alloc: can't mmap RAM pages: Cannot allocate memory
>> qemu-system-x86_64: unable to start vhost net: 22: falling back on
>> userspace virtio
>>
>>
>>  mount  | grep huge
>> cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,relatime,hugetlb)
>> nodev on /dev/hugepages type hugetlbfs (rw)
>> nodev on /mnt/huge type hugetlbfs (rw,pagesize=2M)
>>
>>
>>
>>  cat /proc/meminfo
>> MemTotal:   16345340 kB
>> MemFree: 4591596 kB
>> Buffers:  466472 kB
>> Cached:  1218728 kB
>> SwapCached:0 kB
>> Active:  1147228 kB
>> Inactive: 762992 kB
>> Active(anon): 232732 kB
>> Inactive(anon):14760 kB
>> Active(file): 914496 kB
>> Inactive(file):   748232 kB
>> Unevictable:3704 kB
>> Mlocked:3704 kB
>> SwapTotal:  16686076 kB
>> SwapFree:   16686076 kB
>> Dirty:   488 kB
>> Writeback: 0 kB
>> AnonPages:230800 kB
>> Mapped:    55248 kB
>> Shmem: 17932 kB
>> Slab: 245116 kB
>> SReclaimable: 214372 kB
>> SUnreclaim:30744 kB
>> KernelStack:3664 kB
>> PageTables:13900 kB
>> NFS_Unstable:  0 kB
>> Bounce:0 kB
>> WritebackTmp:  0 kB
>> CommitLimit:20140152 kB
>> Committed_AS:1489760 kB
>> VmallocTotal:   34359738367 kB
>> VmallocUsed:  374048 kB
>> VmallocChunk:   34359356412 kB
>> HardwareCorrupted: 0 kB
>> AnonHugePages:106496 kB
>> HugePages_Total:   8
>> HugePages_Free:0
>> HugePages_Rsvd:0
>> HugePages_Surp:0
>> Hugepagesize:1048576 kB
>> DirectMap4k:   91600 kB
>> DirectMap2M: 2965504 kB
>> DirectMap1G:13631488 kB
>>
>>
>>
>> sysctl -A | grep huge
>> vm.hugepages_treat_as_movable = 0
>> vm.hugetlb_shm_group = 0
>> vm.nr_hugepages = 8
>> vm.nr_hugepages_mempolicy = 8
>> vm.nr_overcommit_hugepages = 0
>>
>>
>> thanks
>> Srinivas.
>>
>>
>>
>> On Fri, Jan 30, 2015 at 10:59 AM, Linhaifeng 
>> wrote:
>>
>>>
>>>
>>> On 2015/1/30 0:48, Srinivasreddy R wrote:
>>>> EAL: 512 hugepages of size 2097152 reserved, but no mounted hugetlbfs
>>> found
>>>> for that size
>>>
>>> Maybe you haven't mount hugetlbfs.
>>> --
>>> Regards,
>>> Haifeng
>>>
>>>
>>
>>
>> --
>> thanks
>> srinivas.
>>
> 
> 
> 

-- 
Regards,
Haifeng



Re: [dpdk-dev] [PATCH v2 0/5] Optimize memcpy for AVX512 platforms

2017-08-30 Thread linhaifeng
On 2016/1/18 11:05, Zhihong Wang wrote:
> This patch set optimizes DPDK memcpy for AVX512 platforms, to make full
> utilization of hardware resources and deliver high performance.
>
> In current DPDK, memcpy holds a large proportion of execution time in
> libs like Vhost, especially for large packets, and this patch can bring
> considerable benefits.
>
> The implementation is based on the current DPDK memcpy framework, some
> background introduction can be found in these threads:
> http://dpdk.org/ml/archives/dev/2014-November/008158.html
> http://dpdk.org/ml/archives/dev/2015-January/011800.html
>
> Code changes are:
>
>   1. Read CPUID to check if AVX512 is supported by CPU
>
>   2. Predefine AVX512 macro if AVX512 is enabled by compiler
>
>   3. Implement AVX512 memcpy and choose the right implementation based on
>  predefined macros
>
>   4. Decide alignment unit for memcpy perf test based on predefined macros
>
> --
> Changes in v2:
>
>   1. Tune performance for prior platforms
>
> Zhihong Wang (5):
>   lib/librte_eal: Identify AVX512 CPU flag
>   mk: Predefine AVX512 macro for compiler
>   lib/librte_eal: Optimize memcpy for AVX512 platforms
>   app/test: Adjust alignment unit for memcpy perf test
>   lib/librte_eal: Tune memcpy for prior platforms
>
>  app/test/test_memcpy_perf.c|   6 +
>  .../common/include/arch/x86/rte_cpuflags.h |   2 +
>  .../common/include/arch/x86/rte_memcpy.h   | 269 
> -
>  mk/rte.cpuflags.mk |   4 +
>  4 files changed, 268 insertions(+), 13 deletions(-)
>

Hi Zhihong Wang,

I tested the AVX512 rte_memcpy and found that its performance for OVS-DPDK is lower
than the AVX2 rte_memcpy.

The VM loop test results for OVS-DPDK:
AVX512 is *15* Gbps
perf data:
  0.52 │  vmovdq (%r8,%r10,1),%zmm0
 95.33 │  sub$0x40,%r9
  0.45 │  add$0x40,%r8
  0.60 │  vmovdq %zmm0,-0x40(%r8)
  1.84 │  cmp$0x3f,%r9
   │↓ ja f20
   │  lea-0x40(%rsi),%r8
  0.15 │  or $0xffc0,%rsi
  0.21 │  and$0xffc0,%r8
  0.00 │  lea0x40(%rsi,%r8,1),%rsi
  0.00 │  vmovdq (%rcx,%rsi,1),%zmm0
  0.22 │  vmovdq %zmm0,(%rdx,%rsi,1)
  0.67 │↓ jmpq   c78
   │  mov-0x128(%rbp),%rdi
   │  rex.R
   │  .byte  0x89
   │  popfq

AVX2 is *18.8* Gbps
perf data:
  0.96 │  add%r9,%r13
 66.04 │  vmovdq (%rdx),%ymm0
  1.20 │  sub$0x40,%rdi
  1.53 │  add$0x40,%rdx
 10.83 │  vmovdq %ymm0,-0x40(%rdx,%r15,1)
  8.64 │  vmovdq -0x20(%rdx),%ymm0
  7.58 │  vmovdq %ymm0,-0x40(%rdx,%r13,1)


dpdk version: v17.05
ovs version: 2.8.90
qemu version: QEMU emulator version 2.9.94 (v2.10.0-rc4-dirty)

gcc version: gcc (GCC) 4.9.2 20150212 (Red Hat 4.9.2-6)
kernel version: 3.10.0


compile dpdk:
CONFIG_RTE_ENABLE_AVX512=y
export DPDK_DIR=$PWD
export DPDK_TARGET=x86_64-native-linuxapp-gcc
export DPDK_BUILD=$DPDK_DIR/$DPDK_TARGET
make install T=$DPDK_TARGET DESTDIR=install

compile ovs:
sh boot.sh
./configure  CFLAGS="-g -O2" --with-dpdk=$DPDK_BUILD --prefix=/usr 
--localstatedir=/var --sysconfdir=/etc
make -j
make install

The dpdk test_memcpy_perf results:
avx2:
** rte_memcpy() - memcpy perf. tests (C = compile-time constant) **
=== == == == ==
   Size Cache to cache   Cache to mem   Mem to cache Mem to mem
(bytes)(ticks)(ticks)(ticks)(ticks)
--- -- -- -- --
== 32B aligned 
 64   6 -   10  27 -   52  30 -   39  56 -   97
512  24 -   44 251 -  271 145 -  217 396 -  447
   1024  35 -   78 394 -  433 252 -  319 609 -  670
--- -- -- -- --
C64   3 -9  28 -   31  29 -   40  55 -   66
C   512  25 -   55 253 -  268 139 -  268 397 -  410
C  1024  32 -   83 394 -  416 250 -  396 612 -  687
=== Unaligned =
 64   8 -9  85 -   71  45 -   45 125 -  121
512  33 -   49 282 -  305 153 -  252 420 -  478
   1024  42 -   83 409 -  491 259 -  389 640 -  748
--- -- -- -- --
C64   4 -9  42 -   46  39 -   46  76 -   90
C   512  33 -   55 280 -  272 153 -  281 421 -  415
C  1024  41 -   83 407 -  427 258 -  405 578 -  701
=== == == == ==

avx512:
** rte_memcpy() - memcpy perf. tests (C = compile-time constant) **
=== == == == ==
   Size Cache to cache   Cache to mem   Mem to cache Mem to mem
(bytes)(ticks)(ticks)(ti

Re: [dpdk-dev] [PATCH v3 3/4] net/bond: dedicated hw queues for LACP control traffic

2017-12-13 Thread linhaifeng
Hi,

What is the purpose of this patch? To fix a problem or to improve performance?

On 2017/7/5 0:46, Declan Doherty wrote:
> From: Tomasz Kulasek 
>
> Add support for hardware flow classification of LACP control plane
> traffic to be redirect to a dedicated receive queue on each slave which
> is not visible to application. Also enables a dedicate transmit queue
> for LACP traffic which allows complete decoupling of control and data
> paths.




Re: [dpdk-dev] [PATCH v3 3/4] net/bond: dedicated hw queues for LACP control traffic

2017-12-13 Thread Linhaifeng
Hi, Tomasz

Thanks for the reply!

I thought the patch was meant to resolve the "LACP loss" problem. We know that
when the traffic is heavy enough, the bond may lose LACP packets and the slave
falls out of sync.

My question is, are there any solutions to the "LACP loss" problem?


-Original Message-
From: Kulasek, TomaszX [mailto:tomaszx.kula...@intel.com]
Sent: 13 December 2017 20:42
To: Linhaifeng ; Doherty, Declan ; dev@dpdk.org
Subject: RE: [dpdk-dev] [PATCH v3 3/4] net/bond: dedicated hw queues for LACP
control traffic

Hi,

> -Original Message-
> From: linhaifeng [mailto:haifeng@huawei.com]
> Sent: Wednesday, December 13, 2017 09:16
> To: Doherty, Declan ; dev@dpdk.org
> Cc: Kulasek, TomaszX 
> Subject: Re: [dpdk-dev] [PATCH v3 3/4] net/bond: dedicated hw queues 
> for LACP control traffic
> 
> Hi,
> 
> What is the purpose of this patch? fix problem or improve performance?
> 
> On 2017/7/5 0:46, Declan Doherty wrote:
> > From: Tomasz Kulasek 
> >
> > Add support for hardware flow classification of LACP control plane 
> > traffic to be redirect to a dedicated receive queue on each slave 
> > which is not visible to application. Also enables a dedicate 
> > transmit queue for LACP traffic which allows complete decoupling of 
> > control and data paths.
> 

This is performance improvement.

Tomasz


[dpdk-dev] vhost: virtio-net rx-ring stop work after work many hours, bug?

2015-02-03 Thread Linhaifeng
I found that the new code (after commit 2bbb811) tries to notify the guest after
sending each packet.
So this bug no longer exists.

static inline uint32_t __attribute__((always_inline)) 
virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
struct rte_mbuf **pkts, uint32_t count) {
... ...

for (pkt_idx = 0; pkt_idx < count; pkt_idx++) {

... ...

/* Kick the guest if necessary. */
if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
eventfd_write((int)vq->kickfd, 1);
}

return count;
}
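
(A minimal sketch of the alternative mentioned elsewhere in these threads -- kicking
the guest at most once per burst instead of once per packet. This is an illustration
only, not the code as merged; it assumes the same vhost_virtqueue fields used above.)

/*
 * Sketch only: after the whole burst has been copied into the guest RX
 * vring, kick at most once, honouring the interrupt-suppression flag.
 * Hypothetical helper, not part of librte_vhost.
 */
static inline void
kick_guest_once_per_burst(struct vhost_virtqueue *vq, uint32_t copied)
{
	if (copied > 0 && !(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
		eventfd_write((int)vq->kickfd, 1);
}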

thank you very much!

On 2015/1/27 15:57, Linhaifeng wrote:
> Hi,all
> 
> I use vhost-user to send data to VM at first it cant work well but after many 
> hours VM can not receive data but can send data.
> 
> (gdb)p avail_idx
> $4 = 2668
> (gdb)p free_entries
> $5 = 0
> (gdb)l
> /* check that we have enough buffers */
> if (unlikely(count > free_entries))
> count = free_entries;
> 
> if (count == 0){
> int b=0;
> if(b) { // when set b=1 to notify guest rx_ring will restart to 
> work
> if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT)) {
> 
> eventfd_write(vq->callfd, 1);
> }
> }
> return 0;
> }
> 
> some info i print in guest:
> 
> net eth3:vi->num=199
> net eth3:rvq info: num_free=57, used->idx=2668, avail->idx=2668
> net eth3:svq info: num_free=254, used->idx=1644, avail->idx=1644
> 
> net eth3:vi->num=199
> net eth3:rvq info: num_free=57, used->idx=2668, avail->idx=2668
> net eth3:svq info: num_free=254, used->idx=1645, avail->idx=1645
> 
> net eth3:vi->num=199
> net eth3:rvq info: num_free=57, used->idx=2668, avail->idx=2668
> net eth3:svq info: num_free=254, used->idx=1646, avail->idx=1646
> 
> # free
>  total   used   free sharedbuffers cached
> Mem:  3924100  3372523586848  0  95984 138060
> -/+ buffers/cache: 1032083820892
> Swap:   970748  0 970748
> 
> I have two questions:
> 1.Should we need to notify guest when there is no buffer in vq->avail?
> 2.Why virtio_net stop to fill avail?
> 
> 
> 
> 
> 
> 

-- 
Regards,
Haifeng



[dpdk-dev] vhost: virtio-net rx-ring stop work after work many hours, bug?

2015-02-03 Thread Linhaifeng


On 2015/1/27 17:37, Michael S. Tsirkin wrote:
> On Tue, Jan 27, 2015 at 03:57:13PM +0800, Linhaifeng wrote:
>> Hi,all
>>
>> I use vhost-user to send data to VM at first it cant work well but after 
>> many hours VM can not receive data but can send data.
>>
>> (gdb)p avail_idx
>> $4 = 2668
>> (gdb)p free_entries
>> $5 = 0
>> (gdb)l
>> /* check that we have enough buffers */
>> if (unlikely(count > free_entries))
>> count = free_entries;
>>
>> if (count == 0){
>> int b=0;
>> if(b) { // when set b=1 to notify guest rx_ring will restart to 
>> work
>> if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT)) {
>>
>> eventfd_write(vq->callfd, 1);
>> }
>> }
>> return 0;
>> }
>>
>> some info i print in guest:
>>
>> net eth3:vi->num=199
>> net eth3:rvq info: num_free=57, used->idx=2668, avail->idx=2668
>> net eth3:svq info: num_free=254, used->idx=1644, avail->idx=1644
>>
>> net eth3:vi->num=199
>> net eth3:rvq info: num_free=57, used->idx=2668, avail->idx=2668
>> net eth3:svq info: num_free=254, used->idx=1645, avail->idx=1645
>>
>> net eth3:vi->num=199
>> net eth3:rvq info: num_free=57, used->idx=2668, avail->idx=2668
>> net eth3:svq info: num_free=254, used->idx=1646, avail->idx=1646
>>
>> # free
>>  total   used   free sharedbuffers cached
>> Mem:  3924100  3372523586848  0  95984 138060
>> -/+ buffers/cache: 1032083820892
>> Swap:   970748  0 970748
>>
>> I have two questions:
>> 1.Should we need to notify guest when there is no buffer in vq->avail?
> 
> No unless NOTIFY_ON_EMPTY is set (most guests don't set it).

Thank you for the new knowledge :)
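
(A minimal sketch of that rule as I read it, assuming the negotiated feature bits are
available as a 64-bit mask; VIRTIO_F_NOTIFY_ON_EMPTY is bit 24 from
linux/virtio_config.h, and the helper name is mine, not librte_vhost code.)

#include <stdint.h>
#include <linux/virtio_config.h>	/* VIRTIO_F_NOTIFY_ON_EMPTY */

/*
 * Kick the guest on an empty avail ring only if it negotiated
 * VIRTIO_F_NOTIFY_ON_EMPTY; most guests do not set it.
 */
static inline int
kick_on_empty_avail(uint64_t negotiated_features)
{
	return (negotiated_features & (1ULL << VIRTIO_F_NOTIFY_ON_EMPTY)) != 0;
}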

> 
>> 2.Why virtio_net stop to fill avail?
> 
> Most likely, it didn't get an interrupt.
> 
> If so, it would be a dpdk vhost user bug.
> Which code are you using in dpdk?
> 

Hi, mst

Thank you for your reply.
Sorry, maybe my mail filter has a bug, so I only saw this mail now.

I use the dpdk code from before 2bbb811. I paste the code here for you to review.
(Note that the vhost_enqueue_burst and vhost_dequeue_burst functions run in poll
mode.)

My guess: vhost_enqueue_burst uses up all the buffers in the rx_ring and then tries
to notify the guest to receive, but at that moment the vcpu may be exiting, so the
guest never receives the notify.


/*
 * Enqueues packets to the guest virtio RX virtqueue for vhost devices.
 */
static inline uint32_t __attribute__((always_inline))
vhost_enqueue_burst(struct virtio_net *dev, struct rte_mbuf **pkts, unsigned 
count)
{
struct vhost_virtqueue *vq;
struct vring_desc *desc;
struct rte_mbuf *buff;
/* The virtio_hdr is initialised to 0. */
struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0,0,0,0,0,0},0};
uint64_t buff_addr = 0;
uint64_t buff_hdr_addr = 0;
uint32_t head[PKT_BURST_SIZE], packet_len = 0;
uint32_t head_idx, packet_success = 0;
uint32_t mergeable, mrg_count = 0;
uint32_t retry = 0;
uint16_t avail_idx, res_cur_idx;
uint16_t res_base_idx, res_end_idx;
uint16_t free_entries;
uint8_t success = 0;

LOG_DEBUG(APP, "(%"PRIu64") virtio_dev_rx()\n", dev->device_fh);
vq = dev->virtqueue[VIRTIO_RXQ];
count = (count > PKT_BURST_SIZE) ? PKT_BURST_SIZE : count;

/* As many data cores may want access to available buffers, they need 
to be reserved. */
do {
res_base_idx = vq->last_used_idx_res;
avail_idx = *((volatile uint16_t *)&vq->avail->idx);

free_entries = (avail_idx - res_base_idx);
/* If retry is enabled and the queue is full then we wait and 
retry to avoid packet loss. */
if (unlikely(count > free_entries)) {
for (retry = 0; retry < burst_tx_retry_num; retry++) {
rte_delay_us(burst_tx_delay_time);
avail_idx =
*((volatile uint16_t *)&vq->avail->idx);
free_entries = (avail_idx - res_base_idx);
if (count <= free_entries)
break;
}
}

/*check that we have enough buffers*/
if (unlikely(count > free_entries))
c

[dpdk-dev] vhost: virtio-net rx-ring stop work after work many hours, bug?

2015-02-03 Thread Linhaifeng


On 2015/1/28 17:51, Xie, Huawei wrote:
> 
> 
>> -Original Message-----
>> From: Linhaifeng [mailto:haifeng.lin at huawei.com]
>> Sent: Tuesday, January 27, 2015 3:57 PM
>> To: dpd >> dev at dpdk.org; ms >> Michael S. Tsirkin
>> Cc: lilijun; liuyongan at huawei.com; Xie, Huawei
>> Subject: vhost: virtio-net rx-ring stop work after work many hours,bug?
>>
>> Hi,all
>>
>> I use vhost-user to send data to VM at first it cant work well but after many
>> hours VM can not receive data but can send data.
>>
>> (gdb)p avail_idx
>> $4 = 2668
>> (gdb)p free_entries
>> $5 = 0
>> (gdb)l
>> /* check that we have enough buffers */
>> if (unlikely(count > free_entries))
>> count = free_entries;
>>
>> if (count == 0){
>> int b=0;
>> if(b) { // when set b=1 to notify guest rx_ring will restart to 
>> work
>> if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT)) {
>>
>> eventfd_write(vq->callfd, 1);
>> }
>> }
>> return 0;
>> }
>>
>> some info i print in guest:
>>
>> net eth3:vi->num=199
>> net eth3:rvq info: num_free=57, used->idx=2668, avail->idx=2668
>> net eth3:svq info: num_free=254, used->idx=1644, avail->idx=1644
>>
>> net eth3:vi->num=199
>> net eth3:rvq info: num_free=57, used->idx=2668, avail->idx=2668
>> net eth3:svq info: num_free=254, used->idx=1645, avail->idx=1645
>>
>> net eth3:vi->num=199
>> net eth3:rvq info: num_free=57, used->idx=2668, avail->idx=2668
>> net eth3:svq info: num_free=254, used->idx=1646, avail->idx=1646
>>
>> # free
>>  total   used   free sharedbuffers cached
>> Mem:  3924100  3372523586848  0  95984 138060
>> -/+ buffers/cache: 1032083820892
>> Swap:   970748  0 970748
>>
>> I have two questions:
>> 1.Should we need to notify guest when there is no buffer in vq->avail?
>> 2.Why virtio_net stop to fill avail?
>>
>>
> 
> Haifeng:
> Thanks for reporting this issue.
> It might not be vhost-user specific, because as long vhost-user has received 
> all the vring information correctly, it shares the same code 
> receiving/transmitting packets with vhost-cuse.
> Are you using latest patch or the old patch?

Xie:
Sorry, I only saw this mail now.

I use the old code, not the latest patch. The latest patch is OK because it notifies
the guest after copying each packet when mergeable is enabled. (It may not be OK when
you disable the mergeable feature.)

> 1  Do you disable merge-able feature support in vhost example? There is an 
> bug in vhost-user feature negotiation which is fixed in latest patch.  It 
> could cause guest not receive packets at all. So if you are testing only 
> using linux net device, this isn't the cause.
Yes, I disabled it.

> 2.Do you still have the spot? Could you check if there are available 
> descriptors from checking the desc ring or even dump the vring status? Check 
> the notify_on_empty flag Michael mentioned?  I find a bug in vhost library 
> when processing three or more chained descriptors. But if you never 
> re-configure eth0 with different features,  this isn't the cause.
> 3. Is this reproduce-able? Next time if you run long hours stability test, 
> could you try to disable guest virtio feature?
> -device 
> virtio-net-pci,netdev=mynet0,mac=54:00:00:54:00:01,csum=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off
> 
> I have run more than ten hours' nightly test many times before, and haven't 
> met this issue. 
> We will check * if there is issue in the vhost code delivering interrupts to 
> guest which cause potential deadlock *if there are places we should but miss 
> delivering interrupts to guest.
> 
>>
>>
>>
>>
>> --
>> Regards,
>> Haifeng
> 

-- 
Regards,
Haifeng



[dpdk-dev] mmap fails with more than 40000 hugepages

2015-02-06 Thread Linhaifeng

On 2015/2/5 20:00, Damjan Marion (damarion) wrote:
> Hi,
> 
> I have system with 2 NUMA nodes and 256G RAM total. I noticed that DPDK 
> crashes in rte_eal_init()
> when number of available hugepages is around 4 or above.
> Everything works fine with lower values (i.e. 3).
> 
> I also tried with allocating 4 on node0 and 0 on node1, same crash 
> happens.
> 
> 
> Any idea what might be causing this?
> 
> Thanks,
> 
> Damjan
> 

Is there any other process trying to use hugepages?

> 
> $ cat 
> /sys/devices/system/node/node[01]/hugepages/hugepages-2048kB/nr_hugepages
> 2
> 2
> 
> $ grep -i huge /proc/meminfo
> AnonHugePages:706560 kB
> HugePages_Total:   4
> HugePages_Free:4
> HugePages_Rsvd:0
> HugePages_Surp:0
> Hugepagesize:   2048 kB
> 
> 
> $ sudo ~/src/dpdk/x86_64-native-linuxapp-gcc/app/testpmd -l 5-7 -n 3 
> --socket-mem 512,512
> EAL: Detected lcore 0 as core 0 on socket 0
> EAL: Detected lcore 1 as core 1 on socket 0
> EAL: Detected lcore 2 as core 2 on socket 0
> EAL: Detected lcore 3 as core 3 on socket 0
> EAL: Detected lcore 4 as core 4 on socket 0
> EAL: Detected lcore 5 as core 5 on socket 0
> EAL: Detected lcore 6 as core 6 on socket 0
> EAL: Detected lcore 7 as core 7 on socket 0
> EAL: Detected lcore 8 as core 0 on socket 1
> EAL: Detected lcore 9 as core 1 on socket 1
> EAL: Detected lcore 10 as core 2 on socket 1
> EAL: Detected lcore 11 as core 3 on socket 1
> EAL: Detected lcore 12 as core 4 on socket 1
> EAL: Detected lcore 13 as core 5 on socket 1
> EAL: Detected lcore 14 as core 6 on socket 1
> EAL: Detected lcore 15 as core 7 on socket 1
> EAL: Detected lcore 16 as core 0 on socket 0
> EAL: Detected lcore 17 as core 1 on socket 0
> EAL: Detected lcore 18 as core 2 on socket 0
> EAL: Detected lcore 19 as core 3 on socket 0
> EAL: Detected lcore 20 as core 4 on socket 0
> EAL: Detected lcore 21 as core 5 on socket 0
> EAL: Detected lcore 22 as core 6 on socket 0
> EAL: Detected lcore 23 as core 7 on socket 0
> EAL: Detected lcore 24 as core 0 on socket 1
> EAL: Detected lcore 25 as core 1 on socket 1
> EAL: Detected lcore 26 as core 2 on socket 1
> EAL: Detected lcore 27 as core 3 on socket 1
> EAL: Detected lcore 28 as core 4 on socket 1
> EAL: Detected lcore 29 as core 5 on socket 1
> EAL: Detected lcore 30 as core 6 on socket 1
> EAL: Detected lcore 31 as core 7 on socket 1
> EAL: Support maximum 128 logical core(s) by configuration.
> EAL: Detected 32 lcore(s)
> EAL: VFIO modules not all loaded, skip VFIO support...
> EAL: Setting up memory...
> EAL: Ask a virtual area of 0x80 bytes
> EAL: Virtual area found at 0x7fae2a20 (size = 0x80)
> EAL: Ask a virtual area of 0x760 bytes
> EAL: Virtual area found at 0x7fae22a0 (size = 0x760)
> EAL: Ask a virtual area of 0x140 bytes
> EAL: Virtual area found at 0x7fae2140 (size = 0x140)
> EAL: Ask a virtual area of 0x20 bytes
> EAL: Virtual area found at 0x7fae2100 (size = 0x20)
> EAL: Ask a virtual area of 0x40 bytes
> EAL: Virtual area found at 0x7fae20a0 (size = 0x40)
> EAL: Ask a virtual area of 0x20 bytes
> EAL: Virtual area found at 0x7fae2060 (size = 0x20)
> EAL: Ask a virtual area of 0x20 bytes
> EAL: Virtual area found at 0x7fae2020 (size = 0x20)
> EAL: Ask a virtual area of 0x20 bytes
> EAL: Virtual area found at 0x7fae1fe0 (size = 0x20)
> EAL: Ask a virtual area of 0x580 bytes
> EAL: Virtual area found at 0x7fae1a40 (size = 0x580)
> EAL: Ask a virtual area of 0x3b20 bytes
> EAL: Virtual area found at 0x7faddf00 (size = 0x3b20)
> EAL: Ask a virtual area of 0x40 bytes
> EAL: Virtual area found at 0x7faddea0 (size = 0x40)
> EAL: Ask a virtual area of 0x7c0 bytes
> EAL: Virtual area found at 0x7fadd6c0 (size = 0x7c0)
> EAL: Ask a virtual area of 0x20 bytes
> EAL: Virtual area found at 0x7fadd680 (size = 0x20)
> EAL: Ask a virtual area of 0x20 bytes
> EAL: Virtual area found at 0x7fadd640 (size = 0x20)
> EAL: Ask a virtual area of 0x20 bytes
> EAL: Virtual area found at 0x7fadd600 (size = 0x20)
> EAL: Ask a virtual area of 0x20 bytes
> EAL: Virtual area found at 0x7fadd5c0 (size = 0x20)
> EAL: Ask a virtual area of 0x20 bytes
> EAL: Virtual area found at 0x7fadd580 (size = 0x20)
> EAL: Ask a virtual area of 0x980 bytes
> EAL: Virtual area found at 0x7fadcbe0 (size = 0x980)
> EAL: Ask a virtual area of 0x1900 bytes
> EAL: Virtual area found at 0x7fadb2c0 (size = 0x1900)
> EAL: Ask a virtual area of 0x2440 bytes
> EAL: Virtual area found at 0x7fad8e60 (size = 0x2440)
> EAL: Ask a virtual area of 0xc80 bytes
> EAL: Virtual area found at 0x7fad81c0 (size = 0xc80)
> EAL: Ask a virtual area of 0x3200 bytes
> EAL: Virtual area found at 0x7fad4fa0 (size = 0x3200)
> EAL: Ask a virtual area of 0x3db80 bytes
> EAL: Virtual area

[dpdk-dev] mmap fails with more than 40000 hugepages

2015-02-06 Thread Linhaifeng


On 2015/2/5 20:00, Damjan Marion (damarion) wrote:
> Hi,
> 
> I have system with 2 NUMA nodes and 256G RAM total. I noticed that DPDK 
> crashes in rte_eal_init()
> when number of available hugepages is around 4 or above.
> Everything works fine with lower values (i.e. 3).
> 
> I also tried with allocating 4 on node0 and 0 on node1, same crash 
> happens.
> 
> 
> Any idea what might be causing this?
> 
> Thanks,
> 
> Damjan
> 

cat /proc/sys/vm/max_map_count

My suspicion: EAL maps each 2 MB hugepage as a separate file, so with around 40000
hugepages the process can exceed the default vm.max_map_count limit (65530), and
mmap then fails with "Cannot allocate memory".


> 
> $ cat 
> /sys/devices/system/node/node[01]/hugepages/hugepages-2048kB/nr_hugepages
> 2
> 2
> 
> $ grep -i huge /proc/meminfo
> AnonHugePages:706560 kB
> HugePages_Total:   4
> HugePages_Free:4
> HugePages_Rsvd:0
> HugePages_Surp:0
> Hugepagesize:   2048 kB
> 
> 
> $ sudo ~/src/dpdk/x86_64-native-linuxapp-gcc/app/testpmd -l 5-7 -n 3 
> --socket-mem 512,512
> EAL: Detected lcore 0 as core 0 on socket 0
> EAL: Detected lcore 1 as core 1 on socket 0
> EAL: Detected lcore 2 as core 2 on socket 0
> EAL: Detected lcore 3 as core 3 on socket 0
> EAL: Detected lcore 4 as core 4 on socket 0
> EAL: Detected lcore 5 as core 5 on socket 0
> EAL: Detected lcore 6 as core 6 on socket 0
> EAL: Detected lcore 7 as core 7 on socket 0
> EAL: Detected lcore 8 as core 0 on socket 1
> EAL: Detected lcore 9 as core 1 on socket 1
> EAL: Detected lcore 10 as core 2 on socket 1
> EAL: Detected lcore 11 as core 3 on socket 1
> EAL: Detected lcore 12 as core 4 on socket 1
> EAL: Detected lcore 13 as core 5 on socket 1
> EAL: Detected lcore 14 as core 6 on socket 1
> EAL: Detected lcore 15 as core 7 on socket 1
> EAL: Detected lcore 16 as core 0 on socket 0
> EAL: Detected lcore 17 as core 1 on socket 0
> EAL: Detected lcore 18 as core 2 on socket 0
> EAL: Detected lcore 19 as core 3 on socket 0
> EAL: Detected lcore 20 as core 4 on socket 0
> EAL: Detected lcore 21 as core 5 on socket 0
> EAL: Detected lcore 22 as core 6 on socket 0
> EAL: Detected lcore 23 as core 7 on socket 0
> EAL: Detected lcore 24 as core 0 on socket 1
> EAL: Detected lcore 25 as core 1 on socket 1
> EAL: Detected lcore 26 as core 2 on socket 1
> EAL: Detected lcore 27 as core 3 on socket 1
> EAL: Detected lcore 28 as core 4 on socket 1
> EAL: Detected lcore 29 as core 5 on socket 1
> EAL: Detected lcore 30 as core 6 on socket 1
> EAL: Detected lcore 31 as core 7 on socket 1
> EAL: Support maximum 128 logical core(s) by configuration.
> EAL: Detected 32 lcore(s)
> EAL: VFIO modules not all loaded, skip VFIO support...
> EAL: Setting up memory...
> EAL: Ask a virtual area of 0x80 bytes
> EAL: Virtual area found at 0x7fae2a20 (size = 0x80)
> EAL: Ask a virtual area of 0x760 bytes
> EAL: Virtual area found at 0x7fae22a0 (size = 0x760)
> EAL: Ask a virtual area of 0x140 bytes
> EAL: Virtual area found at 0x7fae2140 (size = 0x140)
> EAL: Ask a virtual area of 0x20 bytes
> EAL: Virtual area found at 0x7fae2100 (size = 0x20)
> EAL: Ask a virtual area of 0x40 bytes
> EAL: Virtual area found at 0x7fae20a0 (size = 0x40)
> EAL: Ask a virtual area of 0x20 bytes
> EAL: Virtual area found at 0x7fae2060 (size = 0x20)
> EAL: Ask a virtual area of 0x20 bytes
> EAL: Virtual area found at 0x7fae2020 (size = 0x20)
> EAL: Ask a virtual area of 0x20 bytes
> EAL: Virtual area found at 0x7fae1fe0 (size = 0x20)
> EAL: Ask a virtual area of 0x580 bytes
> EAL: Virtual area found at 0x7fae1a40 (size = 0x580)
> EAL: Ask a virtual area of 0x3b20 bytes
> EAL: Virtual area found at 0x7faddf00 (size = 0x3b20)
> EAL: Ask a virtual area of 0x40 bytes
> EAL: Virtual area found at 0x7faddea0 (size = 0x40)
> EAL: Ask a virtual area of 0x7c0 bytes
> EAL: Virtual area found at 0x7fadd6c0 (size = 0x7c0)
> EAL: Ask a virtual area of 0x20 bytes
> EAL: Virtual area found at 0x7fadd680 (size = 0x20)
> EAL: Ask a virtual area of 0x20 bytes
> EAL: Virtual area found at 0x7fadd640 (size = 0x20)
> EAL: Ask a virtual area of 0x20 bytes
> EAL: Virtual area found at 0x7fadd600 (size = 0x20)
> EAL: Ask a virtual area of 0x20 bytes
> EAL: Virtual area found at 0x7fadd5c0 (size = 0x20)
> EAL: Ask a virtual area of 0x20 bytes
> EAL: Virtual area found at 0x7fadd580 (size = 0x20)
> EAL: Ask a virtual area of 0x980 bytes
> EAL: Virtual area found at 0x7fadcbe0 (size = 0x980)
> EAL: Ask a virtual area of 0x1900 bytes
> EAL: Virtual area found at 0x7fadb2c0 (size = 0x1900)
> EAL: Ask a virtual area of 0x2440 bytes
> EAL: Virtual area found at 0x7fad8e60 (size = 0x2440)
> EAL: Ask a virtual area of 0xc80 bytes
> EAL: Virtual area found at 0x7fad81c0 (size = 0xc80)
> EAL: Ask a virtual area of 0x3200 bytes
> EAL: Virtual area found at 0x7fad4fa0 (size = 0x3200)
> EAL: Ask a virtual area of 0x3db80 bytes
> EAL: Virtual area found at 0x7fa9740

[dpdk-dev] [PATCH] vhost: notify guest to fill buffer when there is no buffer

2015-02-06 Thread Linhaifeng


On 2015/2/4 9:38, Xu, Qian Q wrote:
> 4. Launch the VM1 and VM2 with virtio device, note: you need use qemu 
> version>2.1 to enable the vhost-user server's feature. Old qemu such as 
> 1.5,1.6 didn't support it.
> Below is my VM1 startup command, for your reference, similar for VM2. 
> /home/qemu-2.2.0/x86_64-softmmu/qemu-system-x86_64 -name us-vhost-vm1 -cpu 
> host -enable-kvm -m 2048 -object 
> memory-backend-file,id=mem,size=2048M,mem-path=/mnt/huge,share=on -numa 
> node,memdev=mem -mem-prealloc -smp 2 -drive file=/home/img/dpdk1-vm1.img 
> -chardev socket,id=char0,path=/home/dpdk-vhost/vhost-net -netdev 
> type=vhost-user,id=mynet1,chardev=char0,vhostforce -device 
> virtio-net-pci,mac=00:00:00:00:00:01, -nographic
> 
> 5. Then in the VM, you can have the same operations as before, send packet 
> from virtio1 to virtio2. 
> 
> Pls let me know if any questions, issues. 

Hi xie & xu

When I try to start the VM, vhost-switch crashes.

VHOST_CONFIG: read message VHOST_USER_SET_MEM_TABLE
VHOST_CONFIG: mapped region 0 fd:19 to 0x sz:0xa off:0x0
VHOST_CONFIG: mmap qemu guest failed.
VHOST_CONFIG: read message VHOST_USER_SET_VRING_NUM
VHOST_CONFIG: read message VHOST_USER_SET_VRING_BASE
VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR
run_dpdk_vhost.sh: line 19:  1854 Segmentation fault  
${RTE_SDK}/examples/vhost/build/app/vhost-switch -c 0x300 -n 4 --huge-dir 
/dev/hugepages -m 2048 -- -p 0x1 --vm2vm 2 --mergeable 0 --zero-copy 0



-- 
Regards,
Haifeng



[dpdk-dev] ixgbe:about latency

2015-02-06 Thread Linhaifeng
Hi,

I used l2fwd to test the ixgbe PMD's latency (packet length is 64 bytes) and
found an interesting thing: latency is about 22 us when the tx bit rate is 4 Mbps,
but 103 us when the tx bit rate is 5 Mbps.

Who can tell me why? Is it a bug?

Thank you very much!

-- 
Regards,
Haifeng



[dpdk-dev] [PATCH] vhost: notify guest to fill buffer when there is no buffer

2015-02-06 Thread Linhaifeng
28580
PMD: ixgbe_dev_rx_queue_setup(): Rx Burst Bulk Alloc Preconditions are 
satisfied. Rx Burst Bulk Alloc function will be used on port=0, queue=0.
PMD: ixgbe_dev_rx_queue_setup(): Vector rx enabled, please make sure RX burst 
size no less than 32.
... ...
PMD: ixgbe_dev_rx_queue_setup(): sw_ring=0x7f39579ca040 hw_ring=0x7f39a2eb3b80 
dma_addr=0xf2b6b3b80
PMD: ixgbe_dev_rx_queue_setup(): Rx Burst Bulk Alloc Preconditions are 
satisfied. Rx Burst Bulk Alloc function will be used on port=0, queue=127.
PMD: ixgbe_dev_rx_queue_setup(): Vector rx enabled, please make sure RX burst 
size no less than 32.
PMD: ixgbe_dev_tx_queue_setup(): sw_ring=0x7f39579c7f00 hw_ring=0x7f39a2ec3c00 
dma_addr=0xf2b6c3c00
PMD: set_tx_function(): Using full-featured tx code path
PMD: set_tx_function():  - txq_flags = e01 [IXGBE_SIMPLE_FLAGS=f01]
PMD: set_tx_function():  - tx_rs_thresh = 32 [RTE_PMD_IXGBE_TX_MAX_BURST=32]
PMD: ixgbe_dev_tx_queue_setup(): sw_ring=0x7f39579c5dc0 hw_ring=0x7f39a2ed3c00 
dma_addr=0xf2b6d3c00
PMD: set_tx_function(): Using full-featured tx code path
PMD: set_tx_function():  - txq_flags = e01 [IXGBE_SIMPLE_FLAGS=f01]
PMD: set_tx_function():  - tx_rs_thresh = 32 [RTE_PMD_IXGBE_TX_MAX_BURST=32]
VHOST_PORT: Max virtio devices supported: 64
VHOST_PORT: Port 0 MAC: 00 1b 21 69 f7 c8
VHOST_PORT: Skipping disabled port 1
VHOST_DATA: Procesing on Core 9 started
VHOST_CONFIG: socket created, fd:15
VHOST_CONFIG: bind to vhost-net
VHOST_CONFIG: new virtio connection is 16
VHOST_CONFIG: new device, handle is 0
VHOST_CONFIG: new virtio connection is 17
VHOST_CONFIG: new device, handle is 1
VHOST_CONFIG: read message VHOST_USER_SET_OWNER
VHOST_CONFIG: read message VHOST_USER_GET_FEATURES
VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
VHOST_CONFIG: vring call idx:0 file:18
VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
VHOST_CONFIG: vring call idx:1 file:19
VHOST_CONFIG: read message VHOST_USER_SET_OWNER
VHOST_CONFIG: read message VHOST_USER_GET_FEATURES
VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
VHOST_CONFIG: vring call idx:0 file:20
VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
VHOST_CONFIG: vring call idx:1 file:21
VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
VHOST_CONFIG: vring call idx:0 file:22
VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
VHOST_CONFIG: vring call idx:1 file:18
VHOST_CONFIG: read message VHOST_USER_SET_FEATURES
VHOST_CONFIG: read message VHOST_USER_SET_MEM_TABLE
VHOST_CONFIG: mapped region 0 fd:19 to 0x sz:0xa off:0x0
VHOST_CONFIG: mmap qemu guest failed.
VHOST_CONFIG: read message VHOST_USER_SET_VRING_NUM
VHOST_CONFIG: read message VHOST_USER_SET_VRING_BASE
VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR
./run_dpdk_vhost.sh: line 19: 20796 Segmentation fault  
${RTE_SDK}/examples/vhost/build/app/vhost-switch -c 0x300 -n 4 --huge-dir 
/dev/hugepages -m 2048 -- -p 0x1 --vm2vm 2 --mergeable 0 --zero-copy 0

> 
> -Original Message-
> From: Linhaifeng [mailto:haifeng.lin at huawei.com] 
> Sent: Friday, February 06, 2015 12:02 PM
> To: Xu, Qian Q; Xie, Huawei
> Cc: lilijun; liuyongan at huawei.com; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] vhost: notify guest to fill buffer when there 
> is no buffer
> 
> 
> 
> On 2015/2/4 9:38, Xu, Qian Q wrote:
>> 4. Launch the VM1 and VM2 with virtio device, note: you need use qemu 
>> version>2.1 to enable the vhost-user server's feature. Old qemu such as 
>> 1.5,1.6 didn't support it.
>> Below is my VM1 startup command, for your reference, similar for VM2. 
>> /home/qemu-2.2.0/x86_64-softmmu/qemu-system-x86_64 -name us-vhost-vm1 -cpu 
>> host -enable-kvm -m 2048 -object 
>> memory-backend-file,id=mem,size=2048M,mem-path=/mnt/huge,share=on -numa 
>> node,memdev=mem -mem-prealloc -smp 2 -drive file=/home/img/dpdk1-vm1.img 
>> -chardev socket,id=char0,path=/home/dpdk-vhost/vhost-net -netdev 
>> type=vhost-user,id=mynet1,chardev=char0,vhostforce -device 
>> virtio-net-pci,mac=00:00:00:00:00:01, -nographic
>>
>> 5. Then in the VM, you can have the same operations as before, send packet 
>> from virtio1 to virtio2. 
>>
>> Pls let me know if any questions, issues. 
> 
> Hi xie & xu
> 
> When I try to start VM vhost-switch crashed.
> 
> VHOST_CONFIG: read message VHOST_USER_SET_MEM_TABLE
> VHOST_CONFIG: mapped region 0 fd:19 to 0x sz:0xa off:0x0
> VHOST_CONFIG: mmap qemu guest failed.
> VHOST_CONFIG: read message VHOST_USER_SET_VRING_NUM
> VHOST_CONFIG: read message VHOST_USER_SET_VRING_BASE
> VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR
> run_dpdk_vhost.sh: line 19:  1854 Segmentation fault  
> ${RTE_SDK}/examples/vhost/build/app/vhost-switch -c 0x300 -n 4 --huge-dir 
> /dev/hugepages -m 2048 -- -p 0x1 --vm2vm 2 --mergeable 0 --zero-copy 0
> 
> 
> 

-- 
Regards,
Haifeng



[dpdk-dev] [PATCH] vhost: notify guest to fill buffer when there is no buffer

2015-02-07 Thread Linhaifeng


On 2015/2/6 13:54, Xu, Qian Q wrote:
> Haifeng
> Are you using the latest dpdk branch with vhost-user patches? I have never 
> met the issue.
> When is the vhost sample crashed? When you start VM or when you run sth in 
> VM? Is your qemu 2.2? How about your memory info? Could you give more details 
> about your steps? 
> 
> 

I now know why you never met the issue: vhost-switch notifies the guest after
sending every packet (so performance is not very good).

static inline int __attribute__((always_inline))
virtio_tx_local(struct vhost_dev *vdev, struct rte_mbuf *m)
{
...
ret = rte_vhost_enqueue_burst(tdev, VIRTIO_RXQ, &m, 1/*you cant try to 
fill with rx_count*/);   
..

}

> 
> -Original Message-
> From: Linhaifeng [mailto:haifeng.lin at huawei.com] 
> Sent: Friday, February 06, 2015 12:02 PM
> To: Xu, Qian Q; Xie, Huawei
> Cc: lilijun; liuyongan at huawei.com; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] vhost: notify guest to fill buffer when there 
> is no buffer
> 
> 
> 
> On 2015/2/4 9:38, Xu, Qian Q wrote:
>> 4. Launch the VM1 and VM2 with virtio device, note: you need use qemu 
>> version>2.1 to enable the vhost-user server's feature. Old qemu such as 
>> 1.5,1.6 didn't support it.
>> Below is my VM1 startup command, for your reference, similar for VM2. 
>> /home/qemu-2.2.0/x86_64-softmmu/qemu-system-x86_64 -name us-vhost-vm1 -cpu 
>> host -enable-kvm -m 2048 -object 
>> memory-backend-file,id=mem,size=2048M,mem-path=/mnt/huge,share=on -numa 
>> node,memdev=mem -mem-prealloc -smp 2 -drive file=/home/img/dpdk1-vm1.img 
>> -chardev socket,id=char0,path=/home/dpdk-vhost/vhost-net -netdev 
>> type=vhost-user,id=mynet1,chardev=char0,vhostforce -device 
>> virtio-net-pci,mac=00:00:00:00:00:01, -nographic
>>
>> 5. Then in the VM, you can have the same operations as before, send packet 
>> from virtio1 to virtio2. 
>>
>> Pls let me know if any questions, issues. 
> 
> Hi xie & xu
> 
> When I try to start VM vhost-switch crashed.
> 
> VHOST_CONFIG: read message VHOST_USER_SET_MEM_TABLE
> VHOST_CONFIG: mapped region 0 fd:19 to 0x sz:0xa off:0x0
> VHOST_CONFIG: mmap qemu guest failed.
> VHOST_CONFIG: read message VHOST_USER_SET_VRING_NUM
> VHOST_CONFIG: read message VHOST_USER_SET_VRING_BASE
> VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR
> run_dpdk_vhost.sh: line 19:  1854 Segmentation fault  
> ${RTE_SDK}/examples/vhost/build/app/vhost-switch -c 0x300 -n 4 --huge-dir 
> /dev/hugepages -m 2048 -- -p 0x1 --vm2vm 2 --mergeable 0 --zero-copy 0
> 
> 
> 

-- 
Regards,
Haifeng



[dpdk-dev] [PATCH] vhost: notify guest to fill buffer when there is no buffer

2015-02-09 Thread Linhaifeng


On 2015/2/9 10:57, Xu, Qian Q wrote:
> Haifeng, 
> No matter mergeable =0 or 1, I have not met the issue that the vhost-user 
> crash when start VM. Have u changed the code? As you said below, vhost-switch 
> will notify guest after sending every packet, yes, it's the current code, and 
> Huawei, Xie will plan to optimize it in future. Is the crash caused by 
> changing code or any other step? 
> What do you want for the vhost-user, changing the notification mechanism? 
> Thx. By the way, sth means something. 
> 

Yes, I modified the code to fix the compile errors (I replaced the initializer with
memset(&msgh, 0, sizeof msgh)).

The issue is that mmap fails because the memory size is not aligned to the hugepage
size. I guess this is a qemu bug.
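
(A minimal sketch of a backend-side workaround, assuming 2 MB hugepages; the constant
and helper below are mine, not the librte_vhost code: round the mapping length up to
a hugepage multiple before calling mmap().)

#include <stdint.h>

#define ASSUMED_HUGEPAGE_SZ	(2ULL * 1024 * 1024)	/* assuming 2 MB hugepages */

/*
 * hugetlbfs mappings must cover whole hugepages, so round the region
 * length (memory_size + mmap_offset) up before calling mmap().
 * Hypothetical helper, not the code as merged.
 */
static inline uint64_t
round_up_to_hugepage(uint64_t len)
{
	return (len + ASSUMED_HUGEPAGE_SZ - 1) & ~(ASSUMED_HUGEPAGE_SZ - 1);
}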


In file included from 
/mnt/sdc/linhf/dpdk-vhost-user/dpdk/lib/librte_vhost/virtio-net.c:34:
/usr/include/linux/vhost.h:33: error: expected specifier-qualifier-list before 
?pid_t?
== Build lib/librte_port
cc1: warnings being treated as errors
/mnt/sdc/linhf/dpdk-vhost-user/dpdk/lib/librte_vhost/vhost_user/vhost-net-?.c: 
In function ?read_fd_message?:
/mnt/sdc/linhf/dpdk-vhost-user/dpdk/lib/librte_vhost/vhost_user/vhost-net-user.c:141:
 error: missing initializer
/mnt/sdc/linhf/dpdk-vhost-user/dpdk/lib/librte_vhost/vhost_user/vhost-net-user.c:141:
 error: (near initialization for ?msgh.msg_namelen?)
/mnt/sdc/linhf/dpdk-vhost-user/dpdk/lib/librte_vhost/vhost_user/vhost-net-user.c:
 In function ?send_fd_message?:
/mnt/sdc/linhf/dpdk-vhost-user/dpdk/lib/librte_vhost/vhost_user/vhost-net-user.c:213:
 error: missing initializer
/mnt/sdc/linhf/dpdk-vhost-user/dpdk/lib/librte_vhost/vhost_user/vhost-net-user.c:213:
 error: (near initialization for ?msgh.msg_namelen?)
/mnt/sdc/linhf/dpdk-vhost-user/dpdk/lib/librte_vhost/vhost_user/vhost-net-user.c:
 In function ?vserver_new_vq_conn?:
/mnt/sdc/linhf/dpdk-vhost-user/dpdk/lib/librte_vhost/vhost_user/vhost-net-user.c:276:
 error: missing initializer
/mnt/sdc/linhf/dpdk-vhost-user/dpdk/lib/librte_vhost/vhost_user/vhost-net-user.c:276:
 error: (near initialization for ?vdev_ctx.fh?)
make[5]: *** [vhost_user/vhost-net-user.o] Error 1
make[5]: *** Waiting for unfinished jobs
cc1: warnings being treated as errors
/mnt/sdc/linhf/dpdk-vhost-user/dpdk/lib/librte_vhost/vhost_user/virtio-net-user.c:
 In function ?user_set_mem_table?:
/mnt/sdc/linhf/dpdk-vhost-user/dpdk/lib/librte_vhost/vhost_user/virtio-net-user.c:104:
 error: missing initializer
/mnt/sdc/linhf/dpdk-vhost-user/dpdk/lib/librte_vhost/vhost_user/virtio-net-user.c:104:
 error: (near initialization for ?tmp[0].mapped_address?)

> -Original Message-
> From: Linhaifeng [mailto:haifeng.lin at huawei.com] 
> Sent: Saturday, February 07, 2015 12:27 PM
> To: Xu, Qian Q; Xie, Huawei
> Cc: lilijun; liuyongan at huawei.com; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] vhost: notify guest to fill buffer when there 
> is no buffer
> 
> 
> 
> On 2015/2/6 13:54, Xu, Qian Q wrote:
>> Haifeng
>> Are you using the latest dpdk branch with vhost-user patches? I have never 
>> met the issue.
>> When is the vhost sample crashed? When you start VM or when you run sth in 
>> VM? Is your qemu 2.2? How about your memory info? Could you give more 
>> details about your steps? 
>>
>>
> 
> I have knew why you never met the issue.Because vhost-switch will notify 
> guest after send every packets(performance is not every well).
> 
> static inline int __attribute__((always_inline)) virtio_tx_local(struct 
> vhost_dev *vdev, struct rte_mbuf *m) {
>   ...
>   ret = rte_vhost_enqueue_burst(tdev, VIRTIO_RXQ, &m, 1/*you cant try to 
> fill with rx_count*/);   
>   ..
> 
> }
> 
>>
>> -Original Message-
>> From: Linhaifeng [mailto:haifeng.lin at huawei.com]
>> Sent: Friday, February 06, 2015 12:02 PM
>> To: Xu, Qian Q; Xie, Huawei
>> Cc: lilijun; liuyongan at huawei.com; dev at dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH] vhost: notify guest to fill buffer 
>> when there is no buffer
>>
>>
>>
>> On 2015/2/4 9:38, Xu, Qian Q wrote:
>>> 4. Launch the VM1 and VM2 with virtio device, note: you need use qemu 
>>> version>2.1 to enable the vhost-user server's feature. Old qemu such as 
>>> 1.5,1.6 didn't support it.
>>> Below is my VM1 startup command, for your reference, similar for VM2. 
>>> /home/qemu-2.2.0/x86_64-softmmu/qemu-system-x86_64 -name us-vhost-vm1 
>>> -cpu host -enable-kvm -m 2048 -object 
>>> memory-backend-file,id=mem,size=2048M,mem-path=/mnt/huge,share=on 
>>> -numa node,memdev=mem -mem-prealloc -smp 2 -drive 
>>> file=/home/img/dpdk1-vm1.img -chardev 
>>> socket,id=char0,path=/home/dpdk-vhost/vhost-net -netdev 
>>> type=vhost-user,id=mynet1

[dpdk-dev] [RFC PATCH v2 00/14] qemu vhost-user support

2015-02-09 Thread Linhaifeng
Hi, Xie

Does librte_vhost support Open vSwitch?
How do I attach the vhost_device_ctx to an Open vSwitch port?

On 2015/1/26 11:20, Huawei Xie wrote:
> v2 changes:
>  make fdset num field reflect the current number of fds vhost server manages
>  allocate context for connected fd in vserver_new_vq_conn
>  enable multiple socket support
>  get_feature fix: apply Tetsuya's comment
>  set_feature fix
>  close received log fd, err fd: apply Haifeng's comment 
>  CTRL_VQ fix
>  set ifname to unix domain socket path
>  change the context type from uint64_t to void * in event management
>  other code rework
> 
> Huawei Xie (14):
>   turn on VIRTIO_NET_F_CTRL_RX is dependant on VIRTIO_NET_F_CTRL_VQ.
>   create vhost_cuse directory
>   rename vhost-net-cdev.h to vhost-net.h
>   consistent print style
>   implement the eventfd copying(from fd in qemu process to fd in vhost 
> process) into vhost-net-cdev.c
>   copy host_memory_map from virtio-net.c to a new file virtio-net-cdev.c
>   host_memory_map
>   split set_memory_table into two parts
>   add select based event driven fd management logic
>   vhost user support
>   vhost user memory region map
>   cleanup when vhost user connection is closed
>   multiple socket support
>   vhost user ifr_name support
> 
>  lib/librte_vhost/Makefile |   5 +-
>  lib/librte_vhost/rte_virtio_net.h |   5 +-
>  lib/librte_vhost/vhost-net-cdev.c | 389 --
>  lib/librte_vhost/vhost-net-cdev.h | 113 ---
>  lib/librte_vhost/vhost-net.h  | 121 +++
>  lib/librte_vhost/vhost_cuse/eventfd_copy.c|  89 +
>  lib/librte_vhost/vhost_cuse/eventfd_copy.h|  40 +++
>  lib/librte_vhost/vhost_cuse/vhost-net-cdev.c  | 414 +++
>  lib/librte_vhost/vhost_cuse/virtio-net-cdev.c | 401 ++
>  lib/librte_vhost/vhost_cuse/virtio-net-cdev.h |  48 +++
>  lib/librte_vhost/vhost_rxtx.c |   5 +-
>  lib/librte_vhost/vhost_user/fd_man.c  | 207 
>  lib/librte_vhost/vhost_user/fd_man.h  |  64 
>  lib/librte_vhost/vhost_user/vhost-net-user.c  | 462 
> ++
>  lib/librte_vhost/vhost_user/vhost-net-user.h  | 106 ++
>  lib/librte_vhost/vhost_user/virtio-net-user.c | 322 ++
>  lib/librte_vhost/vhost_user/virtio-net-user.h |  49 +++
>  lib/librte_vhost/virtio-net.c | 455 +++--
>  lib/librte_vhost/virtio-net.h |  43 +++
>  19 files changed, 2419 insertions(+), 919 deletions(-)
>  delete mode 100644 lib/librte_vhost/vhost-net-cdev.c
>  delete mode 100644 lib/librte_vhost/vhost-net-cdev.h
>  create mode 100644 lib/librte_vhost/vhost-net.h
>  create mode 100644 lib/librte_vhost/vhost_cuse/eventfd_copy.c
>  create mode 100644 lib/librte_vhost/vhost_cuse/eventfd_copy.h
>  create mode 100644 lib/librte_vhost/vhost_cuse/vhost-net-cdev.c
>  create mode 100644 lib/librte_vhost/vhost_cuse/virtio-net-cdev.c
>  create mode 100644 lib/librte_vhost/vhost_cuse/virtio-net-cdev.h
>  create mode 100644 lib/librte_vhost/vhost_user/fd_man.c
>  create mode 100644 lib/librte_vhost/vhost_user/fd_man.h
>  create mode 100644 lib/librte_vhost/vhost_user/vhost-net-user.c
>  create mode 100644 lib/librte_vhost/vhost_user/vhost-net-user.h
>  create mode 100644 lib/librte_vhost/vhost_user/virtio-net-user.c
>  create mode 100644 lib/librte_vhost/vhost_user/virtio-net-user.h
>  create mode 100644 lib/librte_vhost/virtio-net.h
> 

-- 
Regards,
Haifeng



[dpdk-dev] [PATCH v2 09/11] lib/librte_vhost: vhost user support

2015-02-12 Thread Linhaifeng


On 2015/2/12 13:07, Huawei Xie wrote:
> +
> + /* This is ugly */
> + mapped_size = memory.regions[idx].memory_size +
> + memory.regions[idx].mmap_offset;
> + mapped_address = (uint64_t)(uintptr_t)mmap(NULL,
> + mapped_size,
> + PROT_READ | PROT_WRITE, MAP_SHARED,
> + pmsg->fds[idx],
> + 0);

Just another (also ugly) way:
we can mmap using the size of the file; then the munmap length does not need to
be aligned to the page size.
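
A rough sketch of that idea (a hypothetical helper, not the actual patch):
take the length from fstat() on the fd that was passed in, and use the same
length for both mmap() and the later munmap():

#include <sys/mman.h>
#include <sys/stat.h>

/* map the whole backing file of a region; *out_len is the length that the
 * matching munmap() should later be called with */
static void *map_whole_file(int fd, size_t *out_len)
{
	struct stat st;

	if (fstat(fd, &st) < 0)
		return MAP_FAILED;
	*out_len = (size_t)st.st_size;
	return mmap(NULL, *out_len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}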

-- 
Regards,
Haifeng



[dpdk-dev] [PATCH v2 09/11] lib/librte_vhost: vhost user support

2015-02-12 Thread Linhaifeng


On 2015/2/12 17:28, Xie, Huawei wrote:
> On 2/12/2015 4:28 PM, Linhaifeng wrote:
>>
>> On 2015/2/12 13:07, Huawei Xie wrote:
>>> +
>>> +   /* This is ugly */
>>> +   mapped_size = memory.regions[idx].memory_size +
>>> +   memory.regions[idx].mmap_offset;
>>> +   mapped_address = (uint64_t)(uintptr_t)mmap(NULL,
>>> +   mapped_size,
>>> +   PROT_READ | PROT_WRITE, MAP_SHARED,
>>> +   pmsg->fds[idx],
>>> +   0);
>> Just another ugly way:
>> We can use the size of file to mmap then unmmap is not need align to the 
>> size of page.
>>
> Yes, this is like how cuses handle mmap.
> We will add this into the to-do list, combine all the regions the first,
> check if they belong to the same file, and then map each file once.
> Seems there is no elegant way.
> 
> There is another to do for mmap. If there are multiple virtio devices,
> the memory are mapped for each virtio device. Actually we only need once.
> 

Great minds think alike.

The graceful way would be for qemu to send us a message telling us which file
and size to mmap; then we would not need to mmap separately for each virtio
device.

> 
> 
> 

-- 
Regards,
Haifeng



[dpdk-dev] [PATCH] lib/librte_vhost:fix can't send packet anymore after mempool is full again

2015-03-20 Thread linhaifeng
From: Linhaifeng 

When we fail to malloc a buffer from the mempool we only update last_used_idx
but not used->idx, so after this happens many times vhost thinks it has handled
all packets while virtio_net thinks vhost has not handled them all and will not
update avail->idx.

Signed-off-by: Linhaifeng 
---
 lib/librte_vhost/vhost_rxtx.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index 535c7a1..93a8fff 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -609,7 +609,7 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t 
queue_id,
if (unlikely(m == NULL)) {
RTE_LOG(ERR, VHOST_DATA,
"Failed to allocate memory for mbuf.\n");
-   return entry_success;
+   goto finish;
}
seg_offset = 0;
seg_avail = m->buf_len - RTE_PKTMBUF_HEADROOM;
@@ -721,6 +721,7 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t 
queue_id,
entry_success++;
}

+finish:
rte_compiler_barrier();
vq->used->idx += entry_success;
/* Kick guest if required. */
-- 
1.8.5.2.msysgit.0




[dpdk-dev] [PATCH] lib/librte_vhost:fix can't send packet anymore after mempool is full again

2015-03-20 Thread Linhaifeng


On 2015/3/20 11:54, linhaifeng wrote:
> From: Linhaifeng 
> 
> When failed to malloc buffer from mempool we just update last_used_idx but
> not used->idx so after many times vhost thought have handle all packets
> but virtio_net thought vhost have not handle all packets and will not
> update avail->idx.
> 
> Signed-off-by: Linhaifeng 
> ---
>  lib/librte_vhost/vhost_rxtx.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
> index 535c7a1..93a8fff 100644
> --- a/lib/librte_vhost/vhost_rxtx.c
> +++ b/lib/librte_vhost/vhost_rxtx.c
> @@ -609,7 +609,7 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t 
> queue_id,
>   if (unlikely(m == NULL)) {
>   RTE_LOG(ERR, VHOST_DATA,
>   "Failed to allocate memory for mbuf.\n");
> - return entry_success;
> + goto finish;

or use 'break' instead of 'goto'?

>   }
>   seg_offset = 0;
>   seg_avail = m->buf_len - RTE_PKTMBUF_HEADROOM;
> @@ -721,6 +721,7 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t 
> queue_id,
>   entry_success++;
>   }
>  
> +finish:
>   rte_compiler_barrier();
>   vq->used->idx += entry_success;
>   /* Kick guest if required. */
> 

-- 
Regards,
Haifeng



[dpdk-dev] [PATCH v2] lib/librte_vhost:fix can't send packet anymore after mempool is full again

2015-03-20 Thread linhaifeng
From: Linhaifeng 

When we fail to malloc a buffer from the mempool we only update last_used_idx
but not used->idx, so after this happens many times vhost thinks it has handled
all packets while virtio_net thinks vhost has not handled them all and will not
update avail->idx.

Signed-off-by: Linhaifeng 
---
 lib/librte_vhost/vhost_rxtx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index 535c7a1..510ffe8 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -609,7 +609,7 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t 
queue_id,
if (unlikely(m == NULL)) {
RTE_LOG(ERR, VHOST_DATA,
"Failed to allocate memory for mbuf.\n");
-   return entry_success;
+   break;  
}
seg_offset = 0;
seg_avail = m->buf_len - RTE_PKTMBUF_HEADROOM;
-- 
1.8.5.2.msysgit.0




[dpdk-dev] [PATCH] lib/librte_pmd_virtio fix can't receive packets after rx_q is empty If failed to alloc mbuf ring_size times the rx_q may be empty and can't receive any packets forever because nb_us

2015-03-20 Thread linhaifeng
From: Linhaifeng 

so we should try to refill when nb_used is 0. After someone else frees an mbuf
we can start receiving packets again.

Signed-off-by: Linhaifeng 
---
 lib/librte_pmd_virtio/virtio_rxtx.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/librte_pmd_virtio/virtio_rxtx.c 
b/lib/librte_pmd_virtio/virtio_rxtx.c
index 1d74b34..5c7e0cd 100644
--- a/lib/librte_pmd_virtio/virtio_rxtx.c
+++ b/lib/librte_pmd_virtio/virtio_rxtx.c
@@ -495,7 +495,7 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, 
uint16_t nb_pkts)
num = num - ((rxvq->vq_used_cons_idx + num) % 
DESC_PER_CACHELINE);

if (num == 0)
-   return 0;
+   goto refill;

num = virtqueue_dequeue_burst_rx(rxvq, rcv_pkts, len, num);
PMD_RX_LOG(DEBUG, "used:%d dequeue:%d", nb_used, num);
@@ -536,6 +536,7 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, 
uint16_t nb_pkts)

rxvq->packets += nb_rx;

+refill:
/* Allocate new mbuf for the used descriptor */
error = ENOSPC;
while (likely(!virtqueue_full(rxvq))) {
-- 
1.8.5.2.msysgit.0




[dpdk-dev] [PATCH] lib/librte_pmd_virtio fix can't receive packets after rx_q is empty If failed to alloc mbuf ring_size times the rx_q may be empty and can't receive any packets forever because nb_us

2015-03-20 Thread Linhaifeng
Sorry for my wrong title. Please ignore it.

On 2015/3/20 17:10, linhaifeng wrote:
> From: Linhaifeng 
> 
> so we should try to refill when nb_used is 0.After otherone free mbuf
> we can restart to receive packets.
> 
> Signed-off-by: Linhaifeng 
> ---
>  lib/librte_pmd_virtio/virtio_rxtx.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/librte_pmd_virtio/virtio_rxtx.c 
> b/lib/librte_pmd_virtio/virtio_rxtx.c
> index 1d74b34..5c7e0cd 100644
> --- a/lib/librte_pmd_virtio/virtio_rxtx.c
> +++ b/lib/librte_pmd_virtio/virtio_rxtx.c
> @@ -495,7 +495,7 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf 
> **rx_pkts, uint16_t nb_pkts)
>   num = num - ((rxvq->vq_used_cons_idx + num) % 
> DESC_PER_CACHELINE);
>  
>   if (num == 0)
> - return 0;
> + goto refill;
>  
>   num = virtqueue_dequeue_burst_rx(rxvq, rcv_pkts, len, num);
>   PMD_RX_LOG(DEBUG, "used:%d dequeue:%d", nb_used, num);
> @@ -536,6 +536,7 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf 
> **rx_pkts, uint16_t nb_pkts)
>  
>   rxvq->packets += nb_rx;
>  
> +refill:
>   /* Allocate new mbuf for the used descriptor */
>   error = ENOSPC;
>   while (likely(!virtqueue_full(rxvq))) {
> 

-- 
Regards,
Haifeng



[dpdk-dev] [PATCH] lib/librte_pmd_virtio fix can't receive packets after rx_q is empty

2015-03-20 Thread linhaifeng
From: Linhaifeng 

If we fail to alloc an mbuf ring_size times, the rx_q may become empty and
can't receive any packets any more, because nb_used stays 0 forever.

So we should try to refill when nb_used is 0. After someone else frees an mbuf
we can start receiving packets again.

Signed-off-by: Linhaifeng 
---
 lib/librte_pmd_virtio/virtio_rxtx.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/librte_pmd_virtio/virtio_rxtx.c 
b/lib/librte_pmd_virtio/virtio_rxtx.c
index 1d74b34..5c7e0cd 100644
--- a/lib/librte_pmd_virtio/virtio_rxtx.c
+++ b/lib/librte_pmd_virtio/virtio_rxtx.c
@@ -495,7 +495,7 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, 
uint16_t nb_pkts)
num = num - ((rxvq->vq_used_cons_idx + num) % 
DESC_PER_CACHELINE);

if (num == 0)
-   return 0;
+   goto refill;

num = virtqueue_dequeue_burst_rx(rxvq, rcv_pkts, len, num);
PMD_RX_LOG(DEBUG, "used:%d dequeue:%d", nb_used, num);
@@ -536,6 +536,7 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, 
uint16_t nb_pkts)

rxvq->packets += nb_rx;

+refill:
/* Allocate new mbuf for the used descriptor */
error = ENOSPC;
while (likely(!virtqueue_full(rxvq))) {
-- 
1.8.5.2.msysgit.0




[dpdk-dev] [PATCH] lib/librte_pmd_virtio fix can't receive packets after rx_q is empty

2015-03-21 Thread Linhaifeng


On 2015/3/21 0:54, Xie, Huawei wrote:
> On 3/20/2015 6:47 PM, linhaifeng wrote:
>> From: Linhaifeng 
>>
>> If failed to alloc mbuf ring_size times the rx_q may be empty and can't
>> receive any packets forever because nb_used is 0 forever.
> Agreed. In current implementation, once VQ becomes empty, we have no
> chance to refill it again.
> The simple fix is, receive one and then refill one as other PMDs. Need
> to consider which is best strategy in terms of performance in future.
> How did you find this? through code review or real workload?

We found this with a real workload that uses vhost_net + virtio_pmd to forward
packets in a VM.

>> so we should try to refill when nb_used is 0.After otherone free mbuf
>> we can restart to receive packets.
>>
>> Signed-off-by: Linhaifeng 
>> ---
>>  lib/librte_pmd_virtio/virtio_rxtx.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/lib/librte_pmd_virtio/virtio_rxtx.c 
>> b/lib/librte_pmd_virtio/virtio_rxtx.c
>> index 1d74b34..5c7e0cd 100644
>> --- a/lib/librte_pmd_virtio/virtio_rxtx.c
>> +++ b/lib/librte_pmd_virtio/virtio_rxtx.c
>> @@ -495,7 +495,7 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf 
>> **rx_pkts, uint16_t nb_pkts)
>>  num = num - ((rxvq->vq_used_cons_idx + num) % 
>> DESC_PER_CACHELINE);
>>  
>>  if (num == 0)
>> -return 0;
>> +goto refill;
>>  
>>  num = virtqueue_dequeue_burst_rx(rxvq, rcv_pkts, len, num);
>>  PMD_RX_LOG(DEBUG, "used:%d dequeue:%d", nb_used, num);
>> @@ -536,6 +536,7 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf 
>> **rx_pkts, uint16_t nb_pkts)
>>  
>>  rxvq->packets += nb_rx;
>>  
>> +refill:
>>  /* Allocate new mbuf for the used descriptor */
>>  error = ENOSPC;
>>  while (likely(!virtqueue_full(rxvq))) {
> 
> 
> 

-- 
Regards,
Haifeng



[dpdk-dev] [PATCH v3] lib/librte_vhost: update used->idx when allocation of mbuf fails

2015-03-21 Thread linhaifeng
From: Linhaifeng 

When we fail to malloc a buffer from the mempool we only update last_used_idx
but not used->idx, so after this happens many times vhost thinks it has handled
all packets while virtio_net thinks vhost has not handled them all and will not
update avail->idx.

Signed-off-by: Linhaifeng 
---
 lib/librte_vhost/vhost_rxtx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index 535c7a1..510ffe8 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -609,7 +609,7 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t 
queue_id,
if (unlikely(m == NULL)) {
RTE_LOG(ERR, VHOST_DATA,
"Failed to allocate memory for mbuf.\n");
-   return entry_success;
+   break;  
}
seg_offset = 0;
seg_avail = m->buf_len - RTE_PKTMBUF_HEADROOM;
-- 
1.8.5.2.msysgit.0




[dpdk-dev] [PATCH] lib/librte_pmd_virtio fix can't receive packets after rx_q is empty

2015-03-21 Thread Linhaifeng


On 2015/3/21 0:54, Xie, Huawei wrote:
> On 3/20/2015 6:47 PM, linhaifeng wrote:
>> From: Linhaifeng 
>>
>> If failed to alloc mbuf ring_size times the rx_q may be empty and can't
>> receive any packets forever because nb_used is 0 forever.
> Agreed. In current implementation, once VQ becomes empty, we have no
> chance to refill it again.
> The simple fix is, receive one and then refill one as other PMDs. Need

"Receive one and then refill one" also have this problem.If refill also
failed the VQ would be empty too.

> to consider which is best strategy in terms of performance in future.
> How did you find this? through code review or real workload?
>> so we should try to refill when nb_used is 0.After otherone free mbuf
>> we can restart to receive packets.
>>
>> Signed-off-by: Linhaifeng 
>> ---
>>  lib/librte_pmd_virtio/virtio_rxtx.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/lib/librte_pmd_virtio/virtio_rxtx.c 
>> b/lib/librte_pmd_virtio/virtio_rxtx.c
>> index 1d74b34..5c7e0cd 100644
>> --- a/lib/librte_pmd_virtio/virtio_rxtx.c
>> +++ b/lib/librte_pmd_virtio/virtio_rxtx.c
>> @@ -495,7 +495,7 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf 
>> **rx_pkts, uint16_t nb_pkts)
>>  num = num - ((rxvq->vq_used_cons_idx + num) % 
>> DESC_PER_CACHELINE);
>>  
>>  if (num == 0)
>> -return 0;
>> +goto refill;
>>  
>>  num = virtqueue_dequeue_burst_rx(rxvq, rcv_pkts, len, num);
>>  PMD_RX_LOG(DEBUG, "used:%d dequeue:%d", nb_used, num);
>> @@ -536,6 +536,7 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf 
>> **rx_pkts, uint16_t nb_pkts)
>>  
>>  rxvq->packets += nb_rx;
>>  
>> +refill:
>>  /* Allocate new mbuf for the used descriptor */
>>  error = ENOSPC;
>>  while (likely(!virtqueue_full(rxvq))) {
> 
> 
> 

-- 
Regards,
Haifeng



[dpdk-dev] [PATCH] lib/librte_vhost:fix can't send packet anymore after mempool is full again

2015-03-21 Thread Linhaifeng
Hi, changchun & xie

I have modified the patch with your suggestions. Please review.

Thank you.

On 2015/3/20 15:28, Ouyang, Changchun wrote:
> 
> 
>> -Original Message-
>> From: Linhaifeng [mailto:haifeng.lin at huawei.com]
>> Sent: Friday, March 20, 2015 2:36 PM
>> To: dev at dpdk.org
>> Cc: Ouyang, Changchun; Xie, Huawei
>> Subject: Re: [PATCH] lib/librte_vhost:fix can't send packet anymore after
>> mempool is full again
>>
>>
>>
>> On 2015/3/20 11:54, linhaifeng wrote:
>>> From: Linhaifeng 
>>>
>>> When failed to malloc buffer from mempool we just update last_used_idx
>>> but not used->idx so after many times vhost thought have handle all
>>> packets but virtio_net thought vhost have not handle all packets and
>>> will not update avail->idx.
>>>
>>> Signed-off-by: Linhaifeng 
>>> ---
>>>  lib/librte_vhost/vhost_rxtx.c | 3 ++-
>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/lib/librte_vhost/vhost_rxtx.c
>>> b/lib/librte_vhost/vhost_rxtx.c index 535c7a1..93a8fff 100644
>>> --- a/lib/librte_vhost/vhost_rxtx.c
>>> +++ b/lib/librte_vhost/vhost_rxtx.c
>>> @@ -609,7 +609,7 @@ rte_vhost_dequeue_burst(struct virtio_net *dev,
>> uint16_t queue_id,
>>> if (unlikely(m == NULL)) {
>>> RTE_LOG(ERR, VHOST_DATA,
>>> "Failed to allocate memory for mbuf.\n");
>>> -   return entry_success;
>>> +   goto finish;
>>
>> or use 'break' replace of 'goto' ?
> 
> Make sense, I can review if you make a v2 patch
> Thanks
> Changchun
> 
> 
> 
> 

-- 
Regards,
Haifeng



[dpdk-dev] [PATCH] cast used->idx to volatile

2015-03-21 Thread linhaifeng
From: Linhaifeng 

As in rte_vhost_enqueue_burst, we should cast used->idx
to volatile before notifying the guest.

Signed-off-by: Linhaifeng 
---
 lib/librte_vhost/vhost_rxtx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index 535c7a1..8d674d1 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -722,7 +722,7 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t 
queue_id,
}

rte_compiler_barrier();
-   vq->used->idx += entry_success;
+   *(volatile uint16_t *)&vq->used->idx += entry_success;
/* Kick guest if required. */
if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
eventfd_write((int)vq->callfd, 1);
-- 
1.8.5.2.msysgit.0




[dpdk-dev] [PATCH v3] lib/librte_vhost: update used->idx when allocation of mbuf fails

2015-03-21 Thread Linhaifeng
cc changchun.ouyang at intel.com
cc huawei.xie at intel.com

On 2015/3/21 9:47, linhaifeng wrote:
> From: Linhaifeng 
> 
> When failed to malloc buffer from mempool we just update last_used_idx but
> not used->idx so after many times vhost thought have handle all packets
> but virtio_net thought vhost have not handle all packets and will not
> update avail->idx.
> 
> Signed-off-by: Linhaifeng 
> ---
>  lib/librte_vhost/vhost_rxtx.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
> index 535c7a1..510ffe8 100644
> --- a/lib/librte_vhost/vhost_rxtx.c
> +++ b/lib/librte_vhost/vhost_rxtx.c
> @@ -609,7 +609,7 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t 
> queue_id,
>   if (unlikely(m == NULL)) {
>   RTE_LOG(ERR, VHOST_DATA,
>   "Failed to allocate memory for mbuf.\n");
> - return entry_success;
> + break;  
>   }
>   seg_offset = 0;
>   seg_avail = m->buf_len - RTE_PKTMBUF_HEADROOM;
> 

-- 
Regards,
Haifeng



[dpdk-dev] [PATCH] cast used->idx to volatile

2015-03-23 Thread Linhaifeng
cc changchun.ouyang at intel.com
cc huawei.xie at intel.com

On 2015/3/21 16:07, linhaifeng wrote:
> From: Linhaifeng 
> 
> Same as rte_vhost_enqueue_burst we should cast used->idx
> to volatile before notify guest.
> 
> Signed-off-by: Linhaifeng 
> ---
>  lib/librte_vhost/vhost_rxtx.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
> index 535c7a1..8d674d1 100644
> --- a/lib/librte_vhost/vhost_rxtx.c
> +++ b/lib/librte_vhost/vhost_rxtx.c
> @@ -722,7 +722,7 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t 
> queue_id,
>   }
>  
>   rte_compiler_barrier();
> - vq->used->idx += entry_success;
> + *(volatile uint16_t *)&vq->used->idx += entry_success;
>   /* Kick guest if required. */
>   if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
>   eventfd_write((int)vq->callfd, 1);
> 

-- 
Regards,
Haifeng



[dpdk-dev] [PATCH] cast used->idx to volatile

2015-03-23 Thread Linhaifeng


On 2015/3/21 16:07, linhaifeng wrote:
> From: Linhaifeng 
> 
> Same as rte_vhost_enqueue_burst we should cast used->idx
> to volatile before notify guest.
> 
> Signed-off-by: Linhaifeng 
> ---
>  lib/librte_vhost/vhost_rxtx.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
> index 535c7a1..8d674d1 100644
> --- a/lib/librte_vhost/vhost_rxtx.c
> +++ b/lib/librte_vhost/vhost_rxtx.c
> @@ -722,7 +722,7 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t 
> queue_id,
>   }
>  
>   rte_compiler_barrier();
> - vq->used->idx += entry_success;
> + *(volatile uint16_t *)&vq->used->idx += entry_success;
>   /* Kick guest if required. */
>   if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
>   eventfd_write((int)vq->callfd, 1);
> 

-- 
Regards,
Haifeng



[dpdk-dev] [PATCH] cast used->idx to volatile

2015-03-24 Thread Linhaifeng


On 2015/3/23 20:54, Xie, Huawei wrote:
> 
> 
>> -Original Message-----
>> From: Linhaifeng [mailto:haifeng.lin at huawei.com]
>> Sent: Monday, March 23, 2015 8:24 PM
>> To: dev at dpdk.org
>> Cc: Ouyang, Changchun; Xie, Huawei
>> Subject: Re: [dpdk-dev] [PATCH] cast used->idx to volatile
>>
>>
>>
>> On 2015/3/21 16:07, linhaifeng wrote:
>>> From: Linhaifeng 
>>>
>>> Same as rte_vhost_enqueue_burst we should cast used->idx
>>> to volatile before notify guest.
>>>
>>> Signed-off-by: Linhaifeng 
>>> ---
>>>  lib/librte_vhost/vhost_rxtx.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
>>> index 535c7a1..8d674d1 100644
>>> --- a/lib/librte_vhost/vhost_rxtx.c
>>> +++ b/lib/librte_vhost/vhost_rxtx.c
>>> @@ -722,7 +722,7 @@ rte_vhost_dequeue_burst(struct virtio_net *dev,
>> uint16_t queue_id,
>>> }
>>>
>>> rte_compiler_barrier();
>>> -   vq->used->idx += entry_success;
>>> +   *(volatile uint16_t *)&vq->used->idx += entry_success;
> 
> 
> Haifeng:
> We have compiler barrier before and an external function call behind, so we 
> don't need volatile  here.
> Do you meet issue?
> 

The tx_q sometimes gets stopped when we use virtio_net, because vhost thinks
there are no buffers left in the tx_q while virtio_net thinks vhost has not yet
handled all packets, so we have to restart the VM to restore forwarding.

The status in VM is:
Mar 18 17:11:10 linux-b2ij kernel: [46337.246687] net eth7: virtnet_poll
Mar 18 17:11:10 linux-b2ij kernel: [46337.246690] net eth7: receive_buf
Mar 18 17:11:10 linux-b2ij kernel: [46337.246693] net eth7: vi->num=239
Mar 18 17:11:10 linux-b2ij kernel: [46337.246695] net eth7: 
svq:avail->idx=52939 used->idx=52939 num_free=18 num_added=0 
svq->last_used_idx=52820
Mar 18 17:11:10 linux-b2ij kernel: [46337.246699] net eth7: 
rvq:avail->idx=36215 used->idx=35977 num_free=18 num_added=0 
rvq->last_used_idx=35977
Mar 18 17:11:11 linux-b2ij kernel: [46337.901038] net eth7: dev_queue_xmit, 
qdisc->flags=4, qdisc->state deactiveed=0
Mar 18 17:11:12 linux-b2ij kernel: [46337.901042] net eth7: dev_queue_xmit, 
txq->state=1, stopped=1

Why does the compiler barrier not take effect in our case? Does the compiler
barrier depend on the -O3 option? We use the -O2 option.
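
For reference, here is a minimal sketch of the two variants being compared
(the struct and field names are invented for illustration, this is not the
vhost code itself). Compiling it with gcc -O2 -S and diffing the output is one
way to check whether the volatile cast changes the generated store:

#include <stdint.h>

#define compiler_barrier() __asm__ __volatile__ ("" : : : "memory")

struct used_ring_stub {
	uint16_t idx;
};

void update_plain(struct used_ring_stub *u, uint16_t n)
{
	compiler_barrier();	/* stops the compiler moving earlier stores past this point */
	u->idx += n;		/* ordinary read-modify-write of idx */
}

void update_volatile(struct used_ring_stub *u, uint16_t n)
{
	compiler_barrier();
	*(volatile uint16_t *)&u->idx += n;	/* the access itself may not be merged or elided */
}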

>>> /* Kick guest if required. */
>>> if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
>>> eventfd_write((int)vq->callfd, 1);
>>>
>>
>> --
>> Regards,
>> Haifeng
> 
> 
> 

-- 
Regards,
Haifeng



[dpdk-dev] [PATCH] cast used->idx to volatile

2015-03-24 Thread Linhaifeng


On 2015/3/24 9:53, Xie, Huawei wrote:
> On 3/24/2015 9:00 AM, Linhaifeng wrote:
>>
>> On 2015/3/23 20:54, Xie, Huawei wrote:
>>>
>>>> -----Original Message-
>>>> From: Linhaifeng [mailto:haifeng.lin at huawei.com]
>>>> Sent: Monday, March 23, 2015 8:24 PM
>>>> To: dev at dpdk.org
>>>> Cc: Ouyang, Changchun; Xie, Huawei
>>>> Subject: Re: [dpdk-dev] [PATCH] cast used->idx to volatile
>>>>
>>>>
>>>>
>>>> On 2015/3/21 16:07, linhaifeng wrote:
>>>>> From: Linhaifeng 
>>>>>
>>>>> Same as rte_vhost_enqueue_burst we should cast used->idx
>>>>> to volatile before notify guest.
>>>>>
>>>>> Signed-off-by: Linhaifeng 
>>>>> ---
>>>>>  lib/librte_vhost/vhost_rxtx.c | 2 +-
>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
>>>>> index 535c7a1..8d674d1 100644
>>>>> --- a/lib/librte_vhost/vhost_rxtx.c
>>>>> +++ b/lib/librte_vhost/vhost_rxtx.c
>>>>> @@ -722,7 +722,7 @@ rte_vhost_dequeue_burst(struct virtio_net *dev,
>>>> uint16_t queue_id,
>>>>>   }
>>>>>
>>>>>   rte_compiler_barrier();
>>>>> - vq->used->idx += entry_success;
>>>>> + *(volatile uint16_t *)&vq->used->idx += entry_success;
>>>
>>> Haifeng:
>>> We have compiler barrier before and an external function call behind, so we 
>>> don't need volatile  here.
>>> Do you meet issue?
>>>
>> Tx_q is sometimes stopped when we use virtio_net. Because vhost thought 
>> there are no buffers in tx_q and virtio_net
>> though vhost haven't handle all packets so we have to restart VM to restore 
>> work.
>>
>> The status in VM is:
>> Mar 18 17:11:10 linux-b2ij kernel: [46337.246687] net eth7: virtnet_poll
>> Mar 18 17:11:10 linux-b2ij kernel: [46337.246690] net eth7: receive_buf
>> Mar 18 17:11:10 linux-b2ij kernel: [46337.246693] net eth7: vi->num=239
>> Mar 18 17:11:10 linux-b2ij kernel: [46337.246695] net eth7: 
>> svq:avail->idx=52939 used->idx=52939 num_free=18 num_added=0 
>> svq->last_used_idx=52820
>> Mar 18 17:11:10 linux-b2ij kernel: [46337.246699] net eth7: 
>> rvq:avail->idx=36215 used->idx=35977 num_free=18 num_added=0 
>> rvq->last_used_idx=35977
>> Mar 18 17:11:11 linux-b2ij kernel: [46337.901038] net eth7: dev_queue_xmit, 
>> qdisc->flags=4, qdisc->state deactiveed=0
>> Mar 18 17:11:12 linux-b2ij kernel: [46337.901042] net eth7: dev_queue_xmit, 
>> txq->state=1, stopped=1
>>
>> Why compiler barrier not take effect in our case? Is compiler barrier 
>> depended on -O3 option? We use -O2 option.
> compiler barrier always works regardless of the optimization option.
> I don't get your story, but the key thing is, do you check the asm code?
> If called from outside as an API, how is it possible it is optimized?
> there is only one update to used->idx in that function.


Do you mean rte_vhost_enqueue_burst also does not need to cast used->idx to
volatile? Why not remove it there as well?

>>
>>>>>   /* Kick guest if required. */
>>>>>   if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
>>>>>   eventfd_write((int)vq->callfd, 1);
>>>>>
>>>> --
>>>> Regards,
>>>> Haifeng
>>>
>>>
> 
> 
> .
> 

-- 
Regards,
Haifeng



[dpdk-dev] [PATCH v3] lib/librte_vhost: update used->idx when allocation of mbuf fails

2015-03-24 Thread Linhaifeng


On 2015/3/24 15:14, Xie, Huawei wrote:
> On 3/22/2015 8:08 PM, Ouyang, Changchun wrote:
>>
>>> -Original Message-
>>> From: linhaifeng [mailto:haifeng.lin at huawei.com]
>>> Sent: Saturday, March 21, 2015 9:47 AM
>>> To: dev at dpdk.org
>>> Cc: Ouyang, Changchun; Xie, Huawei
>>> Subject: [PATCH v3] lib/librte_vhost: update used->idx when allocation of
>>> mbuf fails
>>>
>>> From: Linhaifeng 
>>>
>>> When failed to malloc buffer from mempool we just update last_used_idx
>>> but not used->idx so after many times vhost thought have handle all packets
>>> but virtio_net thought vhost have not handle all packets and will not update
>>> avail->idx.
>>>
>>> Signed-off-by: Linhaifeng 
>> Acked-by: Changchun Ouyang 
>>
>>
>>
> Acked-by: Huawei Xie 
> 
> This patch fix the issue.
> Simple solution like other PMDs is before processing one descriptor,
> ensure allocation of new mbuf is successfull, and then immediately
> refill after receiving the packet from the descriptor.
> In future, we should consider optimized bulk allocation strategy with
> threshold.
> 

Hi, huawei

This patch is for librte_vhost.
Do you also want to ack the other patch, the one for virtio-net-pmd?
> 
> 
> 
> 
> 

-- 
Regards,
Haifeng



[dpdk-dev] [PATCH] cast used->idx to volatile

2015-03-24 Thread Linhaifeng


On 2015/3/24 18:06, Xie, Huawei wrote:
> On 3/24/2015 3:44 PM, Linhaifeng wrote:
>>
>> On 2015/3/24 9:53, Xie, Huawei wrote:
>>> On 3/24/2015 9:00 AM, Linhaifeng wrote:
>>>> On 2015/3/23 20:54, Xie, Huawei wrote:
>>>>>> -Original Message-
>>>>>> From: Linhaifeng [mailto:haifeng.lin at huawei.com]
>>>>>> Sent: Monday, March 23, 2015 8:24 PM
>>>>>> To: dev at dpdk.org
>>>>>> Cc: Ouyang, Changchun; Xie, Huawei
>>>>>> Subject: Re: [dpdk-dev] [PATCH] cast used->idx to volatile
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2015/3/21 16:07, linhaifeng wrote:
>>>>>>> From: Linhaifeng 
>>>>>>>
>>>>>>> Same as rte_vhost_enqueue_burst we should cast used->idx
>>>>>>> to volatile before notify guest.
>>>>>>>
>>>>>>> Signed-off-by: Linhaifeng 
>>>>>>> ---
>>>>>>>  lib/librte_vhost/vhost_rxtx.c | 2 +-
>>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/lib/librte_vhost/vhost_rxtx.c 
>>>>>>> b/lib/librte_vhost/vhost_rxtx.c
>>>>>>> index 535c7a1..8d674d1 100644
>>>>>>> --- a/lib/librte_vhost/vhost_rxtx.c
>>>>>>> +++ b/lib/librte_vhost/vhost_rxtx.c
>>>>>>> @@ -722,7 +722,7 @@ rte_vhost_dequeue_burst(struct virtio_net *dev,
>>>>>> uint16_t queue_id,
>>>>>>> }
>>>>>>>
>>>>>>> rte_compiler_barrier();
>>>>>>> -   vq->used->idx += entry_success;
>>>>>>> +   *(volatile uint16_t *)&vq->used->idx += entry_success;
>>>>> Haifeng:
>>>>> We have compiler barrier before and an external function call behind, so 
>>>>> we don't need volatile  here.
>>>>> Do you meet issue?
>>>>>
>>>> Tx_q is sometimes stopped when we use virtio_net. Because vhost thought 
>>>> there are no buffers in tx_q and virtio_net
>>>> though vhost haven't handle all packets so we have to restart VM to 
>>>> restore work.
>>>>
>>>> The status in VM is:
>>>> Mar 18 17:11:10 linux-b2ij kernel: [46337.246687] net eth7: virtnet_poll
>>>> Mar 18 17:11:10 linux-b2ij kernel: [46337.246690] net eth7: receive_buf
>>>> Mar 18 17:11:10 linux-b2ij kernel: [46337.246693] net eth7: vi->num=239
>>>> Mar 18 17:11:10 linux-b2ij kernel: [46337.246695] net eth7: 
>>>> svq:avail->idx=52939 used->idx=52939 num_free=18 num_added=0 
>>>> svq->last_used_idx=52820
>>>> Mar 18 17:11:10 linux-b2ij kernel: [46337.246699] net eth7: 
>>>> rvq:avail->idx=36215 used->idx=35977 num_free=18 num_added=0 
>>>> rvq->last_used_idx=35977
>>>> Mar 18 17:11:11 linux-b2ij kernel: [46337.901038] net eth7: 
>>>> dev_queue_xmit, qdisc->flags=4, qdisc->state deactiveed=0
>>>> Mar 18 17:11:12 linux-b2ij kernel: [46337.901042] net eth7: 
>>>> dev_queue_xmit, txq->state=1, stopped=1
>>>>
>>>> Why compiler barrier not take effect in our case? Is compiler barrier 
>>>> depended on -O3 option? We use -O2 option.
>>> compiler barrier always works regardless of the optimization option.
>>> I don't get your story, but the key thing is, do you check the asm code?
>>> If called from outside as an API, how is it possible it is optimized?
>>> there is only one update to used->idx in that function.
>>
>> Do you mean rte_vhost_enqueue_burst also not need cast used->idx to volatile 
>> ? Why not remove it?
> I checked the code. Seems we can remove. That is another issue.
> For your issue, you meet problem, and submit this this patch, but i am a
> bit confused it is the root cause. Do you check the asm code that
> volatile is optimized?

I'm not sure about this either. How can I check whether the volatile access is optimized away?

> 
>>>>>>> /* Kick guest if required. */
>>>>>>> if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
>>>>>>> eventfd_write((int)vq->callfd, 1);
>>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Haifeng
>>>>>
>>>
>>> .
>>>
> 
> 
> .
> 

-- 
Regards,
Haifeng



[dpdk-dev] [PATCH] vhost: Fix Segmentation fault of NULL address

2015-03-26 Thread Linhaifeng


On 2015/3/26 15:58, Qiu, Michael wrote:
> On 3/26/2015 3:52 PM, Xie, Huawei wrote:
>> On 3/26/2015 3:05 PM, Qiu, Michael wrote:
>>> Function gpa_to_vva() could return zero, while this will lead
>>> a Segmentation fault.
>>>
>>> This patch is to fix this issue.
>>>
>>> Signed-off-by: Michael Qiu 
>>> ---
>>>  lib/librte_vhost/vhost_rxtx.c | 3 +++
>>>  1 file changed, 3 insertions(+)
>>>
>>> diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
>>> index 535c7a1..23c8acb 100644
>>> --- a/lib/librte_vhost/vhost_rxtx.c
>>> +++ b/lib/librte_vhost/vhost_rxtx.c
>>> @@ -587,6 +587,9 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, 
>>> uint16_t queue_id,
>>>  
>>> /* Buffer address translation. */
>>> vb_addr = gpa_to_vva(dev, desc->addr);
>>> +   if (!vb_addr)
>>> +   return entry_success;
>>> +
>> Firstly we should add check for all gpa_to_vva translation, and do
>> reporting and cleanup on error. We should avoid the case that some buggy
>> or malicious guest virtio driver gives us an invalid GPA(for example,
>> GPA for some MMIO space) and crash our vhost process.
> 
> Yes, agree, I will do this for next version.
> 
>> As we discuss, you meet segfault here, but our virtio PMD shouldn't give
>> us the GPA that has no translation, so we should root cause first and
> 
> Yes, root cause is very important, but it will spend lots time, and I
> think we could be possible to apply this first(All check version).
> 

How should we deal with an address that is invalid but not NULL?

> Thanks,
> Michael
>> fix the problem, and then submit the patch checking all gpa_to_vva
>> translation.
>>
>> -Huawei
>>> /* Prefetch buffer address. */
>>> rte_prefetch0((void *)(uintptr_t)vb_addr);
>>>  
>>
> 
> 
> 



[dpdk-dev] [PATCH] cast used->idx to volatile

2015-03-30 Thread Linhaifeng


On 2015/3/24 18:06, Xie, Huawei wrote:
> On 3/24/2015 3:44 PM, Linhaifeng wrote:
>>
>> On 2015/3/24 9:53, Xie, Huawei wrote:
>>> On 3/24/2015 9:00 AM, Linhaifeng wrote:
>>>> On 2015/3/23 20:54, Xie, Huawei wrote:
>>>>>> -Original Message-
>>>>>> From: Linhaifeng [mailto:haifeng.lin at huawei.com]
>>>>>> Sent: Monday, March 23, 2015 8:24 PM
>>>>>> To: dev at dpdk.org
>>>>>> Cc: Ouyang, Changchun; Xie, Huawei
>>>>>> Subject: Re: [dpdk-dev] [PATCH] cast used->idx to volatile
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2015/3/21 16:07, linhaifeng wrote:
>>>>>>> From: Linhaifeng 
>>>>>>>
>>>>>>> Same as rte_vhost_enqueue_burst we should cast used->idx
>>>>>>> to volatile before notify guest.
>>>>>>>
>>>>>>> Signed-off-by: Linhaifeng 
>>>>>>> ---
>>>>>>>  lib/librte_vhost/vhost_rxtx.c | 2 +-
>>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/lib/librte_vhost/vhost_rxtx.c 
>>>>>>> b/lib/librte_vhost/vhost_rxtx.c
>>>>>>> index 535c7a1..8d674d1 100644
>>>>>>> --- a/lib/librte_vhost/vhost_rxtx.c
>>>>>>> +++ b/lib/librte_vhost/vhost_rxtx.c
>>>>>>> @@ -722,7 +722,7 @@ rte_vhost_dequeue_burst(struct virtio_net *dev,
>>>>>> uint16_t queue_id,
>>>>>>> }
>>>>>>>
>>>>>>> rte_compiler_barrier();
>>>>>>> -   vq->used->idx += entry_success;
>>>>>>> +   *(volatile uint16_t *)&vq->used->idx += entry_success;
>>>>> Haifeng:
>>>>> We have compiler barrier before and an external function call behind, so 
>>>>> we don't need volatile  here.
>>>>> Do you meet issue?
>>>>>
>>>> Tx_q is sometimes stopped when we use virtio_net. Because vhost thought 
>>>> there are no buffers in tx_q and virtio_net
>>>> though vhost haven't handle all packets so we have to restart VM to 
>>>> restore work.
>>>>
>>>> The status in VM is:
>>>> Mar 18 17:11:10 linux-b2ij kernel: [46337.246687] net eth7: virtnet_poll
>>>> Mar 18 17:11:10 linux-b2ij kernel: [46337.246690] net eth7: receive_buf
>>>> Mar 18 17:11:10 linux-b2ij kernel: [46337.246693] net eth7: vi->num=239
>>>> Mar 18 17:11:10 linux-b2ij kernel: [46337.246695] net eth7: 
>>>> svq:avail->idx=52939 used->idx=52939 num_free=18 num_added=0 
>>>> svq->last_used_idx=52820
>>>> Mar 18 17:11:10 linux-b2ij kernel: [46337.246699] net eth7: 
>>>> rvq:avail->idx=36215 used->idx=35977 num_free=18 num_added=0 
>>>> rvq->last_used_idx=35977
>>>> Mar 18 17:11:11 linux-b2ij kernel: [46337.901038] net eth7: 
>>>> dev_queue_xmit, qdisc->flags=4, qdisc->state deactiveed=0
>>>> Mar 18 17:11:12 linux-b2ij kernel: [46337.901042] net eth7: 
>>>> dev_queue_xmit, txq->state=1, stopped=1
>>>>
>>>> Why compiler barrier not take effect in our case? Is compiler barrier 
>>>> depended on -O3 option? We use -O2 option.
>>> compiler barrier always works regardless of the optimization option.
>>> I don't get your story, but the key thing is, do you check the asm code?
>>> If called from outside as an API, how is it possible it is optimized?
>>> there is only one update to used->idx in that function.
>>
>> Do you mean rte_vhost_enqueue_burst also not need cast used->idx to volatile 
>> ? Why not remove it?
> I checked the code. Seems we can remove. That is another issue.
> For your issue, you meet problem, and submit this this patch, but i am a
> bit confused it is the root cause. Do you check the asm code that
> volatile is optimized?
> 

I wrote a demo to try to find out the difference between rte_compiler_barrier
and volatile.
It seems rte_compiler_barrier() has no effect here.

>test1: without rte_compiler_barrier and volatile

#include <stdio.h>

int main()
{
	int i, j;

	*(int*)&i = 2;
	*(int*)&j = 3;
	printf("i=%d j=%d", i, j);
}
linux-LOubNs:/mnt/sdc/linhf/test # gcc -S test.c -I /usr/include/dpdk-1.7.0/x86_64-native-linuxapp-gcc/include/ -O3
linux-LOubNs:/mnt/sdc/linhf/test # cat tes

[dpdk-dev] [PATCH v2] Implement memcmp using AVX/SSE instructions.

2015-05-12 Thread Linhaifeng
Hi, Ravi Kerur

On 2015/5/9 5:19, Ravi Kerur wrote:
> Preliminary results on Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz, Ubuntu
> 14.04 x86_64 shows comparisons using AVX/SSE instructions taking 1/3rd
> CPU ticks for 16, 32, 48 and 64 bytes comparison. In addition,

I wrote a program to test rte_memcmp and I have a question about the result.
Why does it cost the same number of CPU ticks for 128, 256, 512, 1024 and 1500
bytes? Is there any problem in my test?


[root at localhost test]# gcc avx_test.c -O3 -I /data/linhf/v2r2c00/open-source/dpdk/dpdk-2.0.0/x86_64-native-linuxapp-gcc/include/ -mavx2 -DRTE_MACHINE_CPUFLAG_AVX2
[root at localhost test]# ./a.out 0
each test run 1 times
copy 16 bytes costs average 7(rte_memcmp) 10(memcmp) ticks
copy 32 bytes costs average 9(rte_memcmp) 11(memcmp) ticks
copy 64 bytes costs average 6(rte_memcmp) 13(memcmp) ticks
copy 128 bytes costs average 11(rte_memcmp) 14(memcmp) ticks
copy 256 bytes costs average 9(rte_memcmp) 14(memcmp) ticks
copy 512 bytes costs average 9(rte_memcmp) 14(memcmp) ticks
copy 1024 bytes costs average 9(rte_memcmp) 14(memcmp) ticks
copy 1500 bytes costs average 11(rte_memcmp) 14(memcmp) ticks
[root at localhost test]# ./a.out 1
each test run 1 times
copy 16 bytes costs average 2(rte_memcpy) 10(memcpy) ticks
copy 32 bytes costs average 2(rte_memcpy) 10(memcpy) ticks
copy 64 bytes costs average 3(rte_memcpy) 10(memcpy) ticks
copy 128 bytes costs average 7(rte_memcpy) 12(memcpy) ticks
copy 256 bytes costs average 9(rte_memcpy) 23(memcpy) ticks
copy 512 bytes costs average 14(rte_memcpy) 34(memcpy) ticks
copy 1024 bytes costs average 37(rte_memcpy) 61(memcpy) ticks
copy 1500 bytes costs average 62(rte_memcpy) 87(memcpy) ticks


Here is my program:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <rte_cycles.h>		/* rte_rdtsc() */
#include <rte_memcpy.h>
#include <rte_memcmp.h>		/* header name assumed, from the proposed rte_memcmp patch */

#define TIMES 1L

void test_memcpy(size_t n)
{
uint64_t start, end, i, start2, end2;
uint8_t *src, *dst;

src = (uint8_t*)malloc(n * sizeof(uint8_t));
dst = (uint8_t*)malloc(n * sizeof(uint8_t));

start = rte_rdtsc();
for (i = 0; i < TIMES; i++) {
rte_memcpy(dst, src, n);
}
end = rte_rdtsc();

start2 = rte_rdtsc();
for (i = 0; i < TIMES; i++) {
memcpy(dst, src, n);
}
end2 = rte_rdtsc();


free(src);
free(dst);

printf("copy %u bytes costs average %llu(rte_memcpy) %llu(memcpy) 
ticks\n", n, (end - start)/TIMES, (end2 - start2)/TIMES);
}

int test_memcmp(size_t n)
{
    uint64_t start, end, i, start2, end2;
    uint8_t *src, *dst;
    int *ret;
    int t = 0;	/* accumulator, missing in the original listing */

src = (uint8_t*)malloc(n * sizeof(uint8_t));
dst = (uint8_t*)malloc(n * sizeof(uint8_t));
ret = (int*)malloc(TIMES * sizeof(int));

start = rte_rdtsc();
for (i = 0; i < TIMES; i++) {
ret[i] = rte_memcmp(dst, src, n);
}
end = rte_rdtsc();

start2 = rte_rdtsc();
for (i = 0; i < TIMES; i++) {
ret[i] = memcmp(dst, src, n);
}
end2 = rte_rdtsc();

    /* consume the results so gcc cannot optimize the compare loops away */
for (i = 0; i < TIMES; i++) {
t += ret[i];
}

free(src);
free(dst);

printf("copy %u bytes costs average %llu(rte_memcmp) %llu(memcmp) 
ticks\n", n, (end - start)/TIMES, (end2 - start2)/TIMES);
return t;
}




int main(int narg, char** args)
{
printf("each test run %llu times\n", TIMES);

if (narg < 2) {
printf("usage:./avx_test 0/1 1:test memcpy 0:test memcmp\n");
return -1;
}

if (atoi(args[1])) {
test_memcpy(16);
test_memcpy(32);
test_memcpy(64);
test_memcpy(128);
test_memcpy(256);
test_memcpy(512);
test_memcpy(1024);
test_memcpy(1500);
} else {
test_memcmp(16);
test_memcmp(32);
test_memcmp(64);
test_memcmp(128);
test_memcmp(256);
test_memcmp(512);
test_memcmp(1024);
test_memcmp(1500);
}
}
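
One thing that might be worth ruling out (just a guess, not a diagnosis of the
numbers above): averaging independent iterations measures throughput, because
the CPU can overlap several compares in flight. A dependency chain forces each
call to wait for the previous one and gives something closer to per-call
latency. A sketch, assuming the same includes and TIMES as the program above
(the function name is made up):

int test_memcmp_latency(size_t n)
{
	uint64_t start, end, i;
	uint8_t *src, *dst;
	int acc = 0;

	src = (uint8_t*)calloc(n, sizeof(uint8_t));
	dst = (uint8_t*)calloc(n, sizeof(uint8_t));

	start = rte_rdtsc();
	for (i = 0; i < TIMES; i++) {
		src[0] = (uint8_t)(acc & 0xff);	/* each input depends on the previous result */
		acc += rte_memcmp(dst, src, n);
	}
	end = rte_rdtsc();

	free(src);
	free(dst);

	printf("compare %zu bytes: average %llu ticks per dependent call\n",
		n, (unsigned long long)((end - start)/TIMES));
	return acc;
}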








[dpdk-dev] vhost-user technical isssues

2014-11-13 Thread Linhaifeng


On 2014/11/12 5:37, Xie, Huawei wrote:
> Hi Tetsuya:
> There are two major technical issues in my mind for vhost-user implementation.
> 
> 1) memory region map
> Vhost-user passes us file fd and offset for each memory region. Unfortunately 
> the mmap offset is "very" wrong. I discovered this issue long time ago, and 
> also found
> that I couldn't mmap the huge page file even with correct offset(need double 
> check).
> Just now I find that people reported this issue on Nov 3.
> [Qemu-devel] [PULL 27/29] vhost-user: fix mmap offset calculation
> Anyway, I turned to the same idea used in our DPDK vhost-cuse: only use the 
> fd for region(0) to map the  whole file.
> I think we should use this way temporarily to support qemu-2.1 as it has that 
> bug.
> 

this bug is not in dpdk's vhost-user; it only affects qemu's vhost-user backend
> 2) what message is the indicator for vhost start/release?
> Previously  for vhost-cuse, it has SET_BACKEND message.
> What we should do for vhost-user?
> SET_VRING_KICK for start?
> What about for release?
> Unlike the kernel virtio, the DPDK virtio in guest could be restarted. 
> 
> Thoughts?
> 
> -huawei
> 
> 

-- 
Regards,
Haifeng



[dpdk-dev] vhost-user technical isssues

2014-11-13 Thread Linhaifeng


On 2014/11/12 5:37, Xie, Huawei wrote:
> Hi Tetsuya:
> There are two major technical issues in my mind for vhost-user implementation.
> 
> 1) memory region map
> Vhost-user passes us file fd and offset for each memory region. Unfortunately 
> the mmap offset is "very" wrong. I discovered this issue long time ago, and 
> also found
> that I couldn't mmap the huge page file even with correct offset(need double 
> check).
> Just now I find that people reported this issue on Nov 3.
> [Qemu-devel] [PULL 27/29] vhost-user: fix mmap offset calculation
> Anyway, I turned to the same idea used in our DPDK vhost-cuse: only use the 
> fd for region(0) to map the  whole file.
> I think we should use this way temporarily to support qemu-2.1 as it has that 
> bug.
> 

the size of region 0 is not the same as the file size. Maybe you should mmap
the other region instead.

region 0:
gpa = 0x0
size = 655360
ua = 0x2ac0
offset = 0

region 1: // use this region to mmap. BTW, how do we avoid mmapping twice when
there are two devices?
gpa = 0xC
size = 2146697216
ua = 0x2acc
offset = 786432



> 2) what message is the indicator for vhost start/release?
> Previously  for vhost-cuse, it has SET_BACKEND message.
> What we should do for vhost-user?
> SET_VRING_KICK for start?
> What about for release?
> Unlike the kernel virtio, the DPDK virtio in guest could be restarted. 
> 
> Thoughts?
> 
> -huawei
> 
> 

-- 
Regards,
Haifeng



[dpdk-dev] vhost-user technical isssues

2014-11-13 Thread Linhaifeng


On 2014/11/12 12:12, Tetsuya Mukawa wrote:
> Hi Xie,
> 
> (2014/11/12 6:37), Xie, Huawei wrote:
>> Hi Tetsuya:
>> There are two major technical issues in my mind for vhost-user 
>> implementation.
>>
>> 1) memory region map
>> Vhost-user passes us file fd and offset for each memory region. 
>> Unfortunately the mmap offset is "very" wrong. I discovered this issue long 
>> time ago, and also found
>> that I couldn't mmap the huge page file even with correct offset(need double 
>> check).
>> Just now I find that people reported this issue on Nov 3.
>> [Qemu-devel] [PULL 27/29] vhost-user: fix mmap offset calculation
>> Anyway, I turned to the same idea used in our DPDK vhost-cuse: only use the 
>> fd for region(0) to map the  whole file.
>> I think we should use this way temporarily to support qemu-2.1 as it has 
>> that bug.
> I agree with you.
> Also we may have an issue about un-mapping file on hugetlbfs of linux.
> When I check munmap(), it seems 'size' need to be aligned by hugepage size.
> (I guess it may be a kernel bug. Might be fixed already.)
> Please add return value checking code for munmap().
> Still munmap() might be failed.
> 
Are you munmapping region 0? Region 0 does not need to be mmapped, so it does
not need to be munmapped either.

I can munmap the other regions successfully.

>>
>> 2) what message is the indicator for vhost start/release?
>> Previously  for vhost-cuse, it has SET_BACKEND message.
>> What we should do for vhost-user?
>> SET_VRING_KICK for start?
> I think so.
> 
>> What about for release?
>> Unlike the kernel virtio, the DPDK virtio in guest could be restarted. 
>>
>> Thoughts?
> I guess we need to consider 2 types of restarting.
> One is virtio-net driver restarting, the other is vhost-user backend
> restarting.
> But, so far, it's nice to start to think about virtio-net driver
> restarting first.
> 
> Probably we need to implement a way to let vhost-user backend know
> virtio-net driver is restarted.
> I am not sure what is good way to let vhost-user backend know it.
> But how about followings RFC?
> 
> - When unix domain socket is closed, vhost-user backend should treat it
> as "release".
>  It is useful when QEMU itself is gone suddenly.
> 
> - Also, implementing new ioctl command like VHOST_RESET_BACKEND.
>  This command should be sent from virtio-net device of QEMU when
>  VIRTIO_CONFIG_STATUS_RESET register of virtio-net device is set by
> vrtio-net driver.
>  (Usually this register is set when virtio-net driver is initialized or
> stopped.)
>  It means we need to change QEMU. ;)
>  It seems virtio-net PMD already sets this register when PMD is
> initialized or stopped.
>  So this framework should work, and can let vhost-user backend know
> driver resetting.
>  (And I guess we can say same things for virtio-net kernel driver.)
>  It might be enough to close an unix domain socket, instead of
> implementing new command.
>  But in the case, we may need auto reconnection mechanism.
> 
> - We also need to consider DPDK application is gone suddenly without
> setting reset register.
>  In the case, vhost-user backend cannot know it. Only user (or some kind
> of watchdog
>  applications on guest) knows it.
>  Because of this, user(or app.) should have responsibility to solve this
> situation.
>  To be more precise, user should let vhost-user backend know device
> releasing.
>  If user starts an other DPDK application without solving the issue, the
> new DPDK application may
>  access memory that vhost-user backend is also accessing.
>  I guess user can solve the issue using "dpdk_nic_bind.py".
>  The script can move virtio-net device to kernel virtio-net driver, and
> return it to igb_uio.
>  While those steps, virtio-net device is initialized by virtio-net
> kernel driver.
>  So vhost-user backend can know device releasing.
> 
> Tetsuya
> 
>>
>> -huawei
> 
> 
> 
> 

-- 
Regards,
Haifeng



[dpdk-dev] vhost-user technical isssues

2014-11-14 Thread Linhaifeng


On 2014/11/14 9:28, Xie, Huawei wrote:
> 
> 
>> -Original Message-----
>> From: Linhaifeng [mailto:haifeng.lin at huawei.com]
>> Sent: Wednesday, November 12, 2014 11:28 PM
>> To: Xie, Huawei; 'Tetsuya Mukawa'; dev at dpdk.org
>> Subject: Re: [dpdk-dev] vhost-user technical isssues
>>
>>
>>
>> On 2014/11/12 5:37, Xie, Huawei wrote:
>>> Hi Tetsuya:
>>> There are two major technical issues in my mind for vhost-user
>> implementation.
>>>
>>> 1) memory region map
>>> Vhost-user passes us file fd and offset for each memory region. 
>>> Unfortunately
>> the mmap offset is "very" wrong. I discovered this issue long time ago, and 
>> also
>> found
>>> that I couldn't mmap the huge page file even with correct offset(need double
>> check).
>>> Just now I find that people reported this issue on Nov 3.
>>> [Qemu-devel] [PULL 27/29] vhost-user: fix mmap offset calculation
>>> Anyway, I turned to the same idea used in our DPDK vhost-cuse: only use the 
>>> fd
>> for region(0) to map the  whole file.
>>> I think we should use this way temporarily to support qemu-2.1 as it has 
>>> that
>> bug.
>>>
>>
>> the size of region 0 is not same as the file size. may be you should mmap the
>> other region.
> Haifeng:
> 
> Will calculate the maximum memory size, and use any file fd to mmap it.
> Here we assume the fds for different regions actually point to the same file.

Actually there may be two hugepage files created by qemu.
One day I created a 4G VM and found that qemu created 2 hugepage files and sent
both of them to vhost-user.
You can try to test it.

> 
> In theory we should use the fd for each region to map each memory region.
> In fact we could map once. This will also save address space for 1GB huge page
> due to mmap alignment requirement.
>>
>> region 0:
>> gpa = 0x0
>> size = 655360
>> ua = 0x2ac0
>> offset = 0
>>
>> region 1:// use this region to mmap.BTW how to avoid mmap twice when there
>> are two devices?
>> gpa = 0xC
>> size = 2146697216
>> ua = 0x2acc
>> offset = 786432
> 
> What do you mean by two devices?
>>
>>

E.g. if there are two vhost-user backends in a VM, we will receive two
SET_MEM_TABLE messages, but actually we only need to mmap once, in one message.

I think qemu should add a new message that sends every hugepage fd and size
once; with that we would not need to mmap and calculate memory in each
set_mem_table message.
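
Until such a message exists, one way to avoid mapping the same hugepage file
twice on the backend side could be to key mappings on the file identity and
reuse them. A sketch with assumed names (not the librte_vhost implementation;
len would be memory_size + mmap_offset of the region):

#include <stdint.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>

struct file_map {
	dev_t dev;
	ino_t ino;
	void *addr;
	size_t len;
};

static struct file_map maps[8];
static int nb_maps;

static void *map_region_file(int fd, size_t len)
{
	struct stat st;
	void *addr;
	int i;

	if (fstat(fd, &st) < 0)
		return MAP_FAILED;

	/* reuse an existing mapping of the same backing file if it is large enough */
	for (i = 0; i < nb_maps; i++)
		if (maps[i].dev == st.st_dev && maps[i].ino == st.st_ino &&
		    maps[i].len >= len)
			return maps[i].addr;

	addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (addr != MAP_FAILED && nb_maps < 8) {
		maps[nb_maps].dev = st.st_dev;
		maps[nb_maps].ino = st.st_ino;
		maps[nb_maps].addr = addr;
		maps[nb_maps].len = len;
		nb_maps++;
	}
	return addr;
}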

>>
>>> 2) what message is the indicator for vhost start/release?
>>> Previously  for vhost-cuse, it has SET_BACKEND message.
>>> What we should do for vhost-user?
>>> SET_VRING_KICK for start?
>>> What about for release?
>>> Unlike the kernel virtio, the DPDK virtio in guest could be restarted.
>>>
>>> Thoughts?
>>>
>>> -huawei
>>>
>>>
>>
>> --
>> Regards,
>> Haifeng
> 
> 
> .
> 

-- 
Regards,
Haifeng



[dpdk-dev] vhost-user technical isssues

2014-11-14 Thread Linhaifeng


On 2014/11/14 10:30, Tetsuya Mukawa wrote:
> Hi Lin,
> 
> (2014/11/13 15:30), Linhaifeng wrote:
>> On 2014/11/12 12:12, Tetsuya Mukawa wrote:
>>> Hi Xie,
>>>
>>> (2014/11/12 6:37), Xie, Huawei wrote:
>>>> Hi Tetsuya:
>>>> There are two major technical issues in my mind for vhost-user 
>>>> implementation.
>>>>
>>>> 1) memory region map
>>>> Vhost-user passes us file fd and offset for each memory region. 
>>>> Unfortunately the mmap offset is "very" wrong. I discovered this issue 
>>>> long time ago, and also found
>>>> that I couldn't mmap the huge page file even with correct offset(need 
>>>> double check).
>>>> Just now I find that people reported this issue on Nov 3.
>>>> [Qemu-devel] [PULL 27/29] vhost-user: fix mmap offset calculation
>>>> Anyway, I turned to the same idea used in our DPDK vhost-cuse: only use 
>>>> the fd for region(0) to map the  whole file.
>>>> I think we should use this way temporarily to support qemu-2.1 as it has 
>>>> that bug.
>>> I agree with you.
>>> Also we may have an issue about un-mapping file on hugetlbfs of linux.
>>> When I check munmap(), it seems 'size' need to be aligned by hugepage size.
>>> (I guess it may be a kernel bug. Might be fixed already.)
>>> Please add return value checking code for munmap().
>>> Still munmap() might be failed.
>>>
>> are you munmmap the region 0? region 0 is not need to mmap so not need to 
>> munmap too.
>>
>> I can munmap success with the other regions.
> Could you please let me know how many size do you specify when you
> munmap region1?
> 

2G (region->memory_size + region->mmap_offset)

> I still fail to munmap region1.
> Here is a patch to vhost-user test of QEMU. Could you please check it?
> 
> --
> diff --git a/tests/vhost-user-test.c b/tests/vhost-user-test.c
> index 75fedf0..4e17910 100644
> --- a/tests/vhost-user-test.c
> +++ b/tests/vhost-user-test.c
> @@ -37,7 +37,7 @@
> #endif
> 
> #define QEMU_CMD_ACCEL " -machine accel=tcg"
> -#define QEMU_CMD_MEM " -m 512 -object
> memory-backend-file,id=mem,size=512M,"\
> +#define QEMU_CMD_MEM " -m 6000 -object
> memory-backend-file,id=mem,size=6000M,"\
> "mem-path=%s,share=on -numa node,memdev=mem"
> #define QEMU_CMD_CHR " -chardev socket,id=chr0,path=%s"
> #define QEMU_CMD_NETDEV " -netdev
> vhost-user,id=net0,chardev=chr0,vhostforce"
> @@ -221,14 +221,16 @@ static void read_guest_mem(void)
> 
> /* check for sanity */
> g_assert_cmpint(fds_num, >, 0);
> - g_assert_cmpint(fds_num, ==, memory.nregions);
> + //g_assert_cmpint(fds_num, ==, memory.nregions);
> 
> + fprintf(stderr, "%s(%d)\n", __func__, __LINE__);
> /* iterate all regions */
> for (i = 0; i < fds_num; i++) {
> + int ret = 0;
> 
> /* We'll check only the region statring at 0x0*/
> if (memory.regions[i].guest_phys_addr != 0x0) {
> - continue;
> + //continue;
> }

if (memory.regions[i].guest_phys_addr == 0x0) {
close(fds[i]);
continue;
}

> 
> g_assert_cmpint(memory.regions[i].memory_size, >, 1024);
> @@ -237,6 +239,13 @@ static void read_guest_mem(void)
> 
> guest_mem = mmap(0, size, PROT_READ | PROT_WRITE,
> MAP_SHARED, fds[i], 0);
> + fprintf(stderr, "guest_phys_addr=%lu, memory_size=%lu, "
> + "userspace_addr=%lu, mmap_offset=%lu\n",
> + memory.regions[i].guest_phys_addr,
> + memory.regions[i].memory_size,
> + memory.regions[i].userspace_addr,
> + memory.regions[i].mmap_offset);
> + fprintf(stderr, "mmap=%p, size=%lu\n", guest_mem, size);
> 
> g_assert(guest_mem != MAP_FAILED);
> guest_mem += (memory.regions[i].mmap_offset / sizeof(*guest_mem));
> @@ -248,7 +257,20 @@ static void read_guest_mem(void)
> g_assert_cmpint(a, ==, b);
> }
> 
> - munmap(guest_mem, memory.regions[i].memory_size);
> + ret = munmap(guest_mem, memory.regions[i].memory_size);
> + fprintf(stderr, "munmap=%p, size=%lu, ret=%d\n",
> + guest_mem, memory.regions[i].memory_size, ret);
> + {
> + size_t hugepagesize;
> +
> + size = memory.regions[i].memory_size;
> + /* assume hugepage size is 1GB, try again */
> + hugepagesize = 1024 * 1024 * 1024;
> + size = (size + hugepagesize - 1) / hugepagesize * hugepagesize;
> + }
size should be the same as the one used for mmap, and
guest_mem -= (memory.regions[i].mmap_offset / sizeof(*guest_mem));
> + ret = munmap(guest_mem, size);
> + fprintf(stderr, "munmap=%p, size=%lu, ret=%d\n",
> + guest_mem, size, ret);
> }
> 
> g_assert_cmpint(1, ==, 1);
> --
> $ sudo QTEST_HUGETLBFS_PATH=/mnt/huge make check
> region=0, mmap=0x2aaac000, size=6291456000
> region=0, munmap=0x2aab8000, size=3070230528, ret=-1 << failed
> region=0, munmap=0x2aab8000, size=3221225472, ret=0
> region=1, mmap=0x2aab8000, size=655360
> region=1, munmap=0x2aab8000, size=655360, ret=-1 << failed
> region=1, munmap=0x2aab8000, size=1073741824, ret=0
> 
> 
> Thanks,
> Tetsuya
> 
> .
> 

-- 
Regards,
Haifeng



[dpdk-dev] vhost-user technical isssues

2014-11-14 Thread Linhaifeng


On 2014/11/14 11:40, Tetsuya Mukawa wrote:
> Hi Lin,
> 
> (2014/11/14 12:13), Linhaifeng wrote:
>>
>> size should be same as mmap and
>> guest_mem -= (memory.regions[i].mmap_offset / sizeof(*guest_mem));
>>
> 
> Thanks. It should be.
> How about following patch?
> 
> ---
> diff --git a/tests/vhost-user-test.c b/tests/vhost-user-test.c
> index 75fedf0..be4b171 100644
> --- a/tests/vhost-user-test.c
> +++ b/tests/vhost-user-test.c
> @@ -37,7 +37,7 @@
> #endif
> 
> #define QEMU_CMD_ACCEL " -machine accel=tcg"
> -#define QEMU_CMD_MEM " -m 512 -object
> memory-backend-file,id=mem,size=512M,"\
> +#define QEMU_CMD_MEM " -m 6000 -object
> memory-backend-file,id=mem,size=6000M,"\
> "mem-path=%s,share=on -numa node,memdev=mem"
> #define QEMU_CMD_CHR " -chardev socket,id=chr0,path=%s"
> #define QEMU_CMD_NETDEV " -netdev
> vhost-user,id=net0,chardev=chr0,vhostforce"
> @@ -221,13 +221,16 @@ static void read_guest_mem(void)
> 
> /* check for sanity */
> g_assert_cmpint(fds_num, >, 0);
> - g_assert_cmpint(fds_num, ==, memory.nregions);
> + //g_assert_cmpint(fds_num, ==, memory.nregions);
> 
> + fprintf(stderr, "%s(%d)\n", __func__, __LINE__);
> /* iterate all regions */
> for (i = 0; i < fds_num; i++) {
> + int ret = 0;
> 
> /* We'll check only the region statring at 0x0*/
> - if (memory.regions[i].guest_phys_addr != 0x0) {
> + if (memory.regions[i].guest_phys_addr == 0x0) {
> + close(fds[i]);
> continue;
> }
> 
> @@ -237,6 +240,7 @@ static void read_guest_mem(void)
> 
> guest_mem = mmap(0, size, PROT_READ | PROT_WRITE,


What is size here? Is it memory_size + mmap_offset?


> MAP_SHARED, fds[i], 0);
> + fprintf(stderr, "region=%d, mmap=%p, size=%lu\n", i, guest_mem, size);
> 
> g_assert(guest_mem != MAP_FAILED);
> guest_mem += (memory.regions[i].mmap_offset / sizeof(*guest_mem));
> @@ -247,8 +251,10 @@ static void read_guest_mem(void)
> 
> g_assert_cmpint(a, ==, b);
> }
> -
> - munmap(guest_mem, memory.regions[i].memory_size);
> + guest_mem -= (memory.regions[i].mmap_offset / sizeof(*guest_mem));
> + ret = munmap(guest_mem, memory.regions[i].memory_size);

memory.regions[i].memory_size --> memory.regions[i].memory_size +
memory.regions[i].mmap_offset

check that you have applied qemu's patch: [PATCH] vhost-user: fix mmap offset
calculation

> + fprintf(stderr, "region=%d, munmap=%p, size=%lu, ret=%d\n",
> + i, guest_mem, size, ret);
> }
> 
> g_assert_cmpint(1, ==, 1);
> ---
> I am using 1GB hugepage size.
> 
> $ sudo QTEST_HUGETLBFS_PATH=/mnt/huge make check
> region=0, mmap=0x2aaac000, size=6291456000
> region=0, munmap=0x2aaac000, size=6291456000, ret=-1 << failed
> 
> 6291456000 is not aligned by 1GB.
> When I specify 4096MB as guest memory size, munmap() doesn't return
> error like following.
> 
> $ sudo QTEST_HUGETLBFS_PATH=/mnt/huge make check
> region=0, mmap=0x2aaac000, size=4294967296
> region=0, munmap=0x2aaac000, size=4294967296, ret=0
> 
> Thanks,
> Tetsuya
> 
> .
> 

-- 
Regards,
Haifeng



[dpdk-dev] vhost-user technical issues

2014-11-14 Thread Linhaifeng


On 2014/11/14 13:12, Tetsuya Mukawa wrote:
> Please try another value like 6000MB

I have tried the value 6000MB. With it I can munmap successfully.

If you mmap with size "memory_size + mmap_offset", you should also munmap with the
same size.
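
Roughly, what I mean (assumed variable names, not the actual test code): the length
passed to munmap() must match the length passed to mmap(), and the pointer must be
the one mmap() returned rather than the offset-adjusted one.

#include <stdint.h>
#include <sys/mman.h>

static int check_region(int fd, uint64_t memory_size, uint64_t mmap_offset)
{
    size_t len = memory_size + mmap_offset;   /* same length for mmap and munmap */
    uint8_t *base = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    if (base == MAP_FAILED)
        return -1;

    /* guest memory starts at base + mmap_offset ... */

    return munmap(base, len);                 /* base pointer, not base + mmap_offset */
}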

-- 
Regards,
Haifeng



[dpdk-dev] how to avoid "non-matching type" in my program with dpdk

2014-11-27 Thread Linhaifeng
Hi all,

When I compile my program with DPDK, gcc reports the following warning. I don't
know how to avoid it. Please help.

/usr/include/dpdk-1.7.0/x86_64-native-linuxapp-gcc//include/rte_common.h:176: 
warning: cast from function call of type 'uintptr_t' to non-matching type 'void 
*'

# gcc --version
gcc-4.3 (SUSE Linux) 4.3.4 [gcc-4_3-branch revision 152973]
Copyright (C) 2008 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
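
A reduced example of the pattern that seems to trigger the warning (my own guess,
not the rte_common.h code); gcc's -Wbad-function-cast complains when a function
call's return value is cast directly to a different type, and assigning the result
to a uintptr_t variable before converting it to a pointer avoids it:

#include <stdint.h>

static uintptr_t align_floor(uintptr_t v, uintptr_t align)
{
    return v & ~(align - 1);
}

/* warns with -Wbad-function-cast on this gcc:
 *   void *p = (void *)align_floor((uintptr_t)buf, 64);
 */
static void *align_ptr(void *buf)
{
    uintptr_t tmp = align_floor((uintptr_t)buf, 64); /* no cast of a function call */

    return (void *)tmp;
}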

-- 
Regards,
Haifeng



[dpdk-dev] release hugepages after application exit?

2014-10-29 Thread Linhaifeng


On 2014/10/29 9:26, Choonho Son wrote:
> Hi,
> 
> After terminating DPDK application, it does not release hugepages.
> Is there any reason for it or to-do item?
> 
> Thanks,
> Choonho Son
> 
> 

I have written a patch to release hugepages but haven't sent it yet.
I will send the patch later.
-- 
Regards,
Haifeng



[dpdk-dev] [PATCH] add free hugepage function

2014-10-29 Thread linhaifeng
Maybe somebody wants to free hugepages when the application exits,
so add this function for applications to release hugepages on exit.

Signed-off-by: linhaifeng 
---
 .../lib/librte_eal/common/include/rte_memory.h | 11 +
 .../lib/librte_eal/linuxapp/eal/eal_memory.c   | 27 ++
 2 files changed, 38 insertions(+)

diff --git a/dpdk/dpdk-1.7.0/lib/librte_eal/common/include/rte_memory.h 
b/dpdk/dpdk-1.7.0/lib/librte_eal/common/include/rte_memory.h
index 4cf8ea9..7251b6b 100644
--- a/dpdk/dpdk-1.7.0/lib/librte_eal/common/include/rte_memory.h
+++ b/dpdk/dpdk-1.7.0/lib/librte_eal/common/include/rte_memory.h
@@ -172,6 +172,17 @@ unsigned rte_memory_get_nchannel(void);
  */
 unsigned rte_memory_get_nrank(void);

+/**
+ * Free all the hugepages.For the application to call when exit.
+ *
+ * @param void
+ *
+ * @return
+ *   0: successfully
+ *   negative: error
+ */
+int rte_eal_hugepage_free(void);
+
 #ifdef RTE_LIBRTE_XEN_DOM0
 /**
  * Return the physical address of elt, which is an element of the pool mp.
diff --git a/dpdk/dpdk-1.7.0/lib/librte_eal/linuxapp/eal/eal_memory.c 
b/dpdk/dpdk-1.7.0/lib/librte_eal/linuxapp/eal/eal_memory.c
index f2454f4..1ae0e79 100644
--- a/dpdk/dpdk-1.7.0/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/dpdk/dpdk-1.7.0/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -98,6 +98,13 @@
 #include "eal_filesystem.h"
 #include "eal_hugepages.h"

+struct hugepage_table {
+   struct hugepage_file *hugepg_tbl;
+   unsigned nr_hugefiles;
+};
+
+static struct hugepage_table g_hugepage_table;
+
 /**
  * @file
  * Huge page mapping under linux
@@ -1202,6 +1209,7 @@ rte_eal_hugepage_init(void)
(unsigned)
(used_hp[i].hugepage_sz 
/ 0x10),
j);
+   g_hugepage_table.nr_hugefiles += 
used_hp[i].num_pages[j];
}
}
}
@@ -1237,6 +1245,8 @@ rte_eal_hugepage_init(void)
goto fail;
}

+   g_hugepage_table.hugepg_tbl = hugepage;
+
/* free the temporary hugepage table */
free(tmp_hp);
tmp_hp = NULL;
@@ -1487,6 +1497,23 @@ error:
return -1;
 }

+int
+rte_eal_hugepage_free(void)
+{
+   struct hugepage_file *hugepg_tbl = g_hugepage_table.hugepg_tbl;
+   unsigned i;
+   unsigned nr_hugefiles = g_hugepage_table.nr_hugefiles;
+
+   RTE_LOG(INFO, EAL, "unlink %u hugepage files\n", nr_hugefiles);
+
+   for (i = 0; i < nr_hugefiles; i++) {
+   unlink(hugepg_tbl[i].filepath);
+   hugepg_tbl[i].orig_va = NULL;
+   }
+
+   return 0;
+}
+
 static int
 rte_eal_memdevice_init(void)
 {
-- 
1.8.3.1
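
A rough usage sketch for the proposed function (assumed application skeleton, not
part of the patch): the application calls it after it has finished using DPDK
memory, right before exiting.

#include <rte_eal.h>
#include <rte_memory.h>

int main(int argc, char **argv)
{
    if (rte_eal_init(argc, argv) < 0)
        return -1;

    /* ... run the application ... */

    /* unlink the hugepage files created by EAL before leaving */
    if (rte_eal_hugepage_free() != 0)
        return -1;

    return 0;
}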




[dpdk-dev] [PATCH] add free hugepage function

2014-10-29 Thread Linhaifeng


On 2014/10/29 11:44, Matthew Hall wrote:
> On Wed, Oct 29, 2014 at 03:27:58AM +, Qiu, Michael wrote:
>> I just saw one return path with value '0', and no any other place 
>> return a negative value,  so it is better to  be designed as one
>> non-return function,
>>
>> +void
>> +rte_eal_hugepage_free(void)
>> +{
>> +struct hugepage_file *hugepg_tbl = g_hugepage_table.hugepg_tbl;
>> +unsigned i;
>> +unsigned nr_hugefiles = g_hugepage_table.nr_hugefiles;
>> +
>> +RTE_LOG(INFO, EAL, "unlink %u hugepage files\n", nr_hugefiles);
>> +
>> +for (i = 0; i < nr_hugefiles; i++) {
>> +unlink(hugepg_tbl[i].filepath);
>> +hugepg_tbl[i].orig_va = NULL;
>> +}
>> +}
>> +
>>
>> Thanks,
>> Michael
> 
> Actually, I don't think that's quite right.
> 
> http://linux.die.net/man/2/unlink
> 
> "On success, zero is returned. On error, -1 is returned, and errno is set 
> appropriately." So it should be returning an error, and logging a message for 
> a file it cannot unlink or people will be surprised with weird failures.
> 
> It also had some minor typos / English in the comments but we can fix that 
> too.
> 
> Matthew.
> 
> 

Thank you Michael & Matthew

I will fix it.
:)

-- 
Regards,
Haifeng



[dpdk-dev] [PATCH v2] support free hugepages

2014-10-29 Thread linhaifeng
rte_eal_hugepage_free() is used to unlink all hugepages. If you want to
free all hugepages you must make sure that you have stopped using them, and you
must call this function before the process exits.

Signed-off-by: linhaifeng 
---
 .../lib/librte_eal/common/include/rte_memory.h | 11 
 .../lib/librte_eal/linuxapp/eal/eal_memory.c   | 31 ++
 2 files changed, 42 insertions(+)

diff --git a/dpdk/dpdk-1.7.0/lib/librte_eal/common/include/rte_memory.h 
b/dpdk/dpdk-1.7.0/lib/librte_eal/common/include/rte_memory.h
index 4cf8ea9..f6ad95f 100644
--- a/dpdk/dpdk-1.7.0/lib/librte_eal/common/include/rte_memory.h
+++ b/dpdk/dpdk-1.7.0/lib/librte_eal/common/include/rte_memory.h
@@ -172,6 +172,17 @@ unsigned rte_memory_get_nchannel(void);
  */
 unsigned rte_memory_get_nrank(void);

+/**
+ * Unlink all hugepages which created by dpdk.
+ *
+ * @param void
+ *
+ * @return
+ *   0: successfully
+ *   negative: error
+ */
+int rte_eal_hugepage_free(void);
+
 #ifdef RTE_LIBRTE_XEN_DOM0
 /**
  * Return the physical address of elt, which is an element of the pool mp.
diff --git a/dpdk/dpdk-1.7.0/lib/librte_eal/linuxapp/eal/eal_memory.c 
b/dpdk/dpdk-1.7.0/lib/librte_eal/linuxapp/eal/eal_memory.c
index f2454f4..109207c 100644
--- a/dpdk/dpdk-1.7.0/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/dpdk/dpdk-1.7.0/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -98,6 +98,13 @@
 #include "eal_filesystem.h"
 #include "eal_hugepages.h"

+struct hugepage_table {
+   struct hugepage_file *hugepg_tbl;
+   unsigned nr_hugefiles;
+};
+
+static struct hugepage_table g_hugepage_table;
+
 /**
  * @file
  * Huge page mapping under linux
@@ -1202,6 +1209,7 @@ rte_eal_hugepage_init(void)
(unsigned)
(used_hp[i].hugepage_sz 
/ 0x10),
j);
+   g_hugepage_table.nr_hugefiles += 
used_hp[i].num_pages[j];
}
}
}
@@ -1237,6 +1245,8 @@ rte_eal_hugepage_init(void)
goto fail;
}

+   g_hugepage_table.hugepg_tbl = hugepage;
+
/* free the temporary hugepage table */
free(tmp_hp);
tmp_hp = NULL;
@@ -1487,6 +1497,27 @@ error:
return -1;
 }

+int
+rte_eal_hugepage_free(void)
+{
+   struct hugepage_file *hugepg_tbl = g_hugepage_table.hugepg_tbl;
+   unsigned i;
+   unsigned nr_hugefiles = g_hugepage_table.nr_hugefiles;
+   int ret = 0;
+
+   for (i = 0; i < nr_hugefiles; i++) {
+   ret = unlink(hugepg_tbl[i].filepath);
+   if (ret != 0) {
+   RTE_LOG(ERR, EAL, "Failed to unlink %s", 
hugepg_tbl[i].filepath);
+   return ret;
+   }
+   hugepg_tbl[i].orig_va = NULL;
+   }
+
+   RTE_LOG(INFO, EAL, "unlink %u hugepage files\n", nr_hugefiles);
+   return ret;
+}
+
 static int
 rte_eal_memdevice_init(void)
 {
-- 
1.8.3.1




[dpdk-dev] [PATCH v2] support free hugepages

2014-10-29 Thread Linhaifeng


On 2014/10/29 14:14, Qiu, Michael wrote:
> On 10/29/2014 1:49 PM, linhaifeng wrote:
>> rte_eal_hugepage_free() is used for unlink all hugepages.If you want to
>> free all hugepages you must make sure that you have stop to use it,and you
>> must call this function before exit process.
>>
>> Signed-off-by: linhaifeng 
>> ---
>>  .../lib/librte_eal/common/include/rte_memory.h | 11 
>>  .../lib/librte_eal/linuxapp/eal/eal_memory.c   | 31 
>> ++
>>  2 files changed, 42 insertions(+)
>>
>> diff --git a/dpdk/dpdk-1.7.0/lib/librte_eal/common/include/rte_memory.h 
>> b/dpdk/dpdk-1.7.0/lib/librte_eal/common/include/rte_memory.h
>> index 4cf8ea9..f6ad95f 100644
>> --- a/dpdk/dpdk-1.7.0/lib/librte_eal/common/include/rte_memory.h
>> +++ b/dpdk/dpdk-1.7.0/lib/librte_eal/common/include/rte_memory.h
>> @@ -172,6 +172,17 @@ unsigned rte_memory_get_nchannel(void);
>>   */
>>  unsigned rte_memory_get_nrank(void);
>>  
>> +/**
>> + * Unlink all hugepages which created by dpdk.
>> + *
>> + * @param void
>> + *
>> + * @return
>> + *   0: successfully
>> + *   negative: error
>> + */
>> +int rte_eal_hugepage_free(void);
>> +
>>  #ifdef RTE_LIBRTE_XEN_DOM0
>>  /**
>>   * Return the physical address of elt, which is an element of the pool mp.
>> diff --git a/dpdk/dpdk-1.7.0/lib/librte_eal/linuxapp/eal/eal_memory.c 
>> b/dpdk/dpdk-1.7.0/lib/librte_eal/linuxapp/eal/eal_memory.c
>> index f2454f4..109207c 100644
>> --- a/dpdk/dpdk-1.7.0/lib/librte_eal/linuxapp/eal/eal_memory.c
>> +++ b/dpdk/dpdk-1.7.0/lib/librte_eal/linuxapp/eal/eal_memory.c
>> @@ -98,6 +98,13 @@
>>  #include "eal_filesystem.h"
>>  #include "eal_hugepages.h"
>>  
>> +struct hugepage_table {
>> +struct hugepage_file *hugepg_tbl;
>> +unsigned nr_hugefiles;
>> +};
>> +
>> +static struct hugepage_table g_hugepage_table;
>> +
>>  /**
>>   * @file
>>   * Huge page mapping under linux
>> @@ -1202,6 +1209,7 @@ rte_eal_hugepage_init(void)
>>  (unsigned)
>>  (used_hp[i].hugepage_sz 
>> / 0x10),
>>  j);
>> +g_hugepage_table.nr_hugefiles += 
>> used_hp[i].num_pages[j];
>>  }
>>  }
>>  }
>> @@ -1237,6 +1245,8 @@ rte_eal_hugepage_init(void)
>>  goto fail;
>>  }
>>  
>> +g_hugepage_table.hugepg_tbl = hugepage;
>> +
>>  /* free the temporary hugepage table */
>>  free(tmp_hp);
>>  tmp_hp = NULL;
>> @@ -1487,6 +1497,27 @@ error:
>>  return -1;
>>  }
>>  
>> +int
>> +rte_eal_hugepage_free(void)
>> +{
>> +struct hugepage_file *hugepg_tbl = g_hugepage_table.hugepg_tbl;
>> +unsigned i;
>> +unsigned nr_hugefiles = g_hugepage_table.nr_hugefiles;
>> +int ret = 0;
>> +
>> +for (i = 0; i < nr_hugefiles; i++) {
>> +ret = unlink(hugepg_tbl[i].filepath);
>> +if (ret != 0) {
>> +RTE_LOG(ERR, EAL, "Failed to unlink %s", 
>> hugepg_tbl[i].filepath);
>> +return ret;
>> +}
>> +hugepg_tbl[i].orig_va = NULL;
> 
> BTW, is it better to first set hugepg_tbl[i].orig_va to NULL, then unlink
> filepath?
> It may be not a good idea to first remove then set to NULL.
> 
> Thanks,
> Michael
> 

If I first set hugepg_tbl[i].orig_va to NULL and then fail to unlink, I have to
restore hugepg_tbl[i].orig_va.
So I unlink first, to keep the code shorter.

>> +}
>> +
>> +RTE_LOG(INFO, EAL, "unlink %u hugepage files\n", nr_hugefiles);
>> +return ret;
>> +}
>> +
>>  static int
>>  rte_eal_memdevice_init(void)
>>  {
> 
> 
> .
> 

-- 
Regards,
Haifeng



[dpdk-dev] [PATCH] add free hugepage function

2014-10-29 Thread Linhaifeng


On 2014/10/29 13:26, Qiu, Michael wrote:
> On 10/29/2014 11:46 AM, Matthew Hall wrote:
>> On Wed, Oct 29, 2014 at 03:27:58AM +, Qiu, Michael wrote:
>>> I just saw one return path with value '0', and no any other place 
>>> return a negative value,  so it is better to  be designed as one
>>> non-return function,
>>>
>>> +void
>>> +rte_eal_hugepage_free(void)
>>> +{
>>> +   struct hugepage_file *hugepg_tbl = g_hugepage_table.hugepg_tbl;
>>> +   unsigned i;
>>> +   unsigned nr_hugefiles = g_hugepage_table.nr_hugefiles;
>>> +
>>> +   RTE_LOG(INFO, EAL, "unlink %u hugepage files\n", nr_hugefiles);
>>> +
>>> +   for (i = 0; i < nr_hugefiles; i++) {
>>> +   unlink(hugepg_tbl[i].filepath);
>>> +   hugepg_tbl[i].orig_va = NULL;
>>> +   }
>>> +}
>>> +
>>>
>>> Thanks,
>>> Michael
>> Actually, I don't think that's quite right.
>>
>> http://linux.die.net/man/2/unlink
>>
>> "On success, zero is returned. On error, -1 is returned, and errno is set 
>> appropriately." So it should be returning an error, and logging a message 
>> for 
>> a file it cannot unlink or people will be surprised with weird failures.
> 
> We really need a message when unlink fails, but I'm not sure it makes
> sense to return an error code when the application exits.
> 
> Thanks
> Michael
>> It also had some minor typos / English in the comments but we can fix that 
>> too.
>>
>> Matthew.
>>
> 
> 
> 
Agreed. Maybe there is no need to return an error?
-- 
Regards,
Haifeng



[dpdk-dev] [PATCH v2] support free hugepages

2014-10-29 Thread Linhaifeng


On 2014/10/29 16:04, Qiu, Michael wrote:
> On 10/29/2014 2:41 PM, Linhaifeng wrote:
>>
>> On 2014/10/29 14:14, Qiu, Michael wrote:
>>> On 10/29/2014 1:49 PM, linhaifeng wrote:
>>>> rte_eal_hugepage_free() is used for unlink all hugepages.If you want to
>>>> free all hugepages you must make sure that you have stop to use it,and you
>>>> must call this function before exit process.
>>>>
>>>> Signed-off-by: linhaifeng 
>>>> ---
>>>>  .../lib/librte_eal/common/include/rte_memory.h | 11 
>>>>  .../lib/librte_eal/linuxapp/eal/eal_memory.c   | 31 
>>>> ++
>>>>  2 files changed, 42 insertions(+)
>>>>
>>>> diff --git a/dpdk/dpdk-1.7.0/lib/librte_eal/common/include/rte_memory.h 
>>>> b/dpdk/dpdk-1.7.0/lib/librte_eal/common/include/rte_memory.h
>>>> index 4cf8ea9..f6ad95f 100644
>>>> --- a/dpdk/dpdk-1.7.0/lib/librte_eal/common/include/rte_memory.h
>>>> +++ b/dpdk/dpdk-1.7.0/lib/librte_eal/common/include/rte_memory.h
>>>> @@ -172,6 +172,17 @@ unsigned rte_memory_get_nchannel(void);
>>>>   */
>>>>  unsigned rte_memory_get_nrank(void);
>>>>  
>>>> +/**
>>>> + * Unlink all hugepages which created by dpdk.
>>>> + *
>>>> + * @param void
>>>> + *
>>>> + * @return
>>>> + *   0: successfully
>>>> + *   negative: error
>>>> + */
>>>> +int rte_eal_hugepage_free(void);
>>>> +
>>>>  #ifdef RTE_LIBRTE_XEN_DOM0
>>>>  /**
>>>>   * Return the physical address of elt, which is an element of the pool mp.
>>>> diff --git a/dpdk/dpdk-1.7.0/lib/librte_eal/linuxapp/eal/eal_memory.c 
>>>> b/dpdk/dpdk-1.7.0/lib/librte_eal/linuxapp/eal/eal_memory.c
>>>> index f2454f4..109207c 100644
>>>> --- a/dpdk/dpdk-1.7.0/lib/librte_eal/linuxapp/eal/eal_memory.c
>>>> +++ b/dpdk/dpdk-1.7.0/lib/librte_eal/linuxapp/eal/eal_memory.c
>>>> @@ -98,6 +98,13 @@
>>>>  #include "eal_filesystem.h"
>>>>  #include "eal_hugepages.h"
>>>>  
>>>> +struct hugepage_table {
>>>> +  struct hugepage_file *hugepg_tbl;
>>>> +  unsigned nr_hugefiles;
>>>> +};
>>>> +
>>>> +static struct hugepage_table g_hugepage_table;
>>>> +
>>>>  /**
>>>>   * @file
>>>>   * Huge page mapping under linux
>>>> @@ -1202,6 +1209,7 @@ rte_eal_hugepage_init(void)
>>>>(unsigned)
>>>>(used_hp[i].hugepage_sz 
>>>> / 0x10),
>>>>j);
>>>> +  g_hugepage_table.nr_hugefiles += 
>>>> used_hp[i].num_pages[j];
>>>>}
>>>>}
>>>>}
>>>> @@ -1237,6 +1245,8 @@ rte_eal_hugepage_init(void)
>>>>goto fail;
>>>>}
>>>>  
>>>> +  g_hugepage_table.hugepg_tbl = hugepage;
>>>> +
>>>>/* free the temporary hugepage table */
>>>>free(tmp_hp);
>>>>tmp_hp = NULL;
>>>> @@ -1487,6 +1497,27 @@ error:
>>>>return -1;
>>>>  }
>>>>  
>>>> +int
>>>> +rte_eal_hugepage_free(void)
>>>> +{
>>>> +  struct hugepage_file *hugepg_tbl = g_hugepage_table.hugepg_tbl;
>>>> +  unsigned i;
>>>> +  unsigned nr_hugefiles = g_hugepage_table.nr_hugefiles;
>>>> +  int ret = 0;
>>>> +
>>>> +  for (i = 0; i < nr_hugefiles; i++) {
>>>> +  ret = unlink(hugepg_tbl[i].filepath);
>>>> +  if (ret != 0) {
>>>> +  RTE_LOG(ERR, EAL, "Failed to unlink %s", 
>>>> hugepg_tbl[i].filepath);
>>>> +  return ret;
>>>> +  }
>>>> +  hugepg_tbl[i].orig_va = NULL;
>>> BTW, is it better to first set hugepg_tbl[i].orig_va to NULL, then unlink
>>> filepath?
>>> It may be not a good idea to first remove then set to NULL.
>>>
>>> Thanks,
>>> Michael
>>>
>> If first set hugepg_tbl[i].orig_va to NULL,then failed to unlink you have to 
>> restore hugepg_tbl[i].orig_va.
>> So I first to unlink for less codes.
> 
> But it may be a big issue (not safe) when you unlink the path before you
> set the pointer to NULL. Another thing is, why do you need to restore the
> orig_va since the app will exit? Even if you need to restore it, that is just
> one or two lines of code; I think it is not a big deal.
> 
> Thanks,
> Michael

Thank you, Michael.

Yes, it's not safe to unlink the path before setting the pointer to NULL. I will
fix it.

BTW, do we need to return an error when unlink fails? Maybe it is not needed?
>>
>>>> +  }
>>>> +
>>>> +  RTE_LOG(INFO, EAL, "unlink %u hugepage files\n", nr_hugefiles);
>>>> +  return ret;
>>>> +}
>>>> +
>>>>  static int
>>>>  rte_eal_memdevice_init(void)
>>>>  {
>>>
>>> .
>>>
> 
> 
> .
> 

-- 
Regards,
Haifeng



[dpdk-dev] dpdk-pktgen how to show more than 4 ports in one page?

2016-07-30 Thread linhaifeng
Hi,

I use 6 ports to send packets in a VM, but only 4 ports can work. How can I enable
more ports to work?



[dpdk-dev] [PATCH RFC] Memcpy optimization

2015-05-26 Thread Linhaifeng


On 2014/11/14 17:08, Wang, Zhihong wrote:
> Hi all,
> 
> I'd like to propose an update on DPDK memcpy optimization.
> Please see RFC below for details.
> 
> 
> Thanks
> John
> 
> ---
> 
> DPDK Memcpy Optimization
> 
> 1. Introduction
> 2. Terminology
> 3. Mechanism
> 3.1 Architectural Insights
> 3.2 DPDK memcpy optimization
> 3.3 Code change
> 4. Glibc memcpy analysis
> Acknowledgements
> Author's Address
> 
> 
> 1. Introduction
> 
> This document describes DPDK memcpy optimization, for both SSE and AVX 
> platforms.
> 
> Glibc memcpy is for general uses, it's not so efficient for DPDK where copies 
> are small and from cache to cache mainly.
> Also, glibc is changing over versions, some tradeoffs it made have negative 
> impact on DPDK performance. This in the meantime makes DPDK memcpy 
> performance glibc version dependent.
> For this cause, it's necessary to maintain a standalone memcpy implementation 
> to take full advantage of hardware features, and make special optimization 
> aiming at DPDK scenarios.
> 
> Current DPDK memcpy has the following improvement areas:
> * No support for 256-bit load/store
> * Poor performance for unaligned cases
> * Performance drops at certain odd copy sizes
> * Make slow glibc call for constant copies
> 
> It can be improved significantly by utilizing 256-bit AVX instructions and 
> applying more optimization techniques.
> 
> 2. Terminology
> 
> Aligned copy: Same offset for source & destination starting addresses
> Unaligned copy: Different offsets for source & destination starting addresses
> Constant payload size: Copy length can be decided at compile time
> Variable payload size: Copy length can't be decided at compile time
> 
> 3. Mechanism
> 
> 3.1 Architectural Insights
> 
> New architectures are likely to have better cache performance and stronger 
> ISA implementation.
> Memcpy needs to make full utilization of cache bandwidth, and implement 
> different mechanisms according to hardware features.
> Below is the architecture analysis for memory performance in Haswell and 
> Sandy Bridge.
> 
> Haswell has significant improvements in memory hierarchy over Sandy Bridge:
> * 2x cache bandwidth: From 48 B/cycle to 96 B/cycle
> * Sandy Bridge suffers from L1D bank conflicts, Haswell doesn't
> * Sandy Bridge has 2 split line buffers, Haswell has 4
> * Forwarding latency is 2 cycles for 256-bit AVX loads in Sandy Bridge, 1 in 
> Haswell
> 
> 3.2 DPDK memcpy optimization
> 
> DPDK memcpy calls are mainly cache to cache cases with payload no larger than 
> 8KB, they can be categorized into 4 scenarios:
> * Aligned copy, with constant payload size
> * Aligned copy, with variable payload size
> * Unaligned copy, with constant payload size
> * Unaligned copy, with variable payload size
> 
> Each scenario should be optimized according to its characteristics:
> * For aligned cases, no special optimization techniques are required
> * For unaligned cases:
> * Make store address aligned is a basic technique to improve performance
> * Load address alignment is a tradeoff between bit shifting overhead and 
> unaligned memory access penalty, which should be assessed by test
> * Load/store address should be made available as early as possible to 
> fully utilize the pipeline
> * For constant cases, inlining can bring significant benefits by means of gcc 
> optimization at compile time
> * For variable cases, it's important to reduce branches and make good use of 
> hardware prefetch
> 
> Memcpy optimization is summarized below:
> * Utilize full cache bandwidth
> * SSE: 128 bit
> * AVX/AVX2: 128/256 bit, depends on hardware implementation
> * Enforce aligned stores
> * Apply load address alignment based on architecture features
> * Enforce aligned loads for Sandy Bridge like architectures
> * No need to enforce aligned loads for Haswell because unaligned loads is 
> improved, also the AVX2 VPALIGNR is not efficient for 256-bit shifting and 
> leads to extra overhead
> * Make load/store address available as early as possible
> 
> Finally, general optimization techniques should be applied, like inlining, 
> branch reducing, prefetch pattern access, etc.
> 

Is this optimization done at compile time or at run time?

> 3.3 Code change
> 
> DPDK memcpy is implemented in a standalone file "rte_memcpy.h".
> The memcpy function is "rte_memcpy_func", which contains the copy flow, and 
> calls the inline move functions for actual data copy.
> 
> There will be major code change described as follows:
> * Differentiate architectural features based on CPU flags
> * Implement separated copy flow specifically optimized for target 
> architecture
> * Implement separated move functions for SSE/AVX/AVX2 to make full 
> utilization of cache bandwidth
> * Rewrite the memcpy function "rte_memcpy_func"
> * Add store aligning
> * Add load aligning for Sandy Bridge and older architectures
> * Put block copy loop into inline move functions for better cont
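
As a rough illustration of the constant vs. variable payload size cases described
above (hypothetical helper names, not the proposed rte_memcpy code): a compile-time
constant length can be detected with __builtin_constant_p, so small fixed-size
copies inline to a few moves while variable sizes fall through to the general copy
flow.

#include <stddef.h>
#include <string.h>

/* stands in for the size-dispatching copy flow of the real implementation */
static inline void *copy_variable(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* constant small sizes are resolved at compile time and inlined by gcc */
#define copy_example(dst, src, n)                          \
    (__builtin_constant_p(n) && (n) <= 64 ?                \
        memcpy((dst), (src), (n)) :                        \
        copy_variable((dst), (src), (n)))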

[dpdk-dev] dpdk-pktgen how to show more than 4 ports in one page?

2016-08-01 Thread linhaifeng
On 2016/7/30 21:30, Wiles, Keith wrote:
>> On Jul 30, 2016, at 1:03 AM, linhaifeng  wrote:
>>
>> hi
>>
>> I use 6 ports to send packets in a VM, but only 4 ports can work. How can I
>> enable more ports to work?
>>
> In the help screen the command 'ppp [1-6]' is ports per page.
> 
> 
Thank you very much.

It works very well with 5 ports per page and the UI is very nice.
There is a problem with 6 ports per page in my view -- the last column is shown
in place of the first column.

BTW, does it support showing latency and jitter?
Does it support saving the results as PDF or Excel?



[dpdk-dev] [PATCH] cycles: add isb before read cntvct_el0

2020-03-09 Thread Linhaifeng
We need isb rather than dsb to sync the system counter to cntvct_el0.

Signed-off-by: Haifeng Lin 
---
 lib/librte_eal/common/include/arch/arm/rte_atomic_64.h | 3 +++
 lib/librte_eal/common/include/arch/arm/rte_cycles_64.h | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h 
b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
index 859ae129d..705351394 100644
--- a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
+++ b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
@@ -21,6 +21,7 @@ extern "C" {
 
 #define dsb(opt) asm volatile("dsb " #opt : : : "memory")
 #define dmb(opt) asm volatile("dmb " #opt : : : "memory")
+#define isb()asm volatile("isb" : : : "memory")
 
 #define rte_mb() dsb(sy)
 
@@ -186,6 +187,8 @@ rte_atomic128_cmp_exchange(rte_int128_t *dst, rte_int128_t 
*exp,
return (old.int128 == expected.int128);
 }
 
+#define rte_isb() isb()
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/common/include/arch/arm/rte_cycles_64.h 
b/lib/librte_eal/common/include/arch/arm/rte_cycles_64.h
index 68e7c7338..29f524901 100644
--- a/lib/librte_eal/common/include/arch/arm/rte_cycles_64.h
+++ b/lib/librte_eal/common/include/arch/arm/rte_cycles_64.h
@@ -18,6 +18,7 @@ extern "C" {
  *   The time base for this lcore.
  */
 #ifndef RTE_ARM_EAL_RDTSC_USE_PMU
+
 /**
  * This call is portable to any ARMv8 architecture, however, typically
  * cntvct_el0 runs at <= 100MHz and it may be imprecise for some tasks.
@@ -27,6 +28,7 @@ rte_rdtsc(void)
 {
uint64_t tsc;
 
+   rte_isb();
asm volatile("mrs %0, cntvct_el0" : "=r" (tsc));
return tsc;
 }
-- 
2.24.1.windows.2


[dpdk-dev] [PATCH] cycles: add isb before read cntvct_el0

2020-03-09 Thread Linhaifeng
We should use isb rather than dsb to sync system counter to cntvct_el0.

Signed-off-by: Haifeng Lin 
---
lib/librte_eal/common/include/arch/arm/rte_atomic_64.h | 3 +++
lib/librte_eal/common/include/arch/arm/rte_cycles_64.h | 2 ++
2 files changed, 5 insertions(+)

diff --git a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h 
b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
index 859ae129d..7e8049725 100644
--- a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
+++ b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
@@ -21,6 +21,7 @@ extern "C" {
 #define dsb(opt) asm volatile("dsb " #opt : : : "memory")
#define dmb(opt) asm volatile("dmb " #opt : : : "memory")
+#define isb()asm volatile("isb" : : : "memory")
 #define rte_mb() dsb(sy)
@@ -44,6 +45,8 @@ extern "C" {
 #define rte_cio_rmb() dmb(oshld)
+#define rte_isb() isb()
+
/* 128 bit atomic operations -*/
 #if defined(__ARM_FEATURE_ATOMICS) || defined(RTE_ARM_FEATURE_ATOMICS)
diff --git a/lib/librte_eal/common/include/arch/arm/rte_cycles_64.h 
b/lib/librte_eal/common/include/arch/arm/rte_cycles_64.h
index 68e7c7338..29f524901 100644
--- a/lib/librte_eal/common/include/arch/arm/rte_cycles_64.h
+++ b/lib/librte_eal/common/include/arch/arm/rte_cycles_64.h
@@ -18,6 +18,7 @@ extern "C" {
  *   The time base for this lcore.
  */
#ifndef RTE_ARM_EAL_RDTSC_USE_PMU
+
/**
  * This call is portable to any ARMv8 architecture, however, typically
  * cntvct_el0 runs at <= 100MHz and it may be imprecise for some tasks.
@@ -27,6 +28,7 @@ rte_rdtsc(void)
{
   uint64_t tsc;
+   rte_isb();
   asm volatile("mrs %0, cntvct_el0" : "=r" (tsc));
   return tsc;
}
--


[dpdk-dev] [PATCH] cycles: add isb before read cntvct_el0

2020-03-09 Thread Linhaifeng
We should use isb rather than dsb to sync system counter to cntvct_el0.

Signed-off-by: Haifeng Lin 
---
 lib/librte_eal/common/include/arch/arm/rte_atomic_64.h | 3 +++
 lib/librte_eal/common/include/arch/arm/rte_cycles_64.h | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h 
b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
index 859ae129d..2587f98a2 100644
--- a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
+++ b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
@@ -21,6 +21,7 @@ extern "C" {
 
 #define dsb(opt) asm volatile("dsb " #opt : : : "memory")
 #define dmb(opt) asm volatile("dmb " #opt : : : "memory")
+#define isb()(asm volatile("isb" : : : "memory"))
 
 #define rte_mb() dsb(sy)
 
@@ -44,6 +45,8 @@ extern "C" {
 
 #define rte_cio_rmb() dmb(oshld)
 
+#define rte_isb() isb()
+
 /* 128 bit atomic operations 
-*/
 
 #if defined(__ARM_FEATURE_ATOMICS) || defined(RTE_ARM_FEATURE_ATOMICS)
diff --git a/lib/librte_eal/common/include/arch/arm/rte_cycles_64.h 
b/lib/librte_eal/common/include/arch/arm/rte_cycles_64.h
index 68e7c7338..29f524901 100644
--- a/lib/librte_eal/common/include/arch/arm/rte_cycles_64.h
+++ b/lib/librte_eal/common/include/arch/arm/rte_cycles_64.h
@@ -18,6 +18,7 @@ extern "C" {
  *   The time base for this lcore.
  */
 #ifndef RTE_ARM_EAL_RDTSC_USE_PMU
+
 /**
  * This call is portable to any ARMv8 architecture, however, typically
  * cntvct_el0 runs at <= 100MHz and it may be imprecise for some tasks.
@@ -27,6 +28,7 @@ rte_rdtsc(void)
 {
uint64_t tsc;
 
+   rte_isb();
asm volatile("mrs %0, cntvct_el0" : "=r" (tsc));
return tsc;
 }
-- 
2.24.1.windows.2


[dpdk-dev] [PATCH] cycles: add isb before read cntvct_el0

2020-03-09 Thread Linhaifeng
We should use isb rather than dsb to sync system counter to cntvct_el0.

Signed-off-by: Linhaifeng 
---
 lib/librte_eal/common/include/arch/arm/rte_atomic_64.h | 3 +++
 lib/librte_eal/common/include/arch/arm/rte_cycles_64.h | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h 
b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
index 859ae129d..2587f98a2 100644
--- a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
+++ b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
@@ -21,6 +21,7 @@ extern "C" {
 
 #define dsb(opt) asm volatile("dsb " #opt : : : "memory")
 #define dmb(opt) asm volatile("dmb " #opt : : : "memory")
+#define isb()(asm volatile("isb" : : : "memory"))
 
 #define rte_mb() dsb(sy)
 
@@ -44,6 +45,8 @@ extern "C" {
 
 #define rte_cio_rmb() dmb(oshld)
 
+#define rte_isb() isb()
+
 /* 128 bit atomic operations 
-*/
 
 #if defined(__ARM_FEATURE_ATOMICS) || defined(RTE_ARM_FEATURE_ATOMICS)
diff --git a/lib/librte_eal/common/include/arch/arm/rte_cycles_64.h 
b/lib/librte_eal/common/include/arch/arm/rte_cycles_64.h
index 68e7c7338..29f524901 100644
--- a/lib/librte_eal/common/include/arch/arm/rte_cycles_64.h
+++ b/lib/librte_eal/common/include/arch/arm/rte_cycles_64.h
@@ -18,6 +18,7 @@ extern "C" {
  *   The time base for this lcore.
  */
 #ifndef RTE_ARM_EAL_RDTSC_USE_PMU
+
 /**
  * This call is portable to any ARMv8 architecture, however, typically
  * cntvct_el0 runs at <= 100MHz and it may be imprecise for some tasks.
@@ -27,6 +28,7 @@ rte_rdtsc(void)
 {
uint64_t tsc;
 
+   rte_isb();
asm volatile("mrs %0, cntvct_el0" : "=r" (tsc));
return tsc;
 }
-- 
2.24.1.windows.2


[dpdk-dev] Re: [PATCH] cycles: add isb before read cntvct_el0

2020-03-09 Thread Linhaifeng


-----Original Message-----
From: Jerin Jacob [mailto:jerinjac...@gmail.com]
Sent: 9 March 2020 23:43
To: Linhaifeng
Cc: dev@dpdk.org; tho...@monjalon.net; Lilijun (Jerry); chenchanghu; xudingke
Subject: Re: [dpdk-dev] [PATCH] cycles: add isb before read cntvct_el0

On Mon, Mar 9, 2020 at 2:43 PM Linhaifeng  wrote:
>
> We need isb rather than dsb to sync the system counter to cntvct_el0.

# Currently rte_rdtsc() does not have dsb. Right? or any barriers.
# Why do you need it? If it regarding, getting accurate value then use 
rte_rdtsc_precise().

We use rte_get_tsc_cycles() to get start_value in pmd1 and end_value in pmd2 in our
QoS module; it works fine on x86 but not on arm64.

Then we used rte_mb() to synchronize, but it did not work. Because rte_mb() is a dsb,
I think it only affects memory ordering; cntvct_el0 and the system counter are
registers, so I think we should use isb.

It works well after we use isb in multi-core scenarios.

Using rte_rdtsc_precise() is a good idea. Maybe use isb in place of rte_mb() (dsb)?

>
> Signed-off-by: Haifeng Lin 
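
A sketch of the scenario above (assumed structure, not the real QoS module): a
start timestamp taken on one polling lcore is subtracted from an end timestamp
taken on another. On arm64 the cntvct_el0 read may be hoisted past earlier
instructions, so without the isb added by this patch the computed delta can be
wrong even though the same code looks fine on x86.

#include <stdint.h>
#include <rte_cycles.h>

static volatile uint64_t start_cycles;

/* runs on the first polling lcore (pmd1) */
static void mark_start(void)
{
    start_cycles = rte_get_tsc_cycles();
}

/* runs on the second polling lcore (pmd2) */
static uint64_t measure_delta(void)
{
    return rte_get_tsc_cycles() - start_cycles;
}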


[dpdk-dev] Re: [PATCH] cycles: add isb before read cntvct_el0

2020-03-09 Thread Linhaifeng


-----Original Message-----
From: David Marchand [mailto:david.march...@redhat.com]
Sent: 9 March 2020 17:19
To: Linhaifeng
Cc: dev@dpdk.org; tho...@monjalon.net; Lilijun (Jerry); chenchanghu; xudingke
Subject: Re: [dpdk-dev] [PATCH] cycles: add isb before read cntvct_el0

On Mon, Mar 9, 2020 at 10:14 AM Linhaifeng  wrote:
>
> We need isb rather than dsb to sync the system counter to cntvct_el0.

I'll leave the arm maintainers look at this, but I have a comment on the form.

Thank you

>
> Signed-off-by: Haifeng Lin 
> ---
>  lib/librte_eal/common/include/arch/arm/rte_atomic_64.h | 3 +++  
> lib/librte_eal/common/include/arch/arm/rte_cycles_64.h | 2 ++
>  2 files changed, 5 insertions(+)
>
> diff --git a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h 
> b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
> index 859ae129d..705351394 100644
> --- a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
> +++ b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
> @@ -21,6 +21,7 @@ extern "C" {
>
>  #define dsb(opt) asm volatile("dsb " #opt : : : "memory")  #define 
> dmb(opt) asm volatile("dmb " #opt : : : "memory")
> +#define isb()asm volatile("isb" : : : "memory")

dsb and dmb should not be exported as public macros in the first place (I 
forgot to send the patch that drops those, will send later).
Please don't add more public macro that make no sense except for
aarch64: neither isb, nor rte_isb.


OK. I will send a new patch after yours.


>
>  #define rte_mb() dsb(sy)
>
> @@ -186,6 +187,8 @@ rte_atomic128_cmp_exchange(rte_int128_t *dst, 
> rte_int128_t *exp,
> return (old.int128 == expected.int128);  }
>
> +#define rte_isb() isb()
> +
>  #ifdef __cplusplus
>  }
>  #endif


--
David Marchand



Re: [dpdk-dev] [PATCH] cycles: add isb before read cntvct_el0

2020-03-10 Thread Linhaifeng



> -Original Message-
> From: Gavin Hu [mailto:gavin...@arm.com]
> Sent: Tuesday, March 10, 2020 3:11 PM
> To: Linhaifeng ; dev@dpdk.org;
> tho...@monjalon.net
> Cc: chenchanghu ; xudingke
> ; Lilijun (Jerry) ; Honnappa
> Nagarahalli ; Steve Capper
> ; nd 
> Subject: RE: [PATCH] cycles: add isb before read cntvct_el0
> 
> Hi Haifeng,
> 
> > -Original Message-
> > From: dev  On Behalf Of Linhaifeng
> > Sent: Monday, March 9, 2020 5:23 PM
> > To: dev@dpdk.org; tho...@monjalon.net
> > Cc: chenchanghu ; xudingke
> > ; Lilijun (Jerry) 
> > Subject: [dpdk-dev] [PATCH] cycles: add isb before read cntvct_el0
> >
> > We should use isb rather than dsb to sync system counter to cntvct_el0.
> >
> > Signed-off-by: Haifeng Lin 
> > ---
> > lib/librte_eal/common/include/arch/arm/rte_atomic_64.h | 3 +++
> > lib/librte_eal/common/include/arch/arm/rte_cycles_64.h | 2 ++
> > 2 files changed, 5 insertions(+)
> >
> > diff --git a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
> > b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
> > index 859ae129d..7e8049725 100644
> > --- a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
> > +++ b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
> > @@ -21,6 +21,7 @@ extern "C" {
> >  #define dsb(opt) asm volatile("dsb " #opt : : : "memory") #define
> > dmb(opt) asm volatile("dmb " #opt : : : "memory")
> > +#define isb()asm volatile("isb" : : : "memory")
> >  #define rte_mb() dsb(sy)
> > @@ -44,6 +45,8 @@ extern "C" {
> >  #define rte_cio_rmb() dmb(oshld)
> > +#define rte_isb() isb()
> > +
> > /* 128 bit atomic operations
> > -*/  #if defined(__ARM_FEATURE_ATOMICS) ||
> > defined(RTE_ARM_FEATURE_ATOMICS)
> > diff --git a/lib/librte_eal/common/include/arch/arm/rte_cycles_64.h
> > b/lib/librte_eal/common/include/arch/arm/rte_cycles_64.h
> > index 68e7c7338..29f524901 100644
> > --- a/lib/librte_eal/common/include/arch/arm/rte_cycles_64.h
> > +++ b/lib/librte_eal/common/include/arch/arm/rte_cycles_64.h
> > @@ -18,6 +18,7 @@ extern "C" {
> >   *   The time base for this lcore.
> >   */
> > #ifndef RTE_ARM_EAL_RDTSC_USE_PMU
> > +
> > /**
> >   * This call is portable to any ARMv8 architecture, however, typically
> >   * cntvct_el0 runs at <= 100MHz and it may be imprecise for some tasks.
> > @@ -27,6 +28,7 @@ rte_rdtsc(void)
> > {
> >uint64_t tsc;
> > +   rte_isb();
> Good catch, could you add a link to the commit log as a reference.
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/ar
> m64/include/asm/arch_timer.h?h=v5.5#n220
> 

Ok.

> >asm volatile("mrs %0, cntvct_el0" : "=r" (tsc));
> In kernel, there is a call to arch_counter_enforce_ordering(cnt), maybe it is
> also necessary.
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/ar
> m64/include/asm/arch_timer.h?h=v5.5#n168

Should we add isb and arch_counter_enforce_ordering in rte_rdtsc or 
rte_rdtsc_precise?

> >return tsc;
> > }
> > --

