On Tue, Apr 19, 2016 at 06:33:50PM +0200, Christian Ehrhardt wrote:
> Hi,
> thanks for the patch.
> I backported it this way to my DPDK 2.2 based environment for now (see below):
> 
> With that applied one (and only one) of my two guests looses connectivity 
> after
> removing the ports the first time.

Yeah, that's should be because I invoked the "->destroy_device()"
callback.

BTW, I'm curious how do you do the test? I saw you added 256 ports, but
with 2 guests only? So, 254 of them are idle, just for testing the
memory leak bug?

And then you remove all of them, without stopping the guest. How that's
gonna work? I mean, the vhost-user connection would be broken, and data
flow would not work.

        --yliu

> No traffic seems to pass, setting the device in the guest down/up doesn't get
> anything.
> But It isn't totally broken - stopping/starting the guest gets it working
> again.
> So openvswitch/dpdk is still somewhat working - it just seems the guest lost
> something, after tapping on that vhost_user interface again it works.
> 
> I will check tomorrow and let you know:
> - if I'm more lucky with that patch on top of 16.04
> - if it looses connectivity after the first or a certain amount of port 
> removes
> 
> If you find issues with my backport adaption let me know.
> 
> 
> ---
> 
> Backport and reasoning:
> 
> new fix relies on a lot of new code, vhost_destroy_device looks totally
> different from the former destroy_device.
> History on todays function content:
> ? 4796ad63 - original code moved from examples to lib
> ? a90ca1a1 - this replaces ops->destroy_device with vhost_destroy_device
> ? 71dc571e - simple check against null pointers
> ? 45ca9c6f - this changed the code from linked list to arrays
> 
> ? New code cleans with:
> ? ? ? notify_ops->destroy_device (callback into the parent)
> ? ? ? cleanup_device was existing before even in 2.2 code
> ? ? ? free_device as well existing before even in 2.2 code
> ? Old code cleans with:
> ? ? ? notify_ops->destroy_device - still there
> ? ? ? rm_config_ll_entry -> eventually calls cleanup_device and free_device
> ? ? ? ? (just in the more complex linked list way)
> 
> So the "only" adaption for backporting needed is to replace
> vhost_destroy_device
> with ops->destroy_device(ctx)
> 
> Index: dpdk/lib/librte_vhost/vhost_user/vhost-net-user.c
> ===================================================================
> --- dpdk.orig/lib/librte_vhost/vhost_user/vhost-net-user.c
> +++ dpdk/lib/librte_vhost/vhost_user/vhost-net-user.c
> @@ -310,6 +310,7 @@ vserver_new_vq_conn(int fd, void *dat, _
> ? }
> ?
> ? vdev_ctx.fh = fh;
> + vserver->fh = fh;
> ? size = strnlen(vserver->path, PATH_MAX);
> ? ops->set_ifname(vdev_ctx, vserver->path,
> ? size);
> @@ -516,6 +517,11 @@ rte_vhost_driver_unregister(const char *
> ?
> ? for (i = 0; i < g_vhost_server.vserver_cnt; i++) {
> ? if (!strcmp(g_vhost_server.server[i]->path, path)) {
> + struct vhost_device_ctx ctx;
> +
> + ctx.fh = g_vhost_server.server[i]->fh;
> + ops->destroy_device(ctx);
> +
> ? fdset_del(&g_vhost_server.fdset,
> ? g_vhost_server.server[i]->listenfd);
> ?
> Index: dpdk/lib/librte_vhost/vhost_user/vhost-net-user.h
> ===================================================================
> --- dpdk.orig/lib/librte_vhost/vhost_user/vhost-net-user.h
> +++ dpdk/lib/librte_vhost/vhost_user/vhost-net-user.h
> @@ -43,6 +43,7 @@
> ?struct vhost_server {
> ? char *path; /**< The path the uds is bind to. */
> ? int listenfd; ? ? /**< The listener sockfd. */
> + uint32_t fh;
> ?};
> ?
> ?/* refer to hw/virtio/vhost-user.c */
> 
> 
> 
> 
> Christian Ehrhardt
> Software Engineer, Ubuntu Server
> Canonical Ltd
> 
> On Mon, Apr 18, 2016 at 8:14 PM, Yuanhan Liu <yuanhan.liu at linux.intel.com>
> wrote:
> 
>     On Mon, Apr 18, 2016 at 10:46:50AM -0700, Yuanhan Liu wrote:
>     > On Mon, Apr 18, 2016 at 07:18:05PM +0200, Christian Ehrhardt wrote:
>     > > I assume there is a leak somewhere on adding/removing vhost_user 
> ports.
>     > > Although it could also be "only" a fragmentation issue.
>     > >
>     > > Reproduction is easy:
>     > > I set up a pair of nicely working OVS-DPDK connected KVM Guests.
>     > > Then in a loop I
>     > >? ? - add up to more 512 ports
>     > >? ? - test connectivity between the two guests
>     > >? ? - remove up to 512 ports
>     > >
>     > > Depending on memory and the amount of multiqueue/rxq I use it seems to
>     > > slightly change when exactly it breaks. But for my default setup of 4
>     > > queues and 5G Hugepages initialized by DPDK it always breaks at the
>     sixth
>     > > iteration.
>     > > Here a link to the stack trace indicating a memory shortage (TBC):
>     > > https://launchpadlibrarian.net/253916410/apport-retrace.log
>     > >
>     > > Known Todos:
>     > > - I want to track it down more, and will try to come up with a non
>     > > openvswitch based looping testcase that might show it as well to
>     simplify
>     > > debugging.
>     > > - in use were Openvswitch-dpdk 2.5 and DPDK 2.2; Retest with DPDK 
> 16.04
>     and
>     > > Openvswitch master is planned.
>     > >
>     > > I will go on debugging this and let you know, but I wanted to give a
>     heads
>     > > up to everyone.
>     >
>     > Thanks for the report.
>     >
>     > > In case this is a known issue for some of you please let me know.
>     >
>     > Yeah, it might be. I'm wondering that virtio_net struct is not freed.
>     > It will be freed only (if I'm not mistaken) when guest quits, by far.
> 
>     Would you try following diff and to see if it fix your issue?
> 
>     ? ? ? ? --yliu
> 
>     ---
>     ?lib/librte_vhost/vhost_user/vhost-net-user.c | 6 ++++++
>     ?lib/librte_vhost/vhost_user/vhost-net-user.h | 1 +
>     ?2 files changed, 7 insertions(+)
> 
>     diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.c b/lib/
>     librte_vhost/vhost_user/vhost-net-user.c
>     index df2bd64..8f7ebd7 100644
>     --- a/lib/librte_vhost/vhost_user/vhost-net-user.c
>     +++ b/lib/librte_vhost/vhost_user/vhost-net-user.c
>     @@ -309,6 +309,7 @@ vserver_new_vq_conn(int fd, void *dat, __rte_unused 
> int
>     *remove)
>     ? ? ? ? }
> 
>     ? ? ? ? vdev_ctx.fh = fh;
>     +? ? ? ?vserver->fh = fh;
>     ? ? ? ? size = strnlen(vserver->path, PATH_MAX);
>     ? ? ? ? vhost_set_ifname(vdev_ctx, vserver->path,
>     ? ? ? ? ? ? ? ? size);
>     @@ -501,6 +502,11 @@ rte_vhost_driver_unregister(const char *path)
> 
>     ? ? ? ? for (i = 0; i < g_vhost_server.vserver_cnt; i++) {
>     ? ? ? ? ? ? ? ? if (!strcmp(g_vhost_server.server[i]->path, path)) {
>     +? ? ? ? ? ? ? ? ? ? ? ?struct vhost_device_ctx ctx;
>     +
>     +? ? ? ? ? ? ? ? ? ? ? ?ctx.fh = g_vhost_server.server[i]->fh;
>     +? ? ? ? ? ? ? ? ? ? ? ?vhost_destroy_device(ctx);
>     +
>     ? ? ? ? ? ? ? ? ? ? ? ? fdset_del(&g_vhost_server.fdset,
>     ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? g_vhost_server.server[i]->listenfd);
> 
>     diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.h b/lib/
>     librte_vhost/vhost_user/vhost-net-user.h
>     index e3bb413..7cf21db 100644
>     --- a/lib/librte_vhost/vhost_user/vhost-net-user.h
>     +++ b/lib/librte_vhost/vhost_user/vhost-net-user.h
>     @@ -43,6 +43,7 @@
>     ?struct vhost_server {
>     ? ? ? ? char *path; /**< The path the uds is bind to. */
>     ? ? ? ? int listenfd;? ? ?/**< The listener sockfd. */
>     +? ? ? ?uint32_t fh;
>     ?};
> 
>     ?/* refer to hw/virtio/vhost-user.c */
>     --
>     1.9.3
> 
> 
> 
> 

Reply via email to