Hi Patrik,

On Tue, Apr 5, 2016 at 10:40 AM, Patrik Andersson R
<patrik.r.andersson at ericsson.com> wrote:
>
> The described fault situation arises due to the fact that there is a bug
> in an OpenStack component, Neutron or Nova, that fails to release ports
> on VM deletion. This typically leads to an accumulation of 1-2 file
> descriptors per unreleased port. It could also arise when allocating a
> large number (~500?) of vhost user ports and connecting them all to VMs.
>
I can confirm that I'm able to trigger this without OpenStack, using DPDK
2.2 and Open vSwitch 2.5. Initially I had at least two guests attached to
the first two ports, but that turns out not to be necessary, which makes
the reproduction as easy as:

  ovs-vsctl add-br ovsdpdkbr0 -- set bridge ovsdpdkbr0 datapath_type=netdev
  ovs-vsctl add-port ovsdpdkbr0 dpdk0 -- set Interface dpdk0 type=dpdk
  for idx in {1..1023}; do
      ovs-vsctl add-port ovsdpdkbr0 vhost-user-${idx} -- \
          set Interface vhost-user-${idx} type=dpdkvhostuser
  done

=> as soon as the associated fd is >1023 the vhost_user socket still gets
created, but just afterwards I see the crash mentioned by Patrik:

#0  0x00007f51cb187518 in __GI_raise (sig=sig@entry=6)
    at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007f51cb1890ea in __GI_abort () at abort.c:89
#2  0x00007f51cb1c98c4 in __libc_message (do_abort=do_abort@entry=2,
    fmt=fmt@entry=0x7f51cb2e1584 "*** %s ***: %s terminated\n")
    at ../sysdeps/posix/libc_fatal.c:175
#3  0x00007f51cb26af94 in __GI___fortify_fail (msg=<optimized out>,
    msg@entry=0x7f51cb2e1515 "buffer overflow detected") at fortify_fail.c:37
#4  0x00007f51cb268fa0 in __GI___chk_fail () at chk_fail.c:28
#5  0x00007f51cb26aee7 in __fdelt_chk (d=<optimized out>) at fdelt_chk.c:25
#6  0x00007f51cbd6d665 in fdset_fill (pfdset=0x7f51cc03dfa0 <g_vhost_server+8192>,
    wfset=0x7f51c78e4a30, rfset=0x7f51c78e49b0)
    at /build/dpdk-3lQdSB/dpdk-2.2.0/lib/librte_vhost/vhost_user/fd_man.c:110
#7  fdset_event_dispatch (pfdset=pfdset@entry=0x7f51cc03dfa0 <g_vhost_server+8192>)
    at /build/dpdk-3lQdSB/dpdk-2.2.0/lib/librte_vhost/vhost_user/fd_man.c:243
#8  0x00007f51cbdc1b00 in rte_vhost_driver_session_start ()
    at /build/dpdk-3lQdSB/dpdk-2.2.0/lib/librte_vhost/vhost_user/vhost-net-user.c:525
#9  0x00000000005061ab in start_vhost_loop (dummy=<optimized out>)
    at ../lib/netdev-dpdk.c:2047
#10 0x00000000004c2c64 in ovsthread_wrapper (aux_=<optimized out>)
    at ../lib/ovs-thread.c:340
#11 0x00007f51cba346fa in start_thread (arg=0x7f51c78e5700)
    at pthread_create.c:333
#12 0x00007f51cb2592dd in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Like Patrik, I don't have a "pure" DPDK test yet, but at least OpenStack is
out of the scope now, which should help.

[...]

> The key point, I think, is that more than one file descriptor is used per
> vhost user device. This means that there is no real relation between the
> number of devices and the number of file descriptors in use.

Well, it is "one per vhost_user device" as far as I've seen, but those are
not the only fds used overall.

[...]

> In my opinion the problem is that the assumption: number of vhost
> user devices == number of file descriptors does not hold. What the actual
> relation might be is hard to determine with any certainty.
>

I totally agree that there is no deterministic rule for what to expect.
The only rule is that #fd certainly is always > #vhost_user devices. In
various setup variants I've crossed fd 1024 anywhere between 475 and 970
vhost_user ports.

Once the discussion continues and we have an updated version of the patch
with some more agreement, I hope I can help to test it.

Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd
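
P.S.: for anyone who wants to see the failure mode in isolation, here is a
minimal sketch (not part of any patch; the file name and build line are just
assumptions, and it presumes a glibc with fortification and a hard fd limit
above 1024). glibc's fortified FD_SET() routes through __fdelt_chk(), which
aborts with "buffer overflow detected" as soon as it sees a descriptor
>= FD_SETSIZE (1024) - exactly what fdset_fill() runs into above once a
vhost_user connection ends up on such a descriptor.

  /*
   * fdset_demo.c - sketch of the abort in frames #0-#5 above.
   *
   * Build (Ubuntu already defaults to -D_FORTIFY_SOURCE=2 at -O2):
   *   gcc -O2 -D_FORTIFY_SOURCE=2 -o fdset_demo fdset_demo.c
   */
  #include <stdio.h>
  #include <sys/resource.h>
  #include <sys/select.h>
  #include <unistd.h>

  int main(void)
  {
      /* Raise the soft fd limit so we can own a descriptor numbered
       * >= FD_SETSIZE; assumes the hard limit permits this. */
      struct rlimit rl;
      if (getrlimit(RLIMIT_NOFILE, &rl) == 0 && rl.rlim_cur <= FD_SETSIZE) {
          rl.rlim_cur = FD_SETSIZE + 16;
          if (setrlimit(RLIMIT_NOFILE, &rl) != 0) {
              perror("setrlimit");
              return 1;
          }
      }

      /* Force a descriptor numbered exactly FD_SETSIZE, mimicking a process
       * that already holds ~1024 fds (e.g. OVS with many vhost_user ports). */
      int fd = dup2(STDIN_FILENO, FD_SETSIZE);
      if (fd < 0) {
          perror("dup2");
          return 1;
      }

      fd_set rfds;
      FD_ZERO(&rfds);
      printf("FD_SETSIZE=%d, got fd %d, calling FD_SET...\n", FD_SETSIZE, fd);

      /* With fortification enabled this expands to a __fdelt_chk() call,
       * which aborts here because fd >= FD_SETSIZE. */
      FD_SET(fd, &rfds);

      printf("not reached when built with _FORTIFY_SOURCE=2\n");
      return 0;
  }

That is also why the crash only shows up once the descriptor number crosses
1023, independent of how many vhost_user ports that actually corresponds to.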