Hi Aaron,

Thanks for the report.  I haven't been able to reproduce the problem.  Could
you post ovs-vswitchd.log, please?

Daniele

2016-07-19 9:38 GMT-07:00 Aaron Conole <acon...@redhat.com>:

> I haven't had a chance to investigate this fully, but there seems to be a
> possible deadlock in the dpdk netdev, specifically the vhostuser class.
> I've seen it intermittently with 2.4, 2.5, and now master.  It is quite
> hard to reproduce.
>
> Below is a sample stack trace.  In it, notice that thread 12
> (dpdk_watchdog) is blocked in pthread_mutex_unlock, while the mutex
> object itself indicates unlocked status.  I don't know exactly how this
> would happen: my guess is that the device was removed while
> dpdk_watchdog was running, but that's just a guess.  It has been stuck
> on this unlock since last night, on the same device (which no longer has
> an entry in the database).  I need more time to investigate, but I have
> two patch sets I'm working on already and can't take on another yet.
>
> I probably won't get to this in the next month or two for other reasons,
> so I'm posting it here so it doesn't get lost (and also because someone
> may have or see a fix immediately).
>
> [root@wsfd-netdev9 ovs]# gdb ./vswitchd/ovs-vswitchd
> GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-80.el7
> Copyright (C) 2013 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /root/git/ovs/vswitchd/ovs-vswitchd...done.
> (gdb) attach 6041
> Attaching to program: /root/git/ovs/./vswitchd/ovs-vswitchd, process 6041
> Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libdl.so.2
> Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
> [New LWP 6065]
> [New LWP 6064]
> [New LWP 6063]
> [New LWP 6062]
> [New LWP 6061]
> [New LWP 6060]
> [New LWP 6059]
> [New LWP 6058]
> [New LWP 6045]
> [New LWP 6044]
> [New LWP 6043]
> [New LWP 6042]
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> Loaded symbols for /lib64/libpthread.so.0
> Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done.
> Loaded symbols for /lib64/librt.so.1
> Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libm.so.6
> Reading symbols from /lib64/libnuma.so.1...Reading symbols from
> /lib64/libnuma.so.1...(no debugging symbols found)...done.
> (no debugging symbols found)...done.
> Loaded symbols for /lib64/libnuma.so.1
> Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libc.so.6
> Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
> Loaded symbols for /lib64/ld-linux-x86-64.so.2
> Reading symbols from /lib64/libgcc_s.so.1...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libgcc_s.so.1
> 0x00007f0e471b6b7d in poll () from /lib64/libc.so.6
> Missing separate debuginfos, use: debuginfo-install
> glibc-2.17-105.el7.x86_64 libgcc-4.8.5-4.el7.x86_64
> numactl-libs-2.0.9-6.el7_2.x86_64
> (gdb) thread apply all bt
>
> Thread 13 (Thread 0x7f0e055f9700 (LWP 6042)):
> #0  0x00007f0e471c17a3 in epoll_wait () from /lib64/libc.so.6
> #1  0x000000000052b7b4 in eal_intr_thread_main ()
> #2  0x00007f0e47ba9dc5 in start_thread () from /lib64/libpthread.so.0
> #3  0x00007f0e471c11cd in clone () from /lib64/libc.so.6
>
> Thread 12 (Thread 0x7f0e04df8700 (LWP 6043)):
> #0  0x00007f0e47bacbdc in pthread_mutex_unlock () from /lib64/libpthread.so.0
> #1  0x000000000062d038 in ovs_mutex_unlock (l_=l_@entry=0x7f0e07574ca8)
>     at lib/ovs-thread.c:125
> #2  0x0000000000671b28 in dpdk_watchdog (dummy=<optimized out>)
>     at lib/netdev-dpdk.c:570
> #3  0x000000000062cdd4 in ovsthread_wrapper (aux_=<optimized out>)
>     at lib/ovs-thread.c:342
> #4  0x00007f0e47ba9dc5 in start_thread () from /lib64/libpthread.so.0
> #5  0x00007f0e471c11cd in clone () from /lib64/libc.so.6
>
> Thread 11 (Thread 0x7f0e045f7700 (LWP 6044)):
> #0  0x00007f0e471b88f3 in select () from /lib64/libc.so.6
> #1  0x0000000000543b49 in fdset_event_dispatch ()
> #2  0x0000000000542fde in rte_vhost_driver_session_start ()
> #3  0x000000000066f01b in start_vhost_loop (dummy=<optimized out>)
>     at lib/netdev-dpdk.c:2451
> #4  0x000000000062cdd4 in ovsthread_wrapper (aux_=<optimized out>)
>     at lib/ovs-thread.c:342
> #5  0x00007f0e47ba9dc5 in start_thread () from /lib64/libpthread.so.0
> #6  0x00007f0e471c11cd in clone () from /lib64/libc.so.6
>
> Thread 10 (Thread 0x7f0e03df6700 (LWP 6045)):
> #0  0x00007f0e471b6b7d in poll () from /lib64/libc.so.6
> #1  0x000000000064f9c4 in time_poll (pollfds=pollfds@entry=0x7f0df80008c0,
>     n_pollfds=2, handles=handles@entry=0x0, timeout_when=9223372036854775807,
>     elapsed=elapsed@entry=0x7f0e03df5aac) at lib/timeval.c:305
> #2  0x000000000063cecc in poll_block () at lib/poll-loop.c:364
> #3  0x000000000062bef1 in ovsrcu_postpone_thread (arg=<optimized out>)
>     at lib/ovs-rcu.c:310
> #4  0x000000000062cdd4 in ovsthread_wrapper (aux_=<optimized out>)
>     at lib/ovs-thread.c:342
> #5  0x00007f0e47ba9dc5 in start_thread () from /lib64/libpthread.so.0
> #6  0x00007f0e471c11cd in clone () from /lib64/libc.so.6
>
> Thread 9 (Thread 0x7f0e01df2700 (LWP 6058)):
> #0  0x00007f0e471b6b7d in poll () from /lib64/libc.so.6
> #1  0x000000000064f9c4 in time_poll (pollfds=pollfds@entry=0x7f0df00049a0,
>     n_pollfds=3, handles=handles@entry=0x0, timeout_when=9223372036854775807,
>     elapsed=elapsed@entry=0x7f0e01df1a8c) at lib/timeval.c:305
> #2  0x000000000063cecc in poll_block () at lib/poll-loop.c:364
> #3  0x00000000005a0c8f in udpif_upcall_handler (arg=0x10afbe0)
>     at ofproto/ofproto-dpif-upcall.c:725
> #4  0x000000000062cdd4 in ovsthread_wrapper (aux_=<optimized out>)
>     at lib/ovs-thread.c:342
> #5  0x00007f0e47ba9dc5 in start_thread () from /lib64/libpthread.so.0
> #6  0x00007f0e471c11cd in clone () from /lib64/libc.so.6
>
> Thread 8 (Thread 0x7f0e025f3700 (LWP 6059)):
> #0  0x00007f0e471b6b7d in poll () from /lib64/libc.so.6
> #1  0x000000000064f9c4 in time_poll (pollfds=pollfds@entry=0x7f0dd80049a0,
>     n_pollfds=3, handles=handles@entry=0x0, timeout_when=9223372036854775807,
>     elapsed=elapsed@entry=0x7f0e025f2a8c) at lib/timeval.c:305
> #2  0x000000000063cecc in poll_block () at lib/poll-loop.c:364
> #3  0x00000000005a0c8f in udpif_upcall_handler (arg=0x10afbf8)
>     at ofproto/ofproto-dpif-upcall.c:725
> #4  0x000000000062cdd4 in ovsthread_wrapper (aux_=<optimized out>)
>     at lib/ovs-thread.c:342
> #5  0x00007f0e47ba9dc5 in start_thread () from /lib64/libpthread.so.0
> #6  0x00007f0e471c11cd in clone () from /lib64/libc.so.6
>
> Thread 7 (Thread 0x7f0e02df4700 (LWP 6060)):
> #0  0x00007f0e471b6b7d in poll () from /lib64/libc.so.6
> #1  0x000000000064f9c4 in time_poll (pollfds=pollfds@entry=0x7f0dec000c50,
>     n_pollfds=4, handles=handles@entry=0x0, timeout_when=65562769,
>     elapsed=elapsed@entry=0x7f0e02df3a4c) at lib/timeval.c:305
> #2  0x000000000063cecc in poll_block () at lib/poll-loop.c:364
> #3  0x00000000005a041b in udpif_revalidator (arg=0x10afb20)
>     at ofproto/ofproto-dpif-upcall.c:921
> #4  0x000000000062cdd4 in ovsthread_wrapper (aux_=<optimized out>)
>     at lib/ovs-thread.c:342
> #5  0x00007f0e47ba9dc5 in start_thread () from /lib64/libpthread.so.0
> #6  0x00007f0e471c11cd in clone () from /lib64/libc.so.6
>
> Thread 6 (Thread 0x7f0e035f5700 (LWP 6061)):
> #0  0x00007f0e471b6b7d in poll () from /lib64/libc.so.6
> #1  0x000000000064f9c4 in time_poll (pollfds=pollfds@entry=0x7f0ddc0008e0,
>     n_pollfds=2, handles=handles@entry=0x0, timeout_when=9223372036854775807,
>     elapsed=elapsed@entry=0x7f0e035f4a2c) at lib/timeval.c:305
> #2  0x000000000063cecc in poll_block () at lib/poll-loop.c:364
> #3  0x000000000062d7d6 in ovs_barrier_block (barrier=barrier@entry=0x107fcf8)
>     at lib/ovs-thread.c:307
> #4  0x00000000005a02e0 in udpif_revalidator (arg=0x10afb38)
>     at ofproto/ofproto-dpif-upcall.c:873
> #5  0x000000000062cdd4 in ovsthread_wrapper (aux_=<optimized out>)
>     at lib/ovs-thread.c:342
> #6  0x00007f0e47ba9dc5 in start_thread () from /lib64/libpthread.so.0
> #7  0x00007f0e471c11cd in clone () from /lib64/libc.so.6
>
> Thread 5 (Thread 0x7f0deb7fe700 (LWP 6062)):
> #0  0x00007f0e471b6b7d in poll () from /lib64/libc.so.6
> #1  0x000000000064f9c4 in time_poll (pollfds=pollfds@entry=0x7f0dd00008c0,
>     n_pollfds=2, handles=handles@entry=0x0, timeout_when=9223372036854775807,
>     elapsed=elapsed@entry=0x7f0deb7fda8c) at lib/timeval.c:305
> #2  0x000000000063cecc in poll_block () at lib/poll-loop.c:364
> #3  0x00000000005a0c8f in udpif_upcall_handler (arg=0x10635c0)
>     at ofproto/ofproto-dpif-upcall.c:725
> #4  0x000000000062cdd4 in ovsthread_wrapper (aux_=<optimized out>)
>     at lib/ovs-thread.c:342
> #5  0x00007f0e47ba9dc5 in start_thread () from /lib64/libpthread.so.0
> #6  0x00007f0e471c11cd in clone () from /lib64/libc.so.6
>
> Thread 4 (Thread 0x7f0debfff700 (LWP 6063)):
> #0  0x00007f0e471b6b7d in poll () from /lib64/libc.so.6
> #1  0x000000000064f9c4 in time_poll (pollfds=pollfds@entry=0x7f0dd40008c0,
>     n_pollfds=2, handles=handles@entry=0x0, timeout_when=9223372036854775807,
>     elapsed=elapsed@entry=0x7f0debffea8c) at lib/timeval.c:305
> #2  0x000000000063cecc in poll_block () at lib/poll-loop.c:364
> #3  0x00000000005a0c8f in udpif_upcall_handler (arg=0x10635d8)
>     at ofproto/ofproto-dpif-upcall.c:725
> #4  0x000000000062cdd4 in ovsthread_wrapper (aux_=<optimized out>)
>     at lib/ovs-thread.c:342
> #5  0x00007f0e47ba9dc5 in start_thread () from /lib64/libpthread.so.0
> #6  0x00007f0e471c11cd in clone () from /lib64/libc.so.6
>
> Thread 3 (Thread 0x7f0e009ef700 (LWP 6064)):
> #0  0x00007f0e471b6b7d in poll () from /lib64/libc.so.6
> #1  0x000000000064f9c4 in time_poll (pollfds=pollfds@entry=0x7f0de4000c50,
>     n_pollfds=4, handles=handles@entry=0x0, timeout_when=65562771,
>     elapsed=elapsed@entry=0x7f0e009eea4c) at lib/timeval.c:305
> #2  0x000000000063cecc in poll_block () at lib/poll-loop.c:364
> #3  0x00000000005a041b in udpif_revalidator (arg=0x1063490)
>     at ofproto/ofproto-dpif-upcall.c:921
> #4  0x000000000062cdd4 in ovsthread_wrapper (aux_=<optimized out>)
>     at lib/ovs-thread.c:342
> #5  0x00007f0e47ba9dc5 in start_thread () from /lib64/libpthread.so.0
> #6  0x00007f0e471c11cd in clone () from /lib64/libc.so.6
>
> Thread 2 (Thread 0x7f0e011f0700 (LWP 6065)):
> #0  0x00007f0e471b6b7d in poll () from /lib64/libc.so.6
> #1  0x000000000064f9c4 in time_poll (pollfds=pollfds@entry=0x7f0dcc0008e0,
>     n_pollfds=2, handles=handles@entry=0x0, timeout_when=9223372036854775807,
>     elapsed=elapsed@entry=0x7f0e011efa2c) at lib/timeval.c:305
> #2  0x000000000063cecc in poll_block () at lib/poll-loop.c:364
> #3  0x000000000062d7d6 in ovs_barrier_block (barrier=barrier@entry=0x10a39c8)
>     at lib/ovs-thread.c:307
> #4  0x00000000005a02e0 in udpif_revalidator (arg=0x10634a8)
>     at ofproto/ofproto-dpif-upcall.c:873
> #5  0x000000000062cdd4 in ovsthread_wrapper (aux_=<optimized out>)
>     at lib/ovs-thread.c:342
> #6  0x00007f0e47ba9dc5 in start_thread () from /lib64/libpthread.so.0
> #7  0x00007f0e471c11cd in clone () from /lib64/libc.so.6
>
> Thread 1 (Thread 0x7f0e481d7b40 (LWP 6041)):
> #0  0x00007f0e471b6b7d in poll () from /lib64/libc.so.6
> #1  0x000000000064f9c4 in time_poll (pollfds=pollfds@entry=0x10ab540,
>     n_pollfds=2, handles=handles@entry=0x0, timeout_when=66006888,
>     elapsed=elapsed@entry=0x7ffcbb24145c) at lib/timeval.c:305
> #2  0x000000000063cecc in poll_block () at lib/poll-loop.c:364
> #3  0x000000000062bc63 in ovsrcu_synchronize () at lib/ovs-rcu.c:230
> #4  0x00000000005a9240 in xlate_txn_commit ()
>     at ofproto/ofproto-dpif-xlate.c:909
> #5  0x00000000005955fb in type_run (type=<optimized out>)
>     at ofproto/ofproto-dpif.c:646
> #6  0x000000000058353f in ofproto_type_run (datapath_type=<optimized out>,
>     datapath_type@entry=0x10773f0 "netdev") at ofproto/ofproto.c:1706
> #7  0x0000000000573a25 in bridge_run__ () at vswitchd/bridge.c:2891
> #8  0x000000000057865e in bridge_reconfigure (ovs_cfg=ovs_cfg@entry=0x10a8c30)
>     at vswitchd/bridge.c:680
> #9  0x0000000000579b14 in bridge_run () at vswitchd/bridge.c:2976
> #10 0x000000000040fed5 in main (argc=11, argv=0x7ffcbb241aa8)
>     at vswitchd/ovs-vswitchd.c:112
> (gdb) p mutex
> $1 = {lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0,
>       __kind = 2, __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
>     __size = '\000' <repeats 16 times>, "\002", '\000' <repeats 22 times>,
>     __align = 0}, where = 0x6d8117 "<unlocked>"}
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev
>
