Hi, I did some more debugging:
2015-01-27T03:45:04.769Z|223223|rconn|DBG|xapi6<->tcp:x.x.x.x:43173: entering ACTIVE
2015-01-27T03:45:09.084Z|223233|rconn|DBG|xapi6<->tcp:x.x.x.x:43173: idle 5 seconds, sending inactivity probe
2015-01-27T03:45:09.084Z|223234|util|EMER|lib/rconn.c:568: assertion version >= 0 && version <= 0xff failed in run_ACTIVE()

#7  0x080f6003 in run_ACTIVE (rc=0xb1898b48) at lib/rconn.c:568
568         ovs_assert(version >= 0 && version <= 0xff);
#8  rconn_run (rc=0xb1898b48) at lib/rconn.c:659
659         STATES

(gdb) p *rc->vconn
$2 = {
  class = 0x81709c0,
  state = 2,
  error = 0,
  allowed_versions = 30,
  peer_versions = 0,
  version = 0,
  recv_any_version = false,
  name = 0xb1895908 "tcp:x.x.x.x:43173"
}

I'm not sure whether version = 0 is acceptable here, but it seems to come back as -1 through vconn_get_version():

int
vconn_get_version(const struct vconn *vconn)
{
    return vconn->version ? vconn->version : -1;
}

Should version = 0 really be expected to pass that assertion (version >= 0 && version <= 0xff)? Is this some sort of bug?
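To make the failure path concrete, here is a minimal, self-contained sketch (simplified from the snippets above; it is not the actual OVS source, and the struct is trimmed to the one field that matters) showing how version = 0 becomes -1 and then trips the assertion:

/* Sketch only, NOT the real OVS code. */
#include <assert.h>

struct vconn {
    int version;                /* 0 until version negotiation completes */
};

/* As quoted above: a version of 0 is reported as -1. */
static int
vconn_get_version(const struct vconn *vconn)
{
    return vconn->version ? vconn->version : -1;
}

int
main(void)
{
    struct vconn vc = { .version = 0 };     /* what gdb shows on the crashed rconn */
    int version = vconn_get_version(&vc);   /* yields -1 */

    /* The check from run_ACTIVE() at lib/rconn.c:568: with version == -1 it
     * is false, so the process aborts, matching the backtrace. */
    assert(version >= 0 && version <= 0xff);
    return 0;
}

In other words, as long as the vconn's version is still 0 when the rconn is ACTIVE, the value the assertion sees is -1 and it has to fail, which looks consistent with the gdb dump above.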
On the ovs-ofctl I get this:

Jan 27 01:43:56 ovs-ofctl: ovs|00001|hmap|DBG|../lib/shash.c:112: 7 nodes in bucket (16 nodes, 8 buckets)
Jan 27 01:43:56 ovs-ofctl: ovs|00002|hmap|DBG|../lib/shash.c:112: 7 nodes in bucket (64 nodes, 32 buckets)
Jan 27 01:43:56 ovs-ofctl: ovs|00003|ofctl|DBG|connecting to tcp:y.y.y.y:zzzz
Jan 27 01:43:56 ovs-ofctl: ovs|00004|poll_loop|DBG|wakeup due to 0-ms timeout
Jan 27 01:43:56 ovs-ofctl: ovs|00005|poll_loop|DBG|wakeup due to [POLLOUT] on fd 8 (x.x.x.x:43173<->y.y.y.y:zzzz) at ../lib/stream-fd-unix.c:120
Jan 27 01:43:56 ovs-ofctl: ovs|00006|hmap|DBG|../lib/ofp-msgs.c:1082: 6 nodes in bucket (128 nodes, 64 buckets)
Jan 27 01:43:56 ovs-ofctl: ovs|00007|hmap|DBG|../lib/ofp-msgs.c:1082: 6 nodes in bucket (256 nodes, 128 buckets)
Jan 27 01:43:56 ovs-ofctl: ovs|00008|hmap|DBG|../lib/ofp-msgs.c:1082: 7 nodes in bucket (512 nodes, 256 buckets)
Jan 27 01:43:56 ovs-ofctl: ovs|00009|hmap|DBG|../lib/ofp-msgs.c:1082: 8 nodes in bucket (512 nodes, 256 buckets)
Jan 27 01:43:56 ovs-ofctl: ovs|00010|hmap|DBG|../lib/ofp-msgs.c:1082: 6 nodes in bucket (512 nodes, 256 buckets)
Jan 27 01:43:56 ovs-ofctl: ovs|00011|hmap|DBG|../lib/ofp-msgs.c:1082: 7 nodes in bucket (512 nodes, 256 buckets)
Jan 27 01:43:56 ovs-ofctl: ovs|00012|hmap|DBG|../lib/ofp-msgs.c:1082: 7 nodes in bucket (1024 nodes, 512 buckets)
Jan 27 01:43:56 ovs-ofctl: ovs|00013|hmap|DBG|../lib/ofp-msgs.c:1082: 6 nodes in bucket (1024 nodes, 512 buckets)
Jan 27 01:43:56 ovs-ofctl: ovs|00014|vconn|DBG|tcp:y.y.y.y:zzzz: sent (Success): OFPT_HELLO (xid=0x1):
Jan 27 01:43:56 ovs-ofctl:  version bitmap: 0x01
Jan 27 01:43:56 ovs-ofctl: ovs|00015|poll_loop|DBG|wakeup due to [POLLIN] on fd 8 (x.x.x.x:43173<->y.y.y.y:zzzz) at ../lib/stream-fd-unix.c:124
Jan 27 01:43:56 ovs-ofctl: ovs|00016|vconn|DBG|tcp:y.y.y.y:zzzz: received: OFPT_HELLO (OF1.3) (xid=0x17daa):
Jan 27 01:43:56 ovs-ofctl:  version bitmap: 0x01, 0x02, 0x03, 0x04
Jan 27 01:43:56 ovs-ofctl: ovs|00017|vconn|DBG|tcp:y.y.y.y:zzzz: negotiated OpenFlow version 0x01 (we support version 0x01, peer supports version 0x04 and earlier)
Jan 27 01:43:56 ovs-ofctl: ovs|00018|vconn|DBG|tcp:y.y.y.y:zzzz: sent (Success): OFPT_FLOW_MOD (xid=0x2): ADD priority=25000,dl_src=02:e0:52:2a:ef:24,dl_dst=f2:18:14:f8:fb:7a actions=strip_vlan,output:63
Jan 27 01:43:56 ovs-ofctl: ovs|00019|vconn|DBG|tcp:y.y.y.y:zzzz: sent (Success): OFPT_BARRIER_REQUEST (xid=0x3):
Jan 27 01:44:01 ovs-ofctl: ovs|00020|poll_loop|DBG|wakeup due to [POLLIN] on fd 8 (x.x.x.x:43173<->y.y.y.y:zzzz) at ../lib/stream-fd-unix.c:124 (0% CPU usage)
ovs-ofctl: talking to tcp:y.y.y.y:zzzz (End of file)
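As an aside on the two HELLO lines above, the sketch below is just my illustration of what the version bitmaps amount to (not OVS's actual negotiation code): each side advertises the versions it supports and the connection ends up on the highest version both have in common, which is why 0x01 is negotiated here.

/* Illustration only, NOT OVS code. */
#include <stdint.h>
#include <stdio.h>

/* Bit n set means "OpenFlow wire version n is supported". */
static int
highest_common_version(uint32_t ours, uint32_t theirs)
{
    uint32_t common = ours & theirs;

    for (int v = 31; v >= 0; v--) {
        if (common & (UINT32_C(1) << v)) {
            return v;
        }
    }
    return -1;                  /* no version in common */
}

int
main(void)
{
    /* From the log: ovs-ofctl offered 0x01, the switch offered 0x01-0x04. */
    uint32_t ofctl = UINT32_C(1) << 0x01;
    uint32_t peer = (UINT32_C(1) << 0x01) | (UINT32_C(1) << 0x02)
                    | (UINT32_C(1) << 0x03) | (UINT32_C(1) << 0x04);

    printf("negotiated OpenFlow version 0x%02x\n",
           highest_common_version(ofctl, peer));   /* prints 0x01 */
    return 0;
}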
Thanks,

On Thu, Jan 22, 2015 at 11:47 AM, Luiz Henrique Ozaki <luiz.oz...@gmail.com> wrote:
> Hi all,
>
> I have some instances running on OVS with XenServer, and the ovs-vswitchd process has been crashing for me:
> ovs-vswitchd: monitoring pid 19196 (10 crashes: pid 11819 died, killed (Aborted))
>
> I've got this in the log:
> 23159|util|EMER|lib/rconn.c:568: assertion version >= 0 && version <= 0xff failed in run_ACTIVE()
>
> I did a backtrace from the coredump, but I couldn't figure out why I'm getting this assertion failure.
>
> Here is a backtrace from the 2.3.0 version:
> (gdb) thread apply all bt
>
> Thread 11 (Thread 0xb7725b90 (LWP 20243)):
> #0  0xb7728424 in __kernel_vsyscall ()
> #1  0xb73d9ff2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
> #2  0xb74cbbb4 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libc.so.6
> #3  0xb7571199 in handle_fildes_io () from /lib/librt.so.1
> #4  0xb73d5912 in start_thread () from /lib/libpthread.so.0
> #5  0xb74bf4ae in clone () from /lib/libc.so.6
>
> Thread 10 (Thread 0xb48feb90 (LWP 12169)):
> #0  0xb7728424 in __kernel_vsyscall ()
> #1  0xb74b56c3 in poll () from /lib/libc.so.6
> #2  0x08102166 in time_poll (pollfds=0xb6704538, n_pollfds=3, handles=0x0, timeout_when=9223372036854775807, elapsed=0xb48e6eb8) at lib/timeval.c:301
> #3  0x080f3f5a in poll_block () at lib/poll-loop.c:314
> #4  0x08075aa1 in udpif_upcall_handler (arg=0x95f6490) at ofproto/ofproto-dpif-upcall.c:529
> #5  0x080e7058 in ovsthread_wrapper (aux_=0x95a65e0) at lib/ovs-thread.c:322
> #6  0xb73d5912 in start_thread () from /lib/libpthread.so.0
> #7  0xb74bf4ae in clone () from /lib/libc.so.6
>
> Thread 9 (Thread 0xb725eb90 (LWP 12168)):
> #0  0xb7728424 in __kernel_vsyscall ()
> #1  0xb74b56c3 in poll () from /lib/libc.so.6
> #2  0x08102166 in time_poll (pollfds=0xb673b0a8, n_pollfds=3, handles=0x0, timeout_when=9223372036854775807, elapsed=0xb7246eb8) at lib/timeval.c:301
> #3  0x080f3f5a in poll_block () at lib/poll-loop.c:314
> #4  0x08075aa1 in udpif_upcall_handler (arg=0x95f6484) at ofproto/ofproto-dpif-upcall.c:529
> #5  0x080e7058 in ovsthread_wrapper (aux_=0x9600d20) at lib/ovs-thread.c:322
> #6  0xb73d5912 in start_thread () from /lib/libpthread.so.0
> #7  0xb74bf4ae in clone () from /lib/libc.so.6
>
> Thread 8 (Thread 0xb5a5bb90 (LWP 12167)):
> #0  0xb7728424 in __kernel_vsyscall ()
> #1  0xb74b56c3 in poll () from /lib/libc.so.6
> #2  0x08102166 in time_poll (pollfds=0x9612c60, n_pollfds=3, handles=0x0, timeout_when=9223372036854775807, elapsed=0xb5a43eb8) at lib/timeval.c:301
> #3  0x080f3f5a in poll_block () at lib/poll-loop.c:314
> #4  0x08075aa1 in udpif_upcall_handler (arg=0x95f6478) at ofproto/ofproto-dpif-upcall.c:529
> #5  0x080e7058 in ovsthread_wrapper (aux_=0x9611220) at lib/ovs-thread.c:322
> #6  0xb73d5912 in start_thread () from /lib/libpthread.so.0
> #7  0xb74bf4ae in clone () from /lib/libc.so.6
>
> Thread 7 (Thread 0xb38fcb90 (LWP 12174)):
> #0  0xb7728424 in __kernel_vsyscall ()
> #1  0xb74b56c3 in poll () from /lib/libc.so.6
> #2  0x08102166 in time_poll (pollfds=0xb65281c0, n_pollfds=2, handles=0x0, timeout_when=9223372036854775807, elapsed=0xb38fad88) at lib/timeval.c:301
> #3  0x080f3f5a in poll_block () at lib/poll-loop.c:314
> #4  0x080e70cb in ovs_barrier_block (barrier=0x95a7bac) at lib/ovs-thread.c:290
> #5  0x08076d6e in udpif_revalidator (arg=0x95f3a18) at ofproto/ofproto-dpif-upcall.c:588
> #6  0x080e7058 in ovsthread_wrapper (aux_=0x96110e0) at lib/ovs-thread.c:322
> #7  0xb73d5912 in start_thread () from /lib/libpthread.so.0
> #8  0xb74bf4ae in clone () from /lib/libc.so.6
>
> Thread 6 (Thread 0xb30fbb90 (LWP 12178)):
> #0  0xb7728424 in __kernel_vsyscall ()
> #1  0xb74b56c3 in poll () from /lib/libc.so.6
> #2  0x08102166 in time_poll (pollfds=0xb6526308, n_pollfds=2, handles=0x0, timeout_when=9223372036854775807, elapsed=0xb30f9d88) at lib/timeval.c:301
> #3  0x080f3f5a in poll_block () at lib/poll-loop.c:314
> #4  0x080e70cb in ovs_barrier_block (barrier=0x95a7bac) at lib/ovs-thread.c:290
> #5  0x08076d6e in udpif_revalidator (arg=0x95f3a28) at ofproto/ofproto-dpif-upcall.c:588
> #6  0x080e7058 in ovsthread_wrapper (aux_=0x9620820) at lib/ovs-thread.c:322
> #7  0xb73d5912 in start_thread () from /lib/libpthread.so.0
> #8  0xb74bf4ae in clone () from /lib/libc.so.6
>
> Thread 5 (Thread 0xb40fdb90 (LWP 12173)):
> #0  0xb7728424 in __kernel_vsyscall ()
> #1  0xb74b56c3 in poll () from /lib/libc.so.6
> #2  0x08102166 in time_poll (pollfds=0xb67119b0, n_pollfds=3, handles=0x0, timeout_when=10068193, elapsed=0xb40fbdb8) at lib/timeval.c:301
> #3  0x080f3f5a in poll_block () at lib/poll-loop.c:314
> #4  0x080773d8 in udpif_revalidator (arg=0x95f3a08) at ofproto/ofproto-dpif-upcall.c:629
> #5  0x080e7058 in ovsthread_wrapper (aux_=0x9620038) at lib/ovs-thread.c:322
> #6  0xb73d5912 in start_thread () from /lib/libpthread.so.0
> #7  0xb74bf4ae in clone () from /lib/libc.so.6
>
> Thread 4 (Thread 0xb50ffb90 (LWP 12166)):
> #0  0xb7728424 in __kernel_vsyscall ()
> #1  0xb74b56c3 in poll () from /lib/libc.so.6
> #2  0x08102166 in time_poll (pollfds=0xb6713a20, n_pollfds=3, handles=0x0, timeout_when=9223372036854775807, elapsed=0xb50e7eb8) at lib/timeval.c:301
> #3  0x080f3f5a in poll_block () at lib/poll-loop.c:314
> #4  0x08075aa1 in udpif_upcall_handler (arg=0x95f646c) at ofproto/ofproto-dpif-upcall.c:529
> #5  0x080e7058 in ovsthread_wrapper (aux_=0x96110e0) at lib/ovs-thread.c:322
> #6  0xb73d5912 in start_thread () from /lib/libpthread.so.0
> #7  0xb74bf4ae in clone () from /lib/libc.so.6
>
> Thread 3 (Thread 0xb28fab90 (LWP 12165)):
> #0  0xb7728424 in __kernel_vsyscall ()
> #1  0xb73dc839 in __lll_lock_wait () from /lib/libpthread.so.0
> #2  0xb73d7e9f in _L_lock_885 () from /lib/libpthread.so.0
> #3  0xb73d7d66 in pthread_mutex_lock () from /lib/libpthread.so.0
> #4  0xb74cbcd6 in pthread_mutex_lock () from /lib/libc.so.6
> #5  0x080e7731 in ovs_mutex_lock_at (l_=0x962d7b0, where=0x815b306 "lib/rconn.c:960") at lib/ovs-thread.c:70
> #6  0x080f4a6f in rconn_get_version (rconn=0x962d7b0) at lib/rconn.c:960
> #7  0x0807ffba in ofconn_get_protocol (ofconn=0x962f490) at ofproto/connmgr.c:992
> #8  0x08080018 in connmgr_wants_packet_in_on_miss (mgr=0x9600e88) at ofproto/connmgr.c:1577
> #9  0x0806753a in rule_dpif_lookup (ofproto=0x9601598, flow=0xb28f9ebc, wc=0xb28e31dc, rule=0xb28e2eb4, take_ref=false) at ofproto/ofproto-dpif.c:3238
> #10 0x0807c568 in xlate_actions__ (xin=0xb28f9eb8, xout=0xb28e31dc) at ofproto/ofproto-dpif-xlate.c:3278
> #11 xlate_actions (xin=0xb28f9eb8, xout=0xb28e31dc) at ofproto/ofproto-dpif-xlate.c:3182
> #12 0x08076265 in handle_upcalls (arg=0x95f6460) at ofproto/ofproto-dpif-upcall.c:931
> #13 udpif_upcall_handler (arg=0x95f6460) at ofproto/ofproto-dpif-upcall.c:531
> #14 0x080e7058 in ovsthread_wrapper (aux_=0x95dc718) at lib/ovs-thread.c:322
> #15 0xb73d5912 in start_thread () from /lib/libpthread.so.0
> #16 0xb74bf4ae in clone () from /lib/libc.so.6
>
> Thread 2 (Thread 0xb625cb90 (LWP 6925)):
> #0  0xb7728424 in __kernel_vsyscall ()
> #1  0xb74b56c3 in poll () from /lib/libc.so.6
> #2  0x08102166 in time_poll (pollfds=0xb6998de0, n_pollfds=2, handles=0x0, timeout_when=9223372036854775807, elapsed=0xb625c128) at lib/timeval.c:301
> #3  0x080f3f5a in poll_block () at lib/poll-loop.c:314
> #4  0x080e6c79 in ovsrcu_postpone_thread (arg=0x0) at lib/ovs-rcu.c:267
> #5  0x080e7058 in ovsthread_wrapper (aux_=0x95c2870) at lib/ovs-thread.c:322
> #6  0xb73d5912 in start_thread () from /lib/libpthread.so.0
> #7  0xb74bf4ae in clone () from /lib/libc.so.6
>
> Thread 1 (Thread 0xb725f8f0 (LWP 6615)):
> #0  0xb7728424 in __kernel_vsyscall ()
> #1  0xb7412b10 in raise () from /lib/libc.so.6
> #2  0xb7414421 in abort () from /lib/libc.so.6
> #3  0x081044c4 in ovs_abort_valist (err_no=0, format=0x815d864 "%s: assertion %s failed in %s()", args=0xbfef7658 "\005\265\025\b\210\271\025\b\366\267\025\b\260\327b\txv\357\277JJ\017\b\b") at lib/util.c:322
> #4  0x08109858 in vlog_abort_valist (module_=0x81847e0, message=0x815d864 "%s: assertion %s failed in %s()", args=0xbfef7658 "\005\265\025\b\210\271\025\b\366\267\025\b\260\327b\txv\357\277JJ\017\b\b") at lib/vlog.c:992
> #5  0x08109882 in vlog_abort (module=0x81847e0, message=0x815d864 "%s: assertion %s failed in %s()") at lib/vlog.c:1006
> #6  0x08104711 in ovs_assert_failure (where=0x815b505 "lib/rconn.c:568", function=0x19d7 <Address 0x19d7 out of bounds>, condition=0x815b988 "version >= 0 && version <= 0xff") at lib/util.c:71
> #7  0x080f6003 in run_ACTIVE (rc=0x962d7b0) at lib/rconn.c:568
> #8  rconn_run (rc=0x962d7b0) at lib/rconn.c:659
> #9  0x08081e45 in ofconn_run (mgr=0x9600e88, handle_openflow=0x80658a0 <handle_openflow>) at ofproto/connmgr.c:1390
> #10 connmgr_run (mgr=0x9600e88, handle_openflow=0x80658a0 <handle_openflow>) at ofproto/connmgr.c:339
> #11 0x08062597 in ofproto_run (p=0x96015a0) at ofproto/ofproto.c:1543
> #12 0x0804bef3 in bridge_run__ () at vswitchd/bridge.c:2255
> #13 0x080537ca in bridge_run () at vswitchd/bridge.c:2307
> #14 0x08054e56 in main (argc=-1257216160, argv=0x0) at vswitchd/ovs-vswitchd.c:116
>
> I'm running OVS 1.9.3 and 2.3.0 on XenServer 6.2; both crash at the same assertion.
>
> It seems that this only happens when ovs-vswitchd is under heavy load and I run ovs-ofctl add-flow.
>
> Does anyone know what could be triggering this?
>
> Regards,
>
> --
> []'s
> Luiz Henrique Ozaki

--
[]'s
Luiz Henrique Ozaki