Hi,

I did some more debugging:

2015-01-27T03:45:04.769Z|223223|rconn|DBG|xapi6<->tcp:x.x.x.x:43173:
entering ACTIVE
2015-01-27T03:45:09.084Z|223233|rconn|DBG|xapi6<->tcp:x.x.x.x:43173: idle 5
seconds, sending inactivity probe
2015-01-27T03:45:09.084Z|223234|util|EMER|lib/rconn.c:568: assertion
version >= 0 && version <= 0xff failed in run_ACTIVE()

#7  0x080f6003 in run_ACTIVE (rc=0xb1898b48) at lib/rconn.c:568
568        ovs_assert(version >= 0 && version <= 0xff);
#8  rconn_run (rc=0xb1898b48) at lib/rconn.c:659
659            STATES
(gdb) p *rc->vconn
$2 = {
  class = 0x81709c0,
  state = 2,
  error = 0,
  allowed_versions = 30,
  peer_versions = 0,
  version = 0,
  recv_any_version = false,
  name = 0xb1895908 "tcp:x.x.x.x:43173"
}


Not sure if version = 0 is acceptable here, but with that value
vconn_get_version() returns -1:

int
vconn_get_version(const struct vconn *vconn)
{
    return vconn->version ? vconn->version : -1;
}

A return value of -1 can never pass the assertion (version >= 0 &&
version <= 0xff), so run_ACTIVE() is guaranteed to abort as soon as
the inactivity probe fires on a connection whose version was never
negotiated.

Is this some sort of bug?
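
For what it's worth, here is a minimal standalone sketch (my own toy
code, not OVS source; the struct and values just mimic what the
coredump shows) demonstrating why that combination has to abort:

#include <assert.h>
#include <stdio.h>

/* Toy stand-in for the real struct vconn; only the field that
 * matters here.  version = 0 is what the coredump shows. */
struct vconn { int version; };

static int
vconn_get_version(const struct vconn *vconn)
{
    return vconn->version ? vconn->version : -1;
}

int
main(void)
{
    struct vconn vc = { .version = 0 };    /* never negotiated */
    int version = vconn_get_version(&vc);  /* returns -1 */
    printf("version = %d\n", version);
    /* Same check as lib/rconn.c:568; aborts here on -1. */
    assert(version >= 0 && version <= 0xff);
    return 0;
}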

On the ovs-ofctl side I get this:
Jan 27 01:43:56 ovs-ofctl: ovs|00001|hmap|DBG|../lib/shash.c:112: 7 nodes
in bucket (16 nodes, 8 buckets)
Jan 27 01:43:56 ovs-ofctl: ovs|00002|hmap|DBG|../lib/shash.c:112: 7 nodes
in bucket (64 nodes, 32 buckets)
Jan 27 01:43:56 ovs-ofctl: ovs|00003|ofctl|DBG|connecting to
tcp:y.y.y.y:zzzz
Jan 27 01:43:56 ovs-ofctl: ovs|00004|poll_loop|DBG|wakeup due to 0-ms
timeout
Jan 27 01:43:56 ovs-ofctl: ovs|00005|poll_loop|DBG|wakeup due to [POLLOUT]
on fd 8 (x.x.x.x:43173<->y.y.y.y:zzzz) at ../lib/stream-fd-unix.c:120
Jan 27 01:43:56 ovs-ofctl: ovs|00006|hmap|DBG|../lib/ofp-msgs.c:1082: 6
nodes in bucket (128 nodes, 64 buckets)
Jan 27 01:43:56 ovs-ofctl: ovs|00007|hmap|DBG|../lib/ofp-msgs.c:1082: 6
nodes in bucket (256 nodes, 128 buckets)
Jan 27 01:43:56 ovs-ofctl: ovs|00008|hmap|DBG|../lib/ofp-msgs.c:1082: 7
nodes in bucket (512 nodes, 256 buckets)
Jan 27 01:43:56 ovs-ofctl: ovs|00009|hmap|DBG|../lib/ofp-msgs.c:1082: 8
nodes in bucket (512 nodes, 256 buckets)
Jan 27 01:43:56 ovs-ofctl: ovs|00010|hmap|DBG|../lib/ofp-msgs.c:1082: 6
nodes in bucket (512 nodes, 256 buckets)
Jan 27 01:43:56 ovs-ofctl: ovs|00011|hmap|DBG|../lib/ofp-msgs.c:1082: 7
nodes in bucket (512 nodes, 256 buckets)
Jan 27 01:43:56 ovs-ofctl: ovs|00012|hmap|DBG|../lib/ofp-msgs.c:1082: 7
nodes in bucket (1024 nodes, 512 buckets)
Jan 27 01:43:56 ovs-ofctl: ovs|00013|hmap|DBG|../lib/ofp-msgs.c:1082: 6
nodes in bucket (1024 nodes, 512 buckets)
Jan 27 01:43:56 ovs-ofctl: ovs|00014|vconn|DBG|tcp:y.y.y.y:zzzz: sent
(Success): OFPT_HELLO (xid=0x1):
Jan 27 01:43:56 ovs-ofctl:  version bitmap: 0x01
Jan 27 01:43:56 ovs-ofctl: ovs|00015|poll_loop|DBG|wakeup due to [POLLIN]
on fd 8 (x.x.x.x:43173<->y.y.y.y:zzzz) at ../lib/stream-fd-unix.c:124
Jan 27 01:43:56 ovs-ofctl: ovs|00016|vconn|DBG|tcp:y.y.y.y:zzzz: received:
OFPT_HELLO (OF1.3) (xid=0x17daa):
Jan 27 01:43:56 ovs-ofctl:  version bitmap: 0x01, 0x02, 0x03, 0x04
Jan 27 01:43:56 ovs-ofctl: ovs|00017|vconn|DBG|tcp:y.y.y.y:zzzz: negotiated
OpenFlow version 0x01 (we support version 0x01, peer supports version 0x04
and earlier)
Jan 27 01:43:56 ovs-ofctl: ovs|00018|vconn|DBG|tcp:y.y.y.y:zzzz: sent
(Success): OFPT_FLOW_MOD (xid=0x2): ADD
priority=25000,dl_src=02:e0:52:2a:ef:24,dl_dst=f2:18:14:f8:fb:7a
actions=strip_vlan,output:63
Jan 27 01:43:56 ovs-ofctl: ovs|00019|vconn|DBG|tcp:y.y.y.y:zzzz: sent
(Success): OFPT_BARRIER_REQUEST (xid=0x3):
Jan 27 01:44:01 ovs-ofctl: ovs|00020|poll_loop|DBG|wakeup due to [POLLIN]
on fd 8 (x.x.x.x:43173<->y.y.y.y:zzzz) at ../lib/stream-fd-unix.c:124 (0%
CPU usage)
ovs-ofctl: talking to tcp:y.y.y.y:zzzz (End of file)

The (End of file) presumably corresponds to the moment ovs-vswitchd hit
the assertion and aborted, dropping the connection.

Thanks,


On Thu, Jan 22, 2015 at 11:47 AM, Luiz Henrique Ozaki <luiz.oz...@gmail.com>
wrote:

> Hi all,
>
> I have some instances running on OVS on XenServer, and the
> ovs-vswitchd process has been crashing:
> ovs-vswitchd: monitoring pid 19196 (10 crashes: pid 11819 died, killed
> (Aborted))
>
> I got this in the log:
> 23159|util|EMER|lib/rconn.c:568: assertion version >= 0 && version <= 0xff
> failed in run_ACTIVE()
>
> I did a backtrace from the coredump, but I couldn't figure out why
> I'm getting this assertion failure.
>
> Here is a backtrace from the 2.3.0 version:
> (gdb) thread apply all bt
>
> Thread 11 (Thread 0xb7725b90 (LWP 20243)):
> #0  0xb7728424 in __kernel_vsyscall ()
> #1  0xb73d9ff2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from
> /lib/libpthread.so.0
> #2  0xb74cbbb4 in pthread_cond_timedwait@@GLIBC_2.3.2 () from
> /lib/libc.so.6
> #3  0xb7571199 in handle_fildes_io () from /lib/librt.so.1
> #4  0xb73d5912 in start_thread () from /lib/libpthread.so.0
> #5  0xb74bf4ae in clone () from /lib/libc.so.6
>
> Thread 10 (Thread 0xb48feb90 (LWP 12169)):
> #0  0xb7728424 in __kernel_vsyscall ()
> #1  0xb74b56c3 in poll () from /lib/libc.so.6
> #2  0x08102166 in time_poll (pollfds=0xb6704538, n_pollfds=3, handles=0x0,
> timeout_when=9223372036854775807, elapsed=0xb48e6eb8) at lib/timeval.c:301
> #3  0x080f3f5a in poll_block () at lib/poll-loop.c:314
> #4  0x08075aa1 in udpif_upcall_handler (arg=0x95f6490) at
> ofproto/ofproto-dpif-upcall.c:529
> #5  0x080e7058 in ovsthread_wrapper (aux_=0x95a65e0) at
> lib/ovs-thread.c:322
> #6  0xb73d5912 in start_thread () from /lib/libpthread.so.0
> #7  0xb74bf4ae in clone () from /lib/libc.so.6
>
> Thread 9 (Thread 0xb725eb90 (LWP 12168)):
> #0  0xb7728424 in __kernel_vsyscall ()
> #1  0xb74b56c3 in poll () from /lib/libc.so.6
> #2  0x08102166 in time_poll (pollfds=0xb673b0a8, n_pollfds=3, handles=0x0,
> timeout_when=9223372036854775807, elapsed=0xb7246eb8) at lib/timeval.c:301
> #3  0x080f3f5a in poll_block () at lib/poll-loop.c:314
> #4  0x08075aa1 in udpif_upcall_handler (arg=0x95f6484) at
> ofproto/ofproto-dpif-upcall.c:529
> #5  0x080e7058 in ovsthread_wrapper (aux_=0x9600d20) at
> lib/ovs-thread.c:322
> #6  0xb73d5912 in start_thread () from /lib/libpthread.so.0
> #7  0xb74bf4ae in clone () from /lib/libc.so.6
>
> Thread 8 (Thread 0xb5a5bb90 (LWP 12167)):
> #0  0xb7728424 in __kernel_vsyscall ()
> #1  0xb74b56c3 in poll () from /lib/libc.so.6
> #2  0x08102166 in time_poll (pollfds=0x9612c60, n_pollfds=3, handles=0x0,
> timeout_when=9223372036854775807, elapsed=0xb5a43eb8) at lib/timeval.c:301
> #3  0x080f3f5a in poll_block () at lib/poll-loop.c:314
> #4  0x08075aa1 in udpif_upcall_handler (arg=0x95f6478) at
> ofproto/ofproto-dpif-upcall.c:529
> #5  0x080e7058 in ovsthread_wrapper (aux_=0x9611220) at
> lib/ovs-thread.c:322
> #6  0xb73d5912 in start_thread () from /lib/libpthread.so.0
> #7  0xb74bf4ae in clone () from /lib/libc.so.6
>
> Thread 7 (Thread 0xb38fcb90 (LWP 12174)):
> #0  0xb7728424 in __kernel_vsyscall ()
> #1  0xb74b56c3 in poll () from /lib/libc.so.6
> #2  0x08102166 in time_poll (pollfds=0xb65281c0, n_pollfds=2, handles=0x0,
> timeout_when=9223372036854775807, elapsed=0xb38fad88) at lib/timeval.c:301
> #3  0x080f3f5a in poll_block () at lib/poll-loop.c:314
> #4  0x080e70cb in ovs_barrier_block (barrier=0x95a7bac) at
> lib/ovs-thread.c:290
> #5  0x08076d6e in udpif_revalidator (arg=0x95f3a18) at
> ofproto/ofproto-dpif-upcall.c:588
> #6  0x080e7058 in ovsthread_wrapper (aux_=0x96110e0) at
> lib/ovs-thread.c:322
> #7  0xb73d5912 in start_thread () from /lib/libpthread.so.0
> #8  0xb74bf4ae in clone () from /lib/libc.so.6
>
> Thread 6 (Thread 0xb30fbb90 (LWP 12178)):
> #0  0xb7728424 in __kernel_vsyscall ()
> #1  0xb74b56c3 in poll () from /lib/libc.so.6
> #2  0x08102166 in time_poll (pollfds=0xb6526308, n_pollfds=2, handles=0x0,
> timeout_when=9223372036854775807, elapsed=0xb30f9d88) at lib/timeval.c:301
> #3  0x080f3f5a in poll_block () at lib/poll-loop.c:314
> #4  0x080e70cb in ovs_barrier_block (barrier=0x95a7bac) at
> lib/ovs-thread.c:290
> #5  0x08076d6e in udpif_revalidator (arg=0x95f3a28) at
> ofproto/ofproto-dpif-upcall.c:588
> #6  0x080e7058 in ovsthread_wrapper (aux_=0x9620820) at
> lib/ovs-thread.c:322
> #7  0xb73d5912 in start_thread () from /lib/libpthread.so.0
> #8  0xb74bf4ae in clone () from /lib/libc.so.6
>
> Thread 5 (Thread 0xb40fdb90 (LWP 12173)):
> #0  0xb7728424 in __kernel_vsyscall ()
> #1  0xb74b56c3 in poll () from /lib/libc.so.6
> #2  0x08102166 in time_poll (pollfds=0xb67119b0, n_pollfds=3, handles=0x0,
> timeout_when=10068193, elapsed=0xb40fbdb8) at lib/timeval.c:301
> #3  0x080f3f5a in poll_block () at lib/poll-loop.c:314
> #4  0x080773d8 in udpif_revalidator (arg=0x95f3a08) at
> ofproto/ofproto-dpif-upcall.c:629
> #5  0x080e7058 in ovsthread_wrapper (aux_=0x9620038) at
> lib/ovs-thread.c:322
> #6  0xb73d5912 in start_thread () from /lib/libpthread.so.0
> #7  0xb74bf4ae in clone () from /lib/libc.so.6
>
> Thread 4 (Thread 0xb50ffb90 (LWP 12166)):
> #0  0xb7728424 in __kernel_vsyscall ()
> #1  0xb74b56c3 in poll () from /lib/libc.so.6
> #2  0x08102166 in time_poll (pollfds=0xb6713a20, n_pollfds=3, handles=0x0,
> timeout_when=9223372036854775807, elapsed=0xb50e7eb8) at lib/timeval.c:301
> #3  0x080f3f5a in poll_block () at lib/poll-loop.c:314
> #4  0x08075aa1 in udpif_upcall_handler (arg=0x95f646c) at
> ofproto/ofproto-dpif-upcall.c:529
> #5  0x080e7058 in ovsthread_wrapper (aux_=0x96110e0) at
> lib/ovs-thread.c:322
> #6  0xb73d5912 in start_thread () from /lib/libpthread.so.0
> #7  0xb74bf4ae in clone () from /lib/libc.so.6
>
> Thread 3 (Thread 0xb28fab90 (LWP 12165)):
> #0  0xb7728424 in __kernel_vsyscall ()
> #1  0xb73dc839 in __lll_lock_wait () from /lib/libpthread.so.0
> #2  0xb73d7e9f in _L_lock_885 () from /lib/libpthread.so.0
> #3  0xb73d7d66 in pthread_mutex_lock () from /lib/libpthread.so.0
> #4  0xb74cbcd6 in pthread_mutex_lock () from /lib/libc.so.6
> #5  0x080e7731 in ovs_mutex_lock_at (l_=0x962d7b0, where=0x815b306
> "lib/rconn.c:960") at lib/ovs-thread.c:70
> #6  0x080f4a6f in rconn_get_version (rconn=0x962d7b0) at lib/rconn.c:960
> #7  0x0807ffba in ofconn_get_protocol (ofconn=0x962f490) at
> ofproto/connmgr.c:992
> #8  0x08080018 in connmgr_wants_packet_in_on_miss (mgr=0x9600e88) at
> ofproto/connmgr.c:1577
> #9  0x0806753a in rule_dpif_lookup (ofproto=0x9601598, flow=0xb28f9ebc,
> wc=0xb28e31dc, rule=0xb28e2eb4, take_ref=false) at
> ofproto/ofproto-dpif.c:3238
> #10 0x0807c568 in xlate_actions__ (xin=0xb28f9eb8, xout=0xb28e31dc) at
> ofproto/ofproto-dpif-xlate.c:3278
> #11 xlate_actions (xin=0xb28f9eb8, xout=0xb28e31dc) at
> ofproto/ofproto-dpif-xlate.c:3182
> #12 0x08076265 in handle_upcalls (arg=0x95f6460) at
> ofproto/ofproto-dpif-upcall.c:931
> #13 udpif_upcall_handler (arg=0x95f6460) at
> ofproto/ofproto-dpif-upcall.c:531
> #14 0x080e7058 in ovsthread_wrapper (aux_=0x95dc718) at
> lib/ovs-thread.c:322
> #15 0xb73d5912 in start_thread () from /lib/libpthread.so.0
> #16 0xb74bf4ae in clone () from /lib/libc.so.6
>
> Thread 2 (Thread 0xb625cb90 (LWP 6925)):
> #0  0xb7728424 in __kernel_vsyscall ()
> #1  0xb74b56c3 in poll () from /lib/libc.so.6
> #2  0x08102166 in time_poll (pollfds=0xb6998de0, n_pollfds=2, handles=0x0,
> timeout_when=9223372036854775807, elapsed=0xb625c128) at lib/timeval.c:301
> #3  0x080f3f5a in poll_block () at lib/poll-loop.c:314
> #4  0x080e6c79 in ovsrcu_postpone_thread (arg=0x0) at lib/ovs-rcu.c:267
> #5  0x080e7058 in ovsthread_wrapper (aux_=0x95c2870) at
> lib/ovs-thread.c:322
> #6  0xb73d5912 in start_thread () from /lib/libpthread.so.0
> #7  0xb74bf4ae in clone () from /lib/libc.so.6
>
> Thread 1 (Thread 0xb725f8f0 (LWP 6615)):
> #0  0xb7728424 in __kernel_vsyscall ()
> #1  0xb7412b10 in raise () from /lib/libc.so.6
> #2  0xb7414421 in abort () from /lib/libc.so.6
> #3  0x081044c4 in ovs_abort_valist (err_no=0, format=0x815d864 "%s:
> assertion %s failed in %s()", args=0xbfef7658
> "\005\265\025\b\210\271\025\b\366\267\025\b\260\327b\txv\357\277JJ\017\b\b")
> at lib/util.c:322
> #4  0x08109858 in vlog_abort_valist (module_=0x81847e0, message=0x815d864
> "%s: assertion %s failed in %s()", args=0xbfef7658
> "\005\265\025\b\210\271\025\b\366\267\025\b\260\327b\txv\357\277JJ\017\b\b")
>     at lib/vlog.c:992
> #5  0x08109882 in vlog_abort (module=0x81847e0, message=0x815d864 "%s:
> assertion %s failed in %s()") at lib/vlog.c:1006
> #6  0x08104711 in ovs_assert_failure (where=0x815b505 "lib/rconn.c:568",
> function=0x19d7 <Address 0x19d7 out of bounds>, condition=0x815b988
> "version >= 0 && version <= 0xff") at lib/util.c:71
> #7  0x080f6003 in run_ACTIVE (rc=0x962d7b0) at lib/rconn.c:568
> #8  rconn_run (rc=0x962d7b0) at lib/rconn.c:659
> #9  0x08081e45 in ofconn_run (mgr=0x9600e88, handle_openflow=0x80658a0
> <handle_openflow>) at ofproto/connmgr.c:1390
> #10 connmgr_run (mgr=0x9600e88, handle_openflow=0x80658a0
> <handle_openflow>) at ofproto/connmgr.c:339
> #11 0x08062597 in ofproto_run (p=0x96015a0) at ofproto/ofproto.c:1543
> #12 0x0804bef3 in bridge_run__ () at vswitchd/bridge.c:2255
> #13 0x080537ca in bridge_run () at vswitchd/bridge.c:2307
> #14 0x08054e56 in main (argc=-1257216160, argv=0x0) at
> vswitchd/ovs-vswitchd.c:116
>
> Running OVS 1.9.3 and 2.3.0; both crash at the same assertion.
> XenServer 6.2
>
> It seems that this only happens when ovs-vswitchd is under heavy load
> and I run ovs-ofctl add-flow.
>
> Does anyone know what could be triggering this?
>
> Regards,
>
>
> --
> []'s
> Luiz Henrique Ozaki
>



-- 
[]'s
Luiz Henrique Ozaki