On Tue, Aug 30, 2016 at 4:17 AM, Numan Siddique <nusid...@redhat.com> wrote:
>
>
> On Tue, Aug 30, 2016 at 1:11 AM, Andy Zhou <az...@ovn.org> wrote:
>
>>
>> On Mon, Aug 29, 2016 at 3:14 AM, Numan Siddique <nusid...@redhat.com> wrote:
>>
>>>
>>> On Sat, Aug 27, 2016 at 4:45 AM, Andy Zhou <az...@ovn.org> wrote:
>>>
>>>> Added the '--no-sync' option based on feedback on the current
>>>> implementation.
>>>>
>>>> Added the appctl command "ovsdb-server/sync-status" based on feedback
>>>> on the current implementation.
>>>>
>>>> Added a test to simulate the integration of an HA manager with the
>>>> OVSDB server using replication.
>>>>
>>>> Other documentation and API improvements.
>>>>
>>>> Signed-off-by: Andy Zhou <az...@ovn.org>
>>>> ------
>>>>
>>>> I hope to get some review comments on the command line and appctl
>>>> interfaces for replication. Since 2.6 is the first release of those
>>>> interfaces, it is easier to make changes now than in future releases.
>>>>
>>>> ----
>>>> v1->v2: Fix crashes reported at:
>>>> http://openvswitch.org/pipermail/dev/2016-August/078591.html
>>>> ---
>>>
>>> I haven't tested these patches yet. This patch seems to have a
>>> whitespace warning when applied.
>>>
>> Thanks for the report. I will fold the fix into the next version when
>> posting.
>>
>> In case it helps, you can also access the patches from my private repo at:
>> https://github.com/azhou-nicira/ovs-review/tree/ovsdb-replication-sm-v2
>>
>
> Hi Andy,
>
> I am seeing the below crash when:
>
> - The ovsdb-server changes from master to standby, and the active
>   ovsdb-server it is about to connect to is killed just before that or
>   is not reachable.
>
> - The pacemaker OCF script calls the sync-status command soon after that.
>
> Please let me know if you need more information.
>
> Core was generated by `ovsdb-server -vdbg
> --log-file=/opt/stack/logs/ovsdb-server-sb.log --remote=puni'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x000000000041241d in replication_status () at ovsdb/replication.c:875
> 875         SHASH_FOR_EACH (node, replication_dbs) {
> Missing separate debuginfos, use: dnf debuginfo-install
> glibc-2.23.1-10.fc24.x86_64 openssl-libs-1.0.2h-3.fc24.x86_64
> (gdb) bt
> #0  0x000000000041241d in replication_status () at ovsdb/replication.c:875
> #1  0x0000000000406eda in ovsdb_server_get_sync_status (conn=0x1421fd0,
>     argc=<optimized out>, argv=<optimized out>, config_=<optimized out>)
>     at ovsdb/ovsdb-server.c:1480
> #2  0x00000000004324ee in process_command (request=0x1421f30,
>     conn=0x1421fd0) at lib/unixctl.c:313
> #3  run_connection (conn=0x1421fd0) at lib/unixctl.c:347
> #4  unixctl_server_run (server=server@entry=0x141e140) at lib/unixctl.c:400
> #5  0x0000000000405bdc in main_loop (is_backup=0x7fff08062256,
>     exiting=0x7fff08062257, run_process=0x0, remotes=0x7fff080622a0,
>     unixctl=0x141e140, all_dbs=0x7fff080622e0, jsonrpc=0x13f6f00)
>     at ovsdb/ovsdb-server.c:182
> #6  main (argc=<optimized out>, argv=<optimized out>) at
>     ovsdb/ovsdb-server.c:430

Numan, thanks for the report. I think I spotted the bug: currently, when
the replication state machine is reset, the state update only takes place
after another round of the main loop. That time lag can lead to the
backtrace above if a unixctl command is issued during the window. I have a
fix that adds another state to represent the reset condition. The fix is
at:

https://github.com/azhou-nicira/ovs-review/tree/ovsdb-replication-sm-v3

Would you please let me know whether this version works any better? Thanks!
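For what it's worth, here is a minimal sketch of the idea behind the fix:
have replication_status() report the reset condition instead of walking
'replication_dbs' while it is still NULL. The state names, variables, and
header paths below are assumptions for illustration only, not the actual
ovsdb/replication.c code; the real change is in the branch above.

    /* Sketch only, under the assumptions stated above. */
    #include "openvswitch/dynamic-string.h"
    #include "openvswitch/shash.h"

    /* Assumed replication states; the real state machine has more. */
    enum replication_state {
        RPL_S_RESET,        /* Just reset; not yet reconnected to active. */
        RPL_S_REPLICATING,  /* Actively replicating from the active server. */
    };

    static enum replication_state state = RPL_S_RESET;
    static struct shash *replication_dbs;  /* NULL until replication starts. */

    static char *
    replication_status_sketch(void)
    {
        struct ds ds = DS_EMPTY_INITIALIZER;

        if (state == RPL_S_RESET || !replication_dbs) {
            /* Answer the unixctl command safely instead of iterating over
             * a NULL map, which is what produces the segfault above. */
            ds_put_cstr(&ds, "state: reset, waiting to reconnect to active\n");
            return ds_steal_cstr(&ds);
        }

        struct shash_node *node;
        SHASH_FOR_EACH (node, replication_dbs) {
            ds_put_format(&ds, "replicating database: %s\n", node->name);
        }
        return ds_steal_cstr(&ds);
    }

With a guard along those lines, an HA manager such as the pacemaker OCF
script can keep polling with something like "ovs-appctl -t ovsdb-server
ovsdb-server/sync-status" during the failover window without hitting the
crash.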