On Thursday 13 October 2016 07:26 AM, Andy Zhou wrote:
On Sun, Oct 9, 2016 at 12:02 AM, Babu Shanmugam <bscha...@redhat.com> wrote:
On Friday 07 October 2016 05:33 AM, Andy Zhou wrote:
Babu, thank you for working on this. At a high level, the boundary
between the OCF scripts and the ovn-ctl script is not clear to me --
i.e., which aspect is managed by which entity. For example, 1) which
scripts are responsible for starting the ovsdb servers?
The ovsdb servers are started by pacemaker. Pacemaker uses the OCF script,
and the OCF script in turn uses ovn-ctl.
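To make the layering concrete, the call chain is roughly the following (a
sketch; the OCF agent path is an assumption inferred from the
ocf:ovn:ovndb-servers name, and the exact ovn-ctl arguments depend on the
script's parameters):

# pacemaker drives the OCF agent with the standard resource actions
/usr/lib/ocf/resource.d/ovn/ovndb-servers start|stop|monitor|promote|demote
# and the agent, in turn, calls ovn-ctl, e.g. on start:
/usr/share/openvswitch/scripts/ovn-ctl --db-sb-sync-from=<master-ip> \
    --db-nb-sync-from=<master-ip> start_ovsdb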
2) Which script should manage fail-over? I tried to shut down a cluster
node using the "pcs" command, and fail-over did not happen.
The OCF script for the OVN DB servers understands the promote and demote
calls. So, pacemaker uses this script to run the ovsdb server on all the
nodes and to promote one node as the master (active server). If the node
on which the master instance is running fails, pacemaker automatically
promotes another node as the master. The OCF script acts as pacemaker's
agent for the OVN DB resource.
The above behavior depends on how you configure the resource that uses
this OCF script. I am attaching a simple set of commands to configure the
ovsdb server. You can create the resources after creating the cluster with
the following command:
crm configure < ovndb.pcmk
Please note that you have to replace the macros VM1_NAME, VM2_NAME,
VM3_NAME and MASTER_IP with the respective values before using
ovndb.pcmk. This script assumes a 3-node cluster; I am assuming node IDs
101, 102, and 103. Please replace those as well to match your cluster.
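For illustration only, a minimal ovndb.pcmk along these lines would create
the resources discussed here (a sketch; the actual attachment may differ --
master_ip is the parameter the OCF script takes for the active server
address):

node 101: VM1_NAME
node 102: VM2_NAME
node 103: VM3_NAME
primitive ovndb ocf:ovn:ovndb-servers \
        params master_ip=MASTER_IP \
        op monitor interval=30s
ms ovndb-master ovndb meta notify="true"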
--
Babu
Unfortunately, CRM is not distributed with pacemaker on CentOS anymore. It
took me some time to get it installed. I think others may run into similar
issues, so it may be worthwhile to document this, or to change the script
to use "pcs", which is part of the distribution.
I agree. Is INSTALL*.md good enough? In OpenStack, we manage the resource
through puppet manifests.
I adapted the script to my setup. I have two nodes, "h1" (10.33.74.77) and
"h2" (10.33.75.158). For MASTER_IP, I used 10.33.75.220.
This is the output of crm configure show:
------
[root@h2 azhou]# crm configure show
node 1: h1 \
        attributes
node 2: h2
primitive ClusterIP IPaddr2 \
        params ip=10.33.75.200 cidr_netmask=32 \
        op start interval=0s timeout=20s \
        op stop interval=0s timeout=20s \
        op monitor interval=30s
primitive WebSite apache \
        params configfile="/etc/httpd/conf/httpd.conf" statusurl="http://127.0.0.1/server-status" \
        op start interval=0s timeout=40s \
        op stop interval=0s timeout=60s \
        op monitor interval=1min \
        meta
primitive ovndb ocf:ovn:ovndb-servers \
        op start interval=0s timeout=30s \
        op stop interval=0s timeout=20s \
        op promote interval=0s timeout=50s \
        op demote interval=0s timeout=50s \
        op monitor interval=1min \
        meta
colocation colocation-WebSite-ClusterIP-INFINITY inf: WebSite ClusterIP
order order-ClusterIP-WebSite-mandatory ClusterIP:start WebSite:start
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=1.1.13-10.el7_2.4-44eb2dd \
        cluster-infrastructure=corosync \
        cluster-name=mycluster \
        stonith-enabled=false
You seem to have configured ovndb just as a primitive resource and not as
a master/slave resource. Also, there is no colocation constraint
configured between ovndb and ClusterIP; only with that constraint will the
ovndb server be co-located with the ClusterIP resource. You will have to
include the following lines in your crm configuration. You can configure
the same with pcs as well (a pcs sketch follows the crm lines below).
ms ovndb-master ovndb meta notify="true"
colocation colocation-ovndb-master-ClusterIP-INFINITY inf: ovndb-master:Started ClusterIP:Master
order order-ClusterIP-ovndb-master-mandatory inf: ClusterIP:start ovndb-master:start
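A rough pcs equivalent of the above would be something like this (a
sketch; pcs syntax differs between versions):

pcs resource master ovndb-master ovndb notify=true
pcs constraint colocation add ovndb-master with ClusterIP INFINITY
pcs constraint order start ClusterIP then start ovndb-master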
--------
I have also added firewall rules to allow access to TCP ports 6641 and
6642.
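For reference, with firewalld that was roughly (a sketch; adjust to your
own firewall setup):

firewall-cmd --permanent --add-port=6641/tcp --add-port=6642/tcp
firewall-cmd --reload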
At this stage, crm_mon shows:
Last updated: Wed Oct 12 14:49:07 2016
Last change: Wed Oct 12 13:58:55 2016 by root via crm_attribute on h2
Stack: corosync
Current DC: h2 (version 1.1.13-10.el7_2.4-44eb2dd) - partition with quorum
2 nodes and 3 resources configured

Online: [ h1 h2 ]

ClusterIP       (ocf::heartbeat:IPaddr2):       Started h2
WebSite         (ocf::heartbeat:apache):        Started h2
ovndb           (ocf::ovn:ovndb-servers):       Started h1

Failed Actions:
* ovndb_start_0 on h2 'unknown error' (1): call=39, status=Timed Out,
  exitreason='none',
  last-rc-change='Wed Oct 12 14:43:03 2016', queued=0ms, exec=30003ms
---
I am not sure what the error message on h2 is about. Notice that the ovndb
service is now running on h1, while the cluster IP is on h2.
It looks like the OCF script is not able to start the ovsdb servers on the
'h2' node (we are getting a timed-out status). You can check whether the
OCF script is working correctly by using ocf-tester. You can run
ocf-tester like this:
ocf-tester -n test-ovndb -o master_ip 10.0.0.1 <path-to-the-ocf-script>
Alternatively, you can check whether the ovsdb servers start properly by
running
/usr/share/openvswitch/scripts/ovn-ctl --db-sb-sync-from=10.0.0.1 \
    --db-nb-sync-from=10.0.0.1 start_ovsdb
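You can also check which active server each backup is pointed at (a
sketch, assuming the same ctl socket path as in the sync-status output
below):

ovs-appctl -t /run/openvswitch/ovnsb_db.ctl ovsdb-server/get-active-ovsdb-server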
Also, both servers are running as backup servers:
[root@h1 azhou]# ovs-appctl -t /run/openvswitch/ovnsb_db.ctl ovsdb-server/sync-status
state: backup
connecting: tcp:192.0.2.254:6642   // I specified the IP in /etc/openvswitch/ovnsb-active.conf, but the file was overwritten with 192.0.2.254

[root@h2 ovs]# ovs-appctl -t /run/openvswitch/ovnsb_db.ctl ovsdb-server/sync-status
state: backup
replicating: tcp:10.33.74.77:6642   // The IP address was retained on h2
database: OVN_Southbound
---
Any suggestions on what I did wrong?
I think this is mostly due to the crm configuration. Once you add the
'ms' and 'colocation' resources, you should be able to overcome this
problem.
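After adding them, clearing the recorded failure on h2 should let
pacemaker retry the start (a sketch using crmsh):

crm resource cleanup ovndb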
I have never tried colocating two resources with the ClusterIP resource.
Just for testing, is it possible to drop the WebSite resource?
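Something along these lines should remove it for the test (a sketch;
syntax may vary with the crm version):

crm resource stop WebSite
crm configure delete WebSite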
Thank you,
Babu
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev