On Thursday 13 October 2016 07:26 AM, Andy Zhou wrote:
On Sun, Oct 9, 2016 at 12:02 AM, Babu Shanmugam <bscha...@redhat.com> wrote:
On Friday 07 October 2016 05:33 AM, Andy Zhou wrote:
Babu, thank you for working on this. At a high level, the boundary
between the OCF scripts and the ovn-ctl script is not clear to me --
i.e., which aspect is managed by which entity. For example, 1) which
scripts are responsible for starting the ovsdb servers?
The ovsdb servers are started by pacemaker. Pacemaker uses the OCF script,
and the OCF script in turn uses ovn-ctl.
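To make the layering concrete, the call chain is roughly the following (a
sketch; the OCF agent path is an assumption inferred from the
ocf:ovn:ovndb-servers name, and the exact ovn-ctl arguments depend on the
script's parameters):

# pacemaker drives the OCF agent with the standard resource actions
/usr/lib/ocf/resource.d/ovn/ovndb-servers start|stop|monitor|promote|demote
# and the agent, in turn, calls ovn-ctl, e.g. on start:
/usr/share/openvswitch/scripts/ovn-ctl --db-sb-sync-from=<master-ip> \
    --db-nb-sync-from=<master-ip> start_ovsdb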
2) Which script should manage fail-over? I tried to shut down a cluster
node using the "pcs" command, and fail-over did not happen.
The OCF script for the OVN DB servers understands the promote and demote
calls. So, pacemaker uses this script to run the ovsdb server on all the
nodes and to promote one node as the master (active server). If the node
on which the master instance is running fails, pacemaker automatically
promotes another node as the master. The OCF script acts as pacemaker's
agent for the OVN DB resource.
The above behavior depends on how you configure the resource that uses
this OCF script. I am attaching a simple set of commands to configure the
ovsdb server. You can create the resources after creating the cluster with
the following command:
crm configure < ovndb.pcmk
Please note that you have to replace the macros VM1_NAME, VM2_NAME,
VM3_NAME and MASTER_IP with the respective values before using
ovndb.pcmk. This script assumes a 3-node cluster; I am assuming node IDs
101, 102, and 103. Please replace those as well to match your cluster.
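For illustration only, a minimal ovndb.pcmk along these lines would create
the resources discussed here (a sketch; the actual attachment may differ --
master_ip is the parameter the OCF script takes for the active server
address):

node 101: VM1_NAME
node 102: VM2_NAME
node 103: VM3_NAME
primitive ovndb ocf:ovn:ovndb-servers \
        params master_ip=MASTER_IP \
        op monitor interval=30s
ms ovndb-master ovndb meta notify="true"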
--
Babu
Unfortunately, CRM is not distributed with pacemaker on CentOS anymore. It
took me some time to get it installed. I think others may run into similar
issues, so it may be worthwhile to document this, or to change the script
to use "pcs", which is part of the distribution.
I agree. Is INSTALL*.md good enough? In OpenStack, we manage the resource
through puppet manifests.
I adapted the script to my setup. I have two nodes, "h1" (10.33.74.77) and
"h2" (10.33.75.158). For MASTER_IP, I used 10.33.75.220.
This is the output of crm configure show:
------
[root@h2 azhou]# crm configure show
node 1: h1 \
        attributes
node 2: h2
primitive ClusterIP IPaddr2 \
        params ip=10.33.75.200 cidr_netmask=32 \
        op start interval=0s timeout=20s \
        op stop interval=0s timeout=20s \
        op monitor interval=30s
primitive WebSite apache \
        params configfile="/etc/httpd/conf/httpd.conf" statusurl="http://127.0.0.1/server-status" \
        op start interval=0s timeout=40s \
        op stop interval=0s timeout=60s \
        op monitor interval=1min \
        meta
primitive ovndb ocf:ovn:ovndb-servers \
        op start interval=0s timeout=30s \
        op stop interval=0s timeout=20s \
        op promote interval=0s timeout=50s \
        op demote interval=0s timeout=50s \
        op monitor interval=1min \
        meta
colocation colocation-WebSite-ClusterIP-INFINITY inf: WebSite ClusterIP
order order-ClusterIP-WebSite-mandatory ClusterIP:start WebSite:start
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=1.1.13-10.el7_2.4-44eb2dd \
        cluster-infrastructure=corosync \
        cluster-name=mycluster \
        stonith-enabled=false
You seem to have configured ovndb just as a primitive resource and not as
a master/slave resource. Also, there is no colocation constraint
configured between ovndb and ClusterIP; only with that constraint will the
ovndb server be co-located with the ClusterIP resource. You will have to
include the following lines in your crm configuration. You can configure
the same with pcs as well (a pcs sketch follows the crm lines below).
ms ovndb-master ovndb meta notify="true"
colocation colocation-ovndb-master-ClusterIP-INFINITY inf: ovndb-master:Started ClusterIP:Master
order order-ClusterIP-ovndb-master-mandatory inf: ClusterIP:start ovndb-master:start
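A rough pcs equivalent of the above would be something like this (a
sketch; pcs syntax differs between versions):

pcs resource master ovndb-master ovndb notify=true
pcs constraint colocation add ovndb-master with ClusterIP INFINITY
pcs constraint order start ClusterIP then start ovndb-master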
--------
I have also added firewall rules to allow access to TCP ports 6641 and
6642.
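For reference, with firewalld that was roughly (a sketch; adjust to your
own firewall setup):

firewall-cmd --permanent --add-port=6641/tcp --add-port=6642/tcp
firewall-cmd --reload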
At this stage, crm_mon shows:
Last updated: Wed Oct 12 14:49:07 2016
Last change: Wed Oct 12 13:58:55 2016 by root via crm_attribute on h2
Stack: corosync
Current DC: h2 (version 1.1.13-10.el7_2.4-44eb2dd) - partition with quorum
2 nodes and 3 resources configured

Online: [ h1 h2 ]

ClusterIP       (ocf::heartbeat:IPaddr2):       Started h2
WebSite         (ocf::heartbeat:apache):        Started h2
ovndb           (ocf::ovn:ovndb-servers):       Started h1

Failed Actions:
* ovndb_start_0 on h2 'unknown error' (1): call=39, status=Timed Out,
  exitreason='none',
  last-rc-change='Wed Oct 12 14:43:03 2016', queued=0ms, exec=30003ms
---
I am not sure what the error message on h2 is about. Notice that the ovndb
service is now running on h1, while the cluster IP is on h2.
It looks like the OCF script is not able to start the ovsdb servers on the
'h2' node (we are getting a timed-out status). You can check whether the
OCF script is working correctly by using ocf-tester. You can run
ocf-tester like this:
ocf-tester -n test-ovndb -o master_ip 10.0.0.1 <path-to-the-ocf-script>
Alternatively, you can check whether the ovsdb servers start properly by
running
/usr/share/openvswitch/scripts/ovn-ctl --db-sb-sync-from=10.0.0.1 \
    --db-nb-sync-from=10.0.0.1 start_ovsdb
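You can also check which active server each backup is pointed at (a
sketch, assuming the same ctl socket path as in the sync-status output
below):

ovs-appctl -t /run/openvswitch/ovnsb_db.ctl ovsdb-server/get-active-ovsdb-server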
Also, both servers are running as backup servers:
[root@h1 azhou]# ovs-appctl -t /run/openvswitch/ovnsb_db.ctl ovsdb-server/sync-status
state: backup
connecting: tcp:192.0.2.254:6642   // I specified the IP in /etc/openvswitch/ovnsb-active.conf, but the file was overwritten with 192.0.2.254

[root@h2 ovs]# ovs-appctl -t /run/openvswitch/ovnsb_db.ctl ovsdb-server/sync-status
state: backup
replicating: tcp:10.33.74.77:6642   // The IP address was retained on h2
database: OVN_Southbound
---
Any suggestions on what I did wrong?
I think this is mostly due to the crm configuration. Once you add the
'ms' and 'colocation' resources, you should be able to overcome this
problem.
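After adding them, clearing the recorded failure on h2 should let
pacemaker retry the start (a sketch using crmsh):

crm resource cleanup ovndb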
I have never tried colocating two resources with the ClusterIP resource.
Just for testing, is it possible to drop the WebSite resource?
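Something along these lines should remove it for the test (a sketch;
syntax may vary with the crm version):

crm resource stop WebSite
crm configure delete WebSite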
Thank you,
Babu
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev