Yesterday, the last few emails between Vadym and me were inadvertently not posted to this list. Here are those posts for anyone having similar issues.
Regards,
Craig.

On 7 October 2010 15:20, Vadym Chepkov <vchep...@gmail.com> wrote:
> No, the default is 0 - it is not taken into consideration at all.
> The resource stays in place because the allocation on the other host has
> the same score.
> You can see all computed scores using ptest -sL.
>
> You don't need to specify $id=; it's redundant, by the way.
>
> Vadym
>
> On Oct 6, 2010, at 9:59 PM, Craig Hurley wrote:
>
>> Thanks again and I see what you mean; I unplugged eth0 from both nodes
>> and g_cluster_services went down on both nodes. I took your advice
>> on board and read this section:
>> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/index.html#id771622
>>
>> ... and I've configured the location rule so that g_cluster_services
>> runs on the node with the most connections:
>>
>> primitive p_ping ocf:pacemaker:ping \
>>         params name="p_ping" host_list="172.20.0.254 172.20.50.1 172.20.50.2" multiplier="1000" \
>>         op monitor interval="20s"
>> clone c_ping p_ping \
>>         meta globally-unique="false"
>> location loc_ping g_cluster_services \
>>         rule $id="loc_ping-rule" p_ping: defined p_ping
>>
>> Now if I unplug eth0 from both nodes, g_cluster_services remains up on
>> one of the nodes, which suits my requirements :)
>>
>> One last item: in my config I have not specified a resource
>> stickiness, and the master role and g_cluster_services move around as
>> expected when a node fails. When a failed node comes back online, the
>> master role and g_cluster_services stay where they are (until the next
>> forced failover) -- which is the behaviour I require. Is there a
>> default stickiness that causes this "correct" behaviour?
>>
>> Regards,
>> Craig.
>>
>>
>> On 7 October 2010 11:54, Vadym Chepkov <vchep...@gmail.com> wrote:
>>> A monitor operation is essential for the ping RA; otherwise it won't
>>> work either.
>>>
>>> As for the multiplier - it's all about the score and resource
>>> stickiness. With the multiplier at 200 and resource stickiness set to
>>> 500, for example, when both hosts can ping up to 2 ping nodes they
>>> will stay where they are, but if one host can ping 3 ping nodes and
>>> the other just 2, this will make the resources relocate to the
>>> better-connected host.
>>>
>>> In the simple example I gave you, if this is the IP of a router for
>>> both nodes and it goes down, this will cause the resource not to fail
>>> over but simply go down. If this is not what you want, you would
>>> probably ping not just the router but both nodes' IPs as well, and
>>> fail over only if you are able to ping nothing but yourself:
>>>
>>> location rg0-connected rg0 \
>>>         rule -inf: not_defined pingd or pingd lte 200
>>>
>>> Vadym
>>>
>>> On Oct 6, 2010, at 5:56 PM, Craig Hurley wrote:
>>>
>>>> Thanks Vadym, this worked. It seems the missing name field was
>>>> causing the problem.
>>>>
>>>> On a related note, why do you have a multiplier of 200?
>>>>
>>>> According to
>>>> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/ch09s03s03.html,
>>>> the multiplier field is "The number by which to multiply the number of
>>>> connected ping nodes by. Useful when there are multiple ping nodes
>>>> configured."
>>>>
>>>> I don't understand why one would want to multiply the number of
>>>> connected nodes when there are multiple ping nodes :/
>>>>
>>>> Regards,
>>>> Craig.
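
[An aside on the arithmetic, for the archives. As I understand it (so
treat the numbers below as a sketch rather than gospel), the ping RA sets
the node attribute (p_ping in my config above) to the number of reachable
addresses in host_list multiplied by the multiplier. With my three
addresses and multiplier="1000":

    all 3 reachable:   p_ping = 3 * 1000 = 3000
    only 1 reachable:  p_ping = 1 * 1000 = 1000
    none reachable:    p_ping = 0

The score-based rule "p_ping: defined p_ping" then adds that attribute
value to each node's score for the group, so the group gravitates to the
best-connected node, and with a non-zero resource-stickiness it should
only relocate when the difference in p_ping exceeds the stickiness. If
you want the stay-put behaviour to be explicit rather than a side effect
of equal scores, something like the line below should do it; the 500 is
just an example value:

    rsc_defaults resource-stickiness="500"
]
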
>>>> On 7 October 2010 09:37, Vadym Chepkov <vchep...@gmail.com> wrote:
>>>>> This is my config, which works fine:
>>>>>
>>>>> primitive ping ocf:pacemaker:ping \
>>>>>         params name="pingd" host_list="10.10.10.250" multiplier="200" timeout="5" \
>>>>>         op monitor interval="10"
>>>>>
>>>>> clone connected ping \
>>>>>         meta globally-unique="false"
>>>>>
>>>>> location rg0-connected rg0 \
>>>>>         rule -inf: not_defined pingd or pingd lte 0
>>>>>
>>>>>
>>>>> On Oct 6, 2010, at 4:21 PM, Craig Hurley wrote:
>>>>>
>>>>>> I tried using ping instead of pingd and I added "number" to the
>>>>>> evaluation; I get the same results :/
>>>>>>
>>>>>> primitive p_ping ocf:pacemaker:ping params host_list=172.20.0.254
>>>>>> clone c_ping p_ping meta globally-unique=false
>>>>>> location loc_ping g_cluster_services rule -inf: not_defined p_ping or
>>>>>> p_ping number:lte 0
>>>>>>
>>>>>> Regards,
>>>>>> Craig.
>>>>>>
>>>>>>
>>>>>> On 6 October 2010 20:43, Jayakrishnan <jayakrishnan...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I guess this change:
>>>>>>>
>>>>>>> location loc_pingd g_cluster_services rule -inf: not_defined pingd or
>>>>>>> pingd number:lte 0
>>>>>>>
>>>>>>> should work.
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Regards,
>>>>>>>
>>>>>>> Jayakrishnan. L
>>>>>>>
>>>>>>> Visit:
>>>>>>> www.foralllinux.blogspot.com
>>>>>>> www.jayakrishnan.bravehost.com
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Oct 6, 2010 at 11:56 AM, Claus Denk <d...@us.es> wrote:
>>>>>>>>
>>>>>>>> I am having a similar problem, so let's wait for the experts. But in
>>>>>>>> the meanwhile, try changing
>>>>>>>>
>>>>>>>> location loc_pingd g_cluster_services rule -inf: not_defined p_pingd
>>>>>>>> or p_pingd lte 0
>>>>>>>>
>>>>>>>> to
>>>>>>>>
>>>>>>>> location loc_pingd g_cluster_services rule -inf: not_defined pingd
>>>>>>>> or pingd number:lte 0
>>>>>>>>
>>>>>>>> and see what happens. As far as I have read, it is also recommended
>>>>>>>> to use the "ping" resource instead of "pingd"...
>>>>>>>>
>>>>>>>> kind regards, Claus
>>>>>>>>
>>>>>>>>
>>>>>>>> On 10/06/2010 05:45 AM, Craig Hurley wrote:
>>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> I have a 2-node cluster running DRBD, heartbeat and pacemaker in
>>>>>>>>> active/passive mode. On both nodes, eth0 is connected to the main
>>>>>>>>> network and eth1 is used to connect the nodes directly to each
>>>>>>>>> other. The nodes share a virtual IP address on eth0. Pacemaker is
>>>>>>>>> also controlling a custom service with an LSB-compliant script in
>>>>>>>>> /etc/init.d/. All of this is working fine and I'm happy with it.
>>>>>>>>>
>>>>>>>>> I'd like to configure the nodes so that they fail over if eth0 goes
>>>>>>>>> down (or if they cannot access a particular gateway), so I tried
>>>>>>>>> adding the following (as per
>>>>>>>>> http://www.clusterlabs.org/wiki/Example_configurations#Set_up_pingd):
>>>>>>>>>
>>>>>>>>> primitive p_pingd ocf:pacemaker:pingd params host_list=172.20.0.254 op
>>>>>>>>> monitor interval=15s timeout=5s
>>>>>>>>> clone c_pingd p_pingd meta globally-unique=false
>>>>>>>>> location loc_pingd g_cluster_services rule -inf: not_defined p_pingd
>>>>>>>>> or p_pingd lte 0
>>>>>>>>>
>>>>>>>>> ... but when I do add that, all resources are stopped and they don't
>>>>>>>>> come back up on either node. Am I making a basic mistake, or do you
>>>>>>>>> need more info from me?
>>>>>>>>>
>>>>>>>>> All help is appreciated,
>>>>>>>>> Craig.
>>>>>>>>>
>>>>>>>>> pacemaker
>>>>>>>>> Version: 1.0.8+hg15494-2ubuntu2
>>>>>>>>>
>>>>>>>>> heartbeat
>>>>>>>>> Version: 1:3.0.3-1ubuntu1
>>>>>>>>>
>>>>>>>>> drbd8-utils
>>>>>>>>> Version: 2:8.3.7-1ubuntu2.1
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> r...@rpalpha:~$ sudo crm configure show
>>>>>>>>> node $id="32482293-7b0f-466e-b405-c64bcfa2747d" rpalpha
>>>>>>>>> node $id="3f2aac12-05aa-4ac7-b91f-c47fa28efb44" rpbravo
>>>>>>>>> primitive p_drbd_data ocf:linbit:drbd \
>>>>>>>>>         params drbd_resource="data" \
>>>>>>>>>         op monitor interval="30s"
>>>>>>>>> primitive p_fs_data ocf:heartbeat:Filesystem \
>>>>>>>>>         params device="/dev/drbd/by-res/data" directory="/mnt/data" fstype="ext4"
>>>>>>>>> primitive p_ip ocf:heartbeat:IPaddr2 \
>>>>>>>>>         params ip="172.20.50.3" cidr_netmask="255.255.0.0" nic="eth0" \
>>>>>>>>>         op monitor interval="30s"
>>>>>>>>> primitive p_rp lsb:rp \
>>>>>>>>>         op monitor interval="30s" \
>>>>>>>>>         meta target-role="Started"
>>>>>>>>> group g_cluster_services p_ip p_fs_data p_rp
>>>>>>>>> ms ms_drbd p_drbd_data \
>>>>>>>>>         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
>>>>>>>>> location loc_preferred_master g_cluster_services inf: rpalpha
>>>>>>>>> colocation colo_mnt_on_master inf: g_cluster_services ms_drbd:Master
>>>>>>>>> order ord_mount_after_drbd inf: ms_drbd:promote g_cluster_services:start
>>>>>>>>> property $id="cib-bootstrap-options" \
>>>>>>>>>         dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
>>>>>>>>>         cluster-infrastructure="Heartbeat" \
>>>>>>>>>         no-quorum-policy="ignore" \
>>>>>>>>>         stonith-enabled="false" \
>>>>>>>>>         expected-quorum-votes="2"
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> r...@rpalpha:~$ sudo cat /etc/ha.d/ha.cf
>>>>>>>>> node rpalpha
>>>>>>>>> node rpbravo
>>>>>>>>>
>>>>>>>>> keepalive 2
>>>>>>>>> warntime 5
>>>>>>>>> deadtime 15
>>>>>>>>> initdead 60
>>>>>>>>>
>>>>>>>>> mcast eth0 239.0.0.43 694 1 0
>>>>>>>>> bcast eth1
>>>>>>>>>
>>>>>>>>> use_logd yes
>>>>>>>>> autojoin none
>>>>>>>>> crm respawn
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> r...@rpalpha:~$ sudo cat /etc/drbd.conf
>>>>>>>>> global {
>>>>>>>>>         usage-count no;
>>>>>>>>> }
>>>>>>>>> common {
>>>>>>>>>         protocol C;
>>>>>>>>>
>>>>>>>>>         handlers {}
>>>>>>>>>
>>>>>>>>>         startup {}
>>>>>>>>>
>>>>>>>>>         disk {}
>>>>>>>>>
>>>>>>>>>         net {
>>>>>>>>>                 cram-hmac-alg sha1;
>>>>>>>>>                 shared-secret "foobar";
>>>>>>>>>         }
>>>>>>>>>
>>>>>>>>>         syncer {
>>>>>>>>>                 verify-alg sha1;
>>>>>>>>>                 rate 100M;
>>>>>>>>>         }
>>>>>>>>> }
>>>>>>>>> resource data {
>>>>>>>>>         device /dev/drbd0;
>>>>>>>>>         meta-disk internal;
>>>>>>>>>         on rpalpha {
>>>>>>>>>                 disk /dev/mapper/rpalpha-data;
>>>>>>>>>                 address 192.168.1.1:7789;
>>>>>>>>>         }
>>>>>>>>>         on rpbravo {
>>>>>>>>>                 disk /dev/mapper/rpbravo-data;
>>>>>>>>>                 address 192.168.1.2:7789;
>>>>>>>>>         }
>>>>>>>>> }
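
P.S. For anyone who finds this thread later: the root cause of my
original problem was the missing name="..." parameter. If I've read the
agents right, ping/pingd publishes its attribute under the default name
"pingd" when no name is given, so my rule's "not_defined p_pingd" test
was true on every node and scored every node -INFINITY, which is why all
resources stopped. As Vadym mentioned, ptest makes the computed scores
visible; something along these lines (run against the live cluster,
grepping for my group name) is a quick sanity check:

    sudo ptest -sL | grep g_cluster_services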