To add to what Pascal said:

To find out where a specific resource is running, you can also run something 
like:
crm resource status name_of_resource

which will yield something like this for a primitive (Condor is a node name in 
this case):
resource p_ip_dns is running on: Condor
Or for a DRBD master/slave resource:
resource ms_drbd_samba is running on: Condor Master
resource ms_drbd_samba is running on: Vulture

There is also a crm shell in addition to the one-shot command line. The shell 
supports tab completion and has a very thorough help system for every command, 
which is very helpful. To access it, just type crm without any arguments. Then:
- type help to get general help
- type help <command> to get command-specific help
- press <tab> <tab> to list the commands available
- type quit or exit to leave the shell
- descend into a level by typing its name
Once inside a level, up will bring you back, and help plus <tab> <tab> will show 
the commands available at the current level.

As an example, to get to the resource section and get help on migrate, you would 
type "crm", which puts you in the shell; then "resource", which descends into 
the resource section of the shell; then "help migrate", which shows the following 
information about migrating resources:

crm(live)resource# help migrate

Migrate a resource to a different node. If node is left out, the
resource is migrated by creating a constraint which prevents it from
running on the current node. Additionally, you may specify a
lifetime for the constraint---once it expires, the location
constraint will no longer be active.

Usage:
...............
migrate <rsc> [<node>] [<lifetime>] [force]
...............

And the "migrate" above is the command Pascal referred to about moving 
resources between nodes. Make sure you understand the move or migrate commands 
thoroughly before using as they can also prevent resources from failing back 
because "migrate" can create location constraints infinitely preventing the 
resource(s) from running on the node you migrated away from without manual 
intervention.
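
If you do end up with such a constraint, it can be removed again from the crm 
shell. A minimal sketch, reusing the ms_drbd_samba resource from the example 
above (check "help unmigrate" in your crm version; "unmove" is an alias in 
recent releases):

crm resource unmigrate ms_drbd_samba

or, from within the shell:

crm(live)resource# unmigrate ms_drbd_samba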

HTH

Jake
----- Original Message -----

From: "Pascal BERTON" <[email protected]>
To: "Elias Chatzigeorgiou" <[email protected]>, [email protected]
Sent: Thursday, January 5, 2012 6:14:51 AM
Subject: Re: [DRBD-user] HA DRBD setup - graceful failover/active node detection



Hi Elias !

“crm status” will tell you on which node a given resource is active. You can 
also use “crm_mon” (note the underscore!), which presents the same information 
in real time (“crm status” is a one-shot run).
Basically, crm is the command to use to do everything you intend to do.
Regarding the iSCSI target daemon: you have declared an IP resource in your 
cluster, the one that your remote iSCSI initiators point to. Since that IP is 
itself a cluster resource, you will see it in the crm status report, and you 
will know which node owns it.
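
For example (output omitted; the -1 one-shot flag for crm_mon may depend on your 
Pacemaker version):

crm status    # one-shot report: node states plus where each resource runs
crm_mon       # same information, refreshed continuously (Ctrl-C to exit)
crm_mon -1    # crm_mon in one-shot mode, handy for scripts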

In order to fail over your resources, guess what, you may use crm too! :) As far 
as I remember, it’s something like “crm resource migrate <res_name> 
<target_host>”. Have a look at the crm man pages for more details.
You may also manually modify the cib config within the cluster to change the 
scores of your resources. This is what I usually do, although I’m not sure it is 
actually a best practice… In short, the “score” is sort of a “weight” that you 
give to your resource on a given host. The host on which the weight/score is 
highest is the host the resource is tied to. Change the scores, and the resource 
moves.
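
As a rough sketch of what such a preference looks like in crm configure syntax 
(p_ip_dns and node1 are only placeholder resource and node names; adapt them to 
your configuration):

crm configure location prefer_node1 p_ip_dns 100: node1

This gives p_ip_dns a score of 100 on node1, so the cluster prefers to run it 
there whenever node1 is available.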

You must read at least 2 docs to better understand this complex stuff:
1) “Pacemaker 1.0 Configuration Explained”, by Andrew Beekhof. There might be a 
more recent release, but I don’t know of it… I had to read it twice, but it 
gives valuable information about the way a Pacemaker cluster is structured and 
works. This manual is worth its weight in gold!
2) And then the “CRM CLI guide” (not sure which version is the latest, I have 
the 0.94) by Dejan Muhamedagic and Yan Gao, to understand everything crm is able 
to achieve, and that’s quite a lot!
Also, the “Clusters from Scratch” manual is a good introduction, and it contains 
DRBD examples. Maybe you could start with it, to pick up the first concepts… It 
is easier to read than the “Pacemaker 1.0 Configuration Explained” I mentioned 
above.

You’ll find all this on the web of course!

HTH!

Best regards,

Pascal.


De : [email protected] 
[mailto:[email protected]] De la part de Elias Chatzigeorgiou
Envoyé : jeudi 5 janvier 2012 03:14
À : [email protected]
Objet : [DRBD-user] HA DRBD setup - graceful failover/active node detection




I have a two-node active/passive cluster, with DRBD controlled by 
corosync/pacemaker.

All storage is based on LVM.





------------------------------------------------------------------------------------

a) How do I know which node of the cluster is currently active?

How can I check if a node is currently in use by the iSCSI-target daemon?



I can try to deactivate a volume group using:



[root@node1 ~]# vgchange -an data
  Can't deactivate volume group "data" with 3 open logical volume(s)



If I get a message like the above, then I know that node1 is the active node,
but is there a better (non-intrusive) way to check?
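
Perhaps the cluster itself can tell me non-intrusively; something like the 
following, where ms_drbd_data is just a placeholder for the DRBD master/slave 
resource name:

crm_resource --resource ms_drbd_data --locate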



A better option seems to be 'pvs -v'. If the node is active then it shows the 
volume names:

[root@node1 ~]# pvs -v
    Scanning for physical volume names
  PV         VG      Fmt  Attr PSize   PFree DevSize PV UUID
  /dev/drbd1 data    lvm2 a-   109.99g    0  110.00g c40m9K-tNk8-vTVz-tKix-UGyu-gYXa-gnKYoJ
  /dev/drbd2 tempdb  lvm2 a-    58.00g    0   58.00g 4CTq7I-yxAy-TZbY-TFxa-3alW-f97X-UDlGNP
  /dev/drbd3 distrib lvm2 a-    99.99g    0  100.00g l0DqWG-dR7s-XD2M-3Oek-bAft-d981-UuLReC



whereas on the inactive node it gives errors:

[root@node2 ~]# pvs -v
    Scanning for physical volume names
  /dev/drbd0: open failed: Wrong medium type
  /dev/drbd1: open failed: Wrong medium type



Any further ideas/comments/suggestions?



------------------------------------------------------------------------------------



b) How can I gracefully fail over to the other node? Up to now, the only way I
know is to force the active node to reboot (by entering two consecutive 'reboot'
commands). This, however, breaks the DRBD synchronization, and I then need to
use a fix-split-brain procedure to bring DRBD back into sync.
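
The fix-split-brain procedure I mean is essentially the one from the DRBD User's 
Guide; roughly (r0 is just an example resource name, and the exact 
--discard-my-data syntax differs between DRBD 8.3 and 8.4):

# on the node whose changes are to be discarded:
drbdadm disconnect r0
drbdadm secondary r0
drbdadm connect --discard-my-data r0   # DRBD 8.4; on 8.3: drbdadm -- --discard-my-data connect r0

# on the surviving node, reconnect if it is in StandAlone state:
drbdadm connect r0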



On the other hand, if I try to stop the corosync service on the active node,
the command takes forever! I understand that the suggested procedure is to
disconnect all clients from the active node and then stop the services. Would it
be a better approach to shut down the public network interface before stopping
the corosync service (in order to forcibly close the client connections)?
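
Or would putting the active node into standby be the cleaner way? Something like 
the following, using the node names from this setup (taken from the crm 
documentation rather than tested here):

crm node standby node1    # Pacemaker stops/migrates all resources away from node1
# (wait for the resources to come up on node2, then later:)
crm node online node1     # allow node1 to host resources again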



Thanks






_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user
