Hello,

1. How about using the integrated iLO device for fencing? I am using an HP ProLiant DL360 G7 server, which supports iLO3.
- Can the RILOE Stonith plugin be used for this?
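For this point, my plan is to first compare the two plugins that look relevant and check which parameters they expect (assuming both are shipped under /usr/lib/stonith/plugins/external on my install):

    crm ra info stonith:external/ipmi
    crm ra info stonith:external/riloe

From what I have read, iLO3 is supposed to speak IPMI 2.0 over the LAN once that is enabled in the iLO configuration, so external/ipmi looks like a more natural fit than the RILOE plugin, but please correct me if that is wrong.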
2. Can the meatware Stonith plugin be used for production software?

3. One more issue which I am facing is that when I try the "crm ra list stonith" command, there is no output, although the different RAs under the Heartbeat class are visible.
- Also, the Stonith class is visible in the output of the "crm ra classes" command.
- All the default Stonith RAs like meatware, suicide, ibmrsa, ipmi etc. are present in the /usr/lib/stonith/plugins directory.
- Because of this I am not able to configure stonith on my system.
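These are the checks I am planning to run to narrow this down (the stonith command below comes from cluster-glue; the lib64 path is only a guess on my part, in case the crm shell is looking in a different directory than the one the plugins are installed in):

    # what the stonith library itself can see
    stonith -L

    # parameters of one specific plugin
    stonith -t external/ipmi -n

    # what the crm shell reports
    crm ra classes
    crm ra list stonith

    # where the plugins are actually installed
    ls /usr/lib/stonith/plugins /usr/lib64/stonith/plugins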
Thanks and regards
Neha Chatrath

On Tue, Oct 18, 2011 at 2:51 PM, neha chatrath <nehachatr...@gmail.com> wrote:
> Hello,
>
> > 1. If a resource fails, node should reboot (through fencing mechanism)
> > and resources should re-start on the node.
>
> Why would you want that? This would increase the service downtime
> considerably. Why is a local restart not possible ... and even if there
> is a good reason for a reboot, why not starting the resource on the
> other node?
>
> - In our system, there are some primitive and clone resources along with 3
>   different master-slave resources.
> - All the masters and slaves of these resources are co-located, i.e. all
>   3 masters are co-located on one node and the 3 slaves on the other node.
> - These 3 master-slave resources are tightly coupled. There is a
>   requirement that the failure of any one of these resources restarts all
>   the resources in the group.
> - All these resources can be shifted to the other node, but subsequently
>   they should also be restarted, as a lot of data/control plane syncing is
>   being done between the two nodes.
> e.g. If one of the resources running on node1 as a Master fails, then all
> these 3 resources are shifted to the other node, i.e. node2 (with the
> corresponding slave resources being promoted to master). On node1, these
> resources should get re-started as slaves.
>
> We understand that a node restart will increase the downtime, but since we
> could not find much on the option of a group restart of master-slave
> resources, we are trying the node restart option.
>
> Thanks and regards
> Neha Chatrath
>
> ---------- Forwarded message ----------
> From: Andreas Kurz <andr...@hastexo.com>
> Date: Tue, Oct 18, 2011 at 1:55 PM
> Subject: Re: [Pacemaker] Problem in Stonith configuration
> To: pacemaker@oss.clusterlabs.org
>
> Hello,
>
> On 10/18/2011 09:00 AM, neha chatrath wrote:
> > Hello,
> >
> > Minor updates in the first requirement.
> > 1. If a resource fails, node should reboot (through fencing mechanism)
> > and resources should re-start on the node.
>
> Why would you want that? This would increase the service downtime
> considerably. Why is a local restart not possible ... and even if there
> is a good reason for a reboot, why not starting the resource on the
> other node?
>
> > 2. If the physical link between the nodes in a cluster fails then that
> > node should be isolated (kind of a power down) and the resources should
> > continue to run on the other nodes
>
> That is how stonith works, yes.
>
> crm ra list stonith ... gives you a list of all available stonith plugins.
> crm ra info stonith:xxxx ... details for a specific plugin.
>
> Using external/ipmi is often a good choice because a lot of servers
> already have a BMC with IPMI on board or they are shipped with a
> management card supporting IPMI.
>
> Regards,
> Andreas
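Following this external/ipmi suggestion, the two-node layout I would aim for once the plugin listing works is sketched below (IP addresses and credentials are placeholders for my setup; the parameter names are the ones I see documented for external/ipmi, and as I understand the pattern, each node runs the fencing primitive for the other node, enforced with location constraints):

    primitive st-mcg1 stonith:external/ipmi \
            params hostname="mcg1" ipaddr="192.168.1.211" \
                   userid="Administrator" passwd="secret" interface="lanplus" \
            op monitor interval="60s"
    primitive st-mcg2 stonith:external/ipmi \
            params hostname="mcg2" ipaddr="192.168.1.212" \
                   userid="Administrator" passwd="secret" interface="lanplus" \
            op monitor interval="60s"
    # keep each fencing device off the node it is meant to fence
    location l-st-mcg1 st-mcg1 -inf: mcg1
    location l-st-mcg2 st-mcg2 -inf: mcg2

Please correct me if this is not the recommended way to lay it out.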
> On Tue, Oct 18, 2011 at 12:30 PM, neha chatrath <nehachatr...@gmail.com> wrote:
>> Hello,
>>
>> Minor updates in the first requirement.
>> 1. If a resource fails, node should reboot (through fencing mechanism) and
>> resources should re-start on the node.
>>
>> 2. If the physical link between the nodes in a cluster fails then that
>> node should be isolated (kind of a power down) and the resources should
>> continue to run on the other nodes.
>>
>> Apologies for the inconvenience.
>>
>> Thanks and regards
>> Neha Chatrath
>>
>> On Tue, Oct 18, 2011 at 12:08 PM, neha chatrath <nehachatr...@gmail.com> wrote:
>>> Hello Andreas,
>>>
>>> Thanks for the reply.
>>>
>>> So can you please suggest which Stonith plugin I should use for the
>>> production release of my software? I have the following system requirements:
>>> 1. If a node in the cluster fails, it should be rebooted and resources
>>> should re-start on the node.
>>> 2. If the physical link between the nodes in a cluster fails then that
>>> node should be isolated (kind of a power down) and the resources should
>>> continue to run on the other nodes.
>>>
>>> I have different types of resources, e.g. primitive, master-slave and
>>> clone, running on my system.
>>>
>>> Thanks and regards
>>> Neha Chatrath
>>>
>>> Date: Mon, 17 Oct 2011 15:08:16 +0200
>>> From: Andreas Kurz <andr...@hastexo.com>
>>> To: pacemaker@oss.clusterlabs.org
>>> Subject: Re: [Pacemaker] Problem in Stonith configuration
>>> Message-ID: <4e9c28c0.8070...@hastexo.com>
>>> Content-Type: text/plain; charset="iso-8859-1"
>>>
>>> Hello,
>>>
>>> On 10/17/2011 12:34 PM, neha chatrath wrote:
>>> > Hello,
>>> > I am configuring a 2 node cluster with the following configuration:
>>> >
>>> > [root@MCG1 init.d]# crm configure show
>>> >
>>> > node $id="16738ea4-adae-483f-9d79-b0ecce8050f4" mcg2 \
>>> >     attributes standby="off"
>>> > node $id="3d507250-780f-414a-b674-8c8d84e345cd" mcg1 \
>>> >     attributes standby="off"
>>> > primitive ClusterIP ocf:heartbeat:IPaddr \
>>> >     params ip="192.168.1.204" cidr_netmask="255.255.255.0" nic="eth0:1" \
>>> >     op monitor interval="40s" timeout="20s" \
>>> >     meta target-role="Started"
>>> > primitive app1_fencing stonith:suicide \
>>> >     op monitor interval="90" \
>>> >     meta target-role="Started"
>>> > primitive myapp1 ocf:heartbeat:Redundancy \
>>> >     op monitor interval="60s" role="Master" timeout="30s" on-fail="standby" \
>>> >     op monitor interval="40s" role="Slave" timeout="40s" on-fail="restart"
>>> > primitive myapp2 ocf:mcg:Redundancy_myapp2 \
>>> >     op monitor interval="60" role="Master" timeout="30" on-fail="standby" \
>>> >     op monitor interval="40" role="Slave" timeout="40" on-fail="restart"
>>> > primitive myapp3 ocf:mcg:red_app3 \
>>> >     op monitor interval="60" role="Master" timeout="30" on-fail="fence" \
>>> >     op monitor interval="40" role="Slave" timeout="40" on-fail="restart"
>>> > ms ms_myapp1 myapp1 \
>>> >     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
>>> > ms ms_myapp2 myapp2 \
>>> >     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
>>> > ms ms_myapp3 myapp3 \
>>> >     meta master-max="1" master-max-node="1" clone-max="2" clone-node-max="1" notify="true"
>>> > colocation myapp1_col inf: ClusterIP ms_myapp1:Master
>>> > colocation myapp2_col inf: ClusterIP ms_myapp2:Master
>>> > colocation myapp3_col inf: ClusterIP ms_myapp3:Master
>>> > order myapp1_order inf: ms_myapp1:promote ClusterIP:start
>>> > order myapp2_order inf: ms_myapp2:promote ms_myapp1:start
>>> > order myapp3_order inf: ms_myapp3:promote ms_myapp2:start
>>> > property $id="cib-bootstrap-options" \
>>> >     dc-version="1.0.11-db98485d06ed3fe0fe236509f023e1bd4a5566f1" \
>>> >     cluster-infrastructure="Heartbeat" \
>>> >     stonith-enabled="true" \
>>> >     no-quorum-policy="ignore"
>>> > rsc_defaults $id="rsc-options" \
>>> >     resource-stickiness="100" \
>>> >     migration-threshold="3"
>>> >
>>> > I start the Heartbeat daemon on only one of the nodes, e.g. mcg1. But none
>>> > of the resources (myapp1, myapp2 etc.) gets started, even on this node.
>>> > Following is the output of the "crm_mon -f" command:
>>> >
>>> > Last updated: Mon Oct 17 10:19:22 2011
>>> > Stack: Heartbeat
>>> > Current DC: mcg1 (3d507250-780f-414a-b674-8c8d84e345cd) - partition with quorum
>>> > Version: 1.0.11-db98485d06ed3fe0fe236509f023e1bd4a5566f1
>>> > 2 Nodes configured, unknown expected votes
>>> > 5 Resources configured.
>>> > ============
>>> > Node mcg2 (16738ea4-adae-483f-9d79-b0ecce8050f4): UNCLEAN (offline)
>>>
>>> The cluster is waiting for a successful fencing event before starting
>>> all resources ... the only way to be sure the second node runs no
>>> resources.
>>>
>>> Since you are using the suicide plugin, this will never happen if Heartbeat
>>> is not started on that node. If this is only a _test_ setup, go with the ssh
>>> or even the null stonith plugin ... never use them on production systems!
>>>
>>> Regards,
>>> Andreas
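For reference, the test-only setup this advice points at would, as far as I understand it, look roughly like the following (hostlist is the only parameter I know of for external/ssh; it needs passwordless root ssh between the nodes and, as Andreas says, must never be used in production):

    primitive st-ssh stonith:external/ssh \
            params hostlist="mcg1 mcg2" \
            op monitor interval="60s"
    clone fencing st-ssh

With this clone running and the other node actually reachable over ssh, the cluster should be able to complete the pending fencing of the UNCLEAN node and then start the remaining resources.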
>>>
>>> On Mon, Oct 17, 2011 at 4:04 PM, neha chatrath <nehachatr...@gmail.com> wrote:
>>>> Hello,
>>>> I am configuring a 2 node cluster with the following configuration:
>>>>
>>>> [root@MCG1 init.d]# crm configure show
>>>>
>>>> node $id="16738ea4-adae-483f-9d79-b0ecce8050f4" mcg2 \
>>>>     attributes standby="off"
>>>> node $id="3d507250-780f-414a-b674-8c8d84e345cd" mcg1 \
>>>>     attributes standby="off"
>>>> primitive ClusterIP ocf:heartbeat:IPaddr \
>>>>     params ip="192.168.1.204" cidr_netmask="255.255.255.0" nic="eth0:1" \
>>>>     op monitor interval="40s" timeout="20s" \
>>>>     meta target-role="Started"
>>>> primitive app1_fencing stonith:suicide \
>>>>     op monitor interval="90" \
>>>>     meta target-role="Started"
>>>> primitive myapp1 ocf:heartbeat:Redundancy \
>>>>     op monitor interval="60s" role="Master" timeout="30s" on-fail="standby" \
>>>>     op monitor interval="40s" role="Slave" timeout="40s" on-fail="restart"
>>>> primitive myapp2 ocf:mcg:Redundancy_myapp2 \
>>>>     op monitor interval="60" role="Master" timeout="30" on-fail="standby" \
>>>>     op monitor interval="40" role="Slave" timeout="40" on-fail="restart"
>>>> primitive myapp3 ocf:mcg:red_app3 \
>>>>     op monitor interval="60" role="Master" timeout="30" on-fail="fence" \
>>>>     op monitor interval="40" role="Slave" timeout="40" on-fail="restart"
>>>> ms ms_myapp1 myapp1 \
>>>>     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
>>>> ms ms_myapp2 myapp2 \
>>>>     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
>>>> ms ms_myapp3 myapp3 \
>>>>     meta master-max="1" master-max-node="1" clone-max="2" clone-node-max="1" notify="true"
>>>> colocation myapp1_col inf: ClusterIP ms_myapp1:Master
>>>> colocation myapp2_col inf: ClusterIP ms_myapp2:Master
>>>> colocation myapp3_col inf: ClusterIP ms_myapp3:Master
>>>> order myapp1_order inf: ms_myapp1:promote ClusterIP:start
>>>> order myapp2_order inf: ms_myapp2:promote ms_myapp1:start
>>>> order myapp3_order inf: ms_myapp3:promote ms_myapp2:start
>>>> property $id="cib-bootstrap-options" \
>>>>     dc-version="1.0.11-db98485d06ed3fe0fe236509f023e1bd4a5566f1" \
>>>>     cluster-infrastructure="Heartbeat" \
>>>>     stonith-enabled="true" \
>>>>     no-quorum-policy="ignore"
>>>> rsc_defaults $id="rsc-options" \
>>>>     resource-stickiness="100" \
>>>>     migration-threshold="3"
>>>>
>>>> I start the Heartbeat daemon on only one of the nodes, e.g. mcg1. But none
>>>> of the resources (myapp1, myapp2 etc.) gets started, even on this node.
>>>> Following is the output of the "crm_mon -f" command:
>>>>
>>>> Last updated: Mon Oct 17 10:19:22 2011
>>>> Stack: Heartbeat
>>>> Current DC: mcg1 (3d507250-780f-414a-b674-8c8d84e345cd) - partition with quorum
>>>> Version: 1.0.11-db98485d06ed3fe0fe236509f023e1bd4a5566f1
>>>> 2 Nodes configured, unknown expected votes
>>>> 5 Resources configured.
>>>> ============
>>>> Node mcg2 (16738ea4-adae-483f-9d79-b0ecce8050f4): UNCLEAN (offline)
>>>> Online: [ mcg1 ]
>>>> app1_fencing (stonith:suicide): Started mcg1
>>>>
>>>> Migration summary:
>>>> * Node mcg1:
>>>>
>>>> When I set "stonith-enabled" to false, then all my resources come up.
>>>>
>>>> Can somebody help me with the STONITH configuration?
>>>>
>>>> Cheers
>>>> Neha Chatrath
>>>> KEEP SMILING!!!!