On 04/11/14 02:45 PM, Alex Samad - Yieldbroker wrote:
{snip}
Any pointers to a framework somewhere?

I do not think there is any formal stonith agent developers guide;
take a look at any existing agent like external/ipmi and modify it to suit
your needs.

Does fenced have any handlers? I notice it logs a message in syslog and the
cluster log; is there a chance to capture the event there?

I do not have experience with RH CMAN, sorry. But from what I
understand fenced and stonithd agents are compatible.

https://fedorahosted.org/cluster/wiki/FenceAgentAPI


Thanks


Note the return codes. Also, not listed there, is the requirement that an
agent print its XML validation data. You can see an example of what this looks
like by calling 'fence_ipmilan -o metadata' (or any other
fence_* agent).
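
Very roughly, the shape of such an agent is something like the sketch below.
This is only an illustration: the agent name, the parameter list and the
do_fence() body are made up, and only the stdin name=value parsing, the
'metadata' action and the 0/1 exit codes reflect what the API page describes.
Real agents also parse command-line options (which is how '-o metadata'
works); only the stdin form is sketched here.

#!/usr/bin/env python
# Bare-bones fence agent skeleton (illustrative only).
import sys

METADATA = """<?xml version="1.0" ?>
<resource-agent name="fence_example" shortdesc="Example fence agent skeleton">
  <parameters>
    <parameter name="action">
      <content type="string" default="reboot"/>
      <shortdesc lang="en">Fencing action (reboot/off/on/status/metadata)</shortdesc>
    </parameter>
    <parameter name="nodename">
      <content type="string"/>
      <shortdesc lang="en">Name of the node to fence</shortdesc>
    </parameter>
  </parameters>
</resource-agent>"""

def do_fence(node):
    # Placeholder: talk to the real fence device (IPMI, PDU, switch, ...)
    # and return True only when the device confirms the node is down.
    return False

def main():
    opts = {}
    # fenced/stonithd feed options on stdin, one "name=value" pair per line.
    for line in sys.stdin:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        opts[name] = value

    # Older agents used "option=", newer ones "action=".
    action = opts.get("action", opts.get("option", "reboot"))

    if action == "metadata":
        print(METADATA)
        sys.exit(0)
    if action in ("reboot", "off", "on"):
        sys.exit(0 if do_fence(opts.get("nodename", "")) else 1)
    if action in ("status", "monitor"):
        sys.exit(0)   # report whether the fence *device* itself is reachable
    sys.exit(1)

if __name__ == "__main__":
    main()

You can poke at something like that by hand the same way the daemons do, e.g.
printf 'action=metadata\n' | python fence_example.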

For the record, I think this is a bad idea.

So lots of people have said this is a bad idea and maybe I am misunderstanding
something.

From my observation of my 2 node cluster, when inter-cluster comms has an
issue, one node kills the other node.
Let's say A + B.
A is currently running the resources; B gets elected to die.

Nothing is "selected". Both nodes will initiate a fence, but if you set 'delay="15"' on the fence method for node "A", then node B will pause for 15 seconds before acting on its fence request. Node A, seeing no delay on node B's method, proceeds with its fence action immediately. In this way, node A will always be faster than node B, so node B will always lose a fence race like this.
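
To make that concrete, in cluster.conf it would look something like this
(a rough sketch with made-up IPMI device names; the only important part is
that the delay sits on node A's method and not node B's):

  <clusternode name="node-a" nodeid="1">
    <fence>
      <method name="ipmi">
        <!-- anyone fencing node-a waits 15s, so node-a wins a fence race -->
        <device name="ipmi_a" action="reboot" delay="15"/>
      </method>
    </fence>
  </clusternode>
  <clusternode name="node-b" nodeid="2">
    <fence>
      <method name="ipmi">
        <!-- no delay: node-a fences node-b immediately -->
        <device name="ipmi_b" action="reboot"/>
      </method>
    </fence>
  </clusternode>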

A signal is sent cman -> PK -> stonithd

Correct (basically).

From the logs on server B I see fenced trying to kill server B, but I don't
use any cman/stonith agents. I would like to capture that event and use an OS
reboot.

Then use a fabric fence method. These are ones where the network connection(s) to the target node is(are) severed. Thus, node B will sit there perpetually trying to fence node A, but failing because it can't talk to its fence device (network switch, etc.). Then a human can come in, examine the system, reboot the node and unfence it once it has rebooted, restoring network connections.

I created a proof of concept fence agent doing this with D-Link switches:

https://github.com/digimer/fence_dlink_snmp

It should be easy enough to adapt it to, say, call the hypervisor/host and use brctl to detach the VM's virtual interfaces.
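
For instance, the device-specific part of such an agent (along the lines of
the skeleton earlier in this thread) might reduce to something like the
following. The host, bridge and interface names are invented; a real agent
would look them up (e.g. from the libvirt domain XML) rather than hard-code
them:

import subprocess

def fabric_off(host="hypervisor1", bridge="br0", vif="vnet3"):
    # "Fence" the VM by detaching its virtual NIC from the bridge on the host.
    return subprocess.call(["ssh", host, "brctl", "delif", bridge, vif]) == 0

def fabric_on(host="hypervisor1", bridge="br0", vif="vnet3"):
    # Unfence: re-attach the interface once a human has checked the node.
    return subprocess.call(["ssh", host, "brctl", "addif", bridge, vif]) == 0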

Or, more easily, stick with power fencing and use an external log server.

So the problem I perceive is if server B is in a state where it can't run the
OS, locked up or crashed. I believe VMware will look after that; from
experience I have seen it deal with that.

I'm not sure I understand... I don't use VMWare, so maybe I am missing something. If the node stops all processing, then it's possible the node will be detected as faulty and will be rebooted. However, there are many ways that nodes can fail. Secondly, unless something tells pacemaker that the node is dead, it won't know and is not allowed to assume.

The issue is if B is running enough that a VIP (one of the resources that PK
looks after) is still on B, and A and B can't or will not shut down via the
OS. I understand that, but I would still like to attempt a reboot at that time.

Your mistake here is assuming that the node will be operating in a defined state. The whole idea of fencing is to put a node that is in an unknown state into a known state. To do that, you must be able to fence from totally outside the node itself. If you depend on the node behaving at all, your approach is flawed.

I have found a simpler solution: I actively poll to check if the cluster is
okay. I would prefer to fire a script on an event but ..

I'm also looking into why there is a comms problem, as it's 2 VMs on the same
host on the same network. I think it's starvation of CPU cycles, as it's a dev
setup.

Why things went wrong is entirely secondary to fencing.

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?
