On 04/11/14 02:45 PM, Alex Samad - Yieldbroker wrote:
{snip}
Any pointers to a framework somewhere?

I do not think there is any formal stonith agent developers guide;
take a look at any existing agent like external/ipmi and modify it to suit
your needs.

Does fenced have any handlers? I notice it logs a message in syslog and the
cluster log; is there a chance to capture the event there?

I do not have experience with RH CMAN, sorry. But from what I
understand fenced and stonithd agents are compatible.

https://fedorahosted.org/cluster/wiki/FenceAgentAPI


Thanks


Note the return codes. Also, not listed there, is the requirement that an
agent print its XML validation data. You can see an example of what this looks
like by calling 'fence_ipmilan -o metadata' (or any other
fence_* agent).
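
Very roughly, the shape of such an agent is something like the sketch below.
This is only an illustration: the agent name, the parameter list and the
do_fence() body are made up, and only the stdin name=value parsing, the
'metadata' action and the 0/1 exit codes reflect what the API page describes.
Real agents also parse command-line options (which is how '-o metadata'
works); only the stdin form is sketched here.

#!/usr/bin/env python
# Bare-bones fence agent skeleton (illustrative only).
import sys

METADATA = """<?xml version="1.0" ?>
<resource-agent name="fence_example" shortdesc="Example fence agent skeleton">
  <parameters>
    <parameter name="action">
      <content type="string" default="reboot"/>
      <shortdesc lang="en">Fencing action (reboot/off/on/status/metadata)</shortdesc>
    </parameter>
    <parameter name="nodename">
      <content type="string"/>
      <shortdesc lang="en">Name of the node to fence</shortdesc>
    </parameter>
  </parameters>
</resource-agent>"""

def do_fence(node):
    # Placeholder: talk to the real fence device (IPMI, PDU, switch, ...)
    # and return True only when the device confirms the node is down.
    return False

def main():
    opts = {}
    # fenced/stonithd feed options on stdin, one "name=value" pair per line.
    for line in sys.stdin:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        opts[name] = value

    # Older agents used "option=", newer ones "action=".
    action = opts.get("action", opts.get("option", "reboot"))

    if action == "metadata":
        print(METADATA)
        sys.exit(0)
    if action in ("reboot", "off", "on"):
        sys.exit(0 if do_fence(opts.get("nodename", "")) else 1)
    if action in ("status", "monitor"):
        sys.exit(0)   # report whether the fence *device* itself is reachable
    sys.exit(1)

if __name__ == "__main__":
    main()

You can poke at something like that by hand the same way the daemons do, e.g.
printf 'action=metadata\n' | python fence_example.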

For the record, I think this is a bad idea.

So lots of people have said this is a bad idea and maybe I am misunderstanding
something.

From my observation of my 2 node cluster, when inter-cluster comms has an
issue, one node kills the other node.
Let's say A + B.
A is currently running the resources; B gets elected to die.

Nothing is "selected". Both nodes will initiate a fence, but if you set 'delay="15"' on the fence method for node "A", then node B will pause for 15 seconds before acting on its fence request. Node A, seeing no delay on node B's method, proceeds with its fence action immediately. In this way, node A will always be faster than node B, so node B will always lose a fence race like this.
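
To make that concrete, in cluster.conf it would look something like this
(a rough sketch with made-up IPMI device names; the only important part is
that the delay sits on node A's method and not node B's):

  <clusternode name="node-a" nodeid="1">
    <fence>
      <method name="ipmi">
        <!-- anyone fencing node-a waits 15s, so node-a wins a fence race -->
        <device name="ipmi_a" action="reboot" delay="15"/>
      </method>
    </fence>
  </clusternode>
  <clusternode name="node-b" nodeid="2">
    <fence>
      <method name="ipmi">
        <!-- no delay: node-a fences node-b immediately -->
        <device name="ipmi_b" action="reboot"/>
      </method>
    </fence>
  </clusternode>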

A signal is sent cman -> PK -> stonithd

Correct (basically).

From the logs on server B I see fenced trying to kill server B, but I don't
use any cman/stonith agents. I would like to capture that event and use an OS
reboot.

Then use a fabric fence method. These are ones where the network connection(s) to the target node is(are) severed. Thus, node B will sit there perpetually trying to fence node A, but failing because it can't talk to its fence device (network switch, etc.). Then a human can come in, examine the system, reboot the node and unfence it once it has rebooted, restoring network connections.

I created a proof of concept fence agent doing this with D-Link switches:

https://github.com/digimer/fence_dlink_snmp

It should be easy enough to adapt it to, say, call the hypervisor/host and use brctl to detach the VM's virtual interfaces.
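
For instance, the device-specific part of such an agent (along the lines of
the skeleton earlier in this thread) might reduce to something like the
following. The host, bridge and interface names are invented; a real agent
would look them up (e.g. from the libvirt domain XML) rather than hard-code
them:

import subprocess

def fabric_off(host="hypervisor1", bridge="br0", vif="vnet3"):
    # "Fence" the VM by detaching its virtual NIC from the bridge on the host.
    return subprocess.call(["ssh", host, "brctl", "delif", bridge, vif]) == 0

def fabric_on(host="hypervisor1", bridge="br0", vif="vnet3"):
    # Unfence: re-attach the interface once a human has checked the node.
    return subprocess.call(["ssh", host, "brctl", "addif", bridge, vif]) == 0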

Or, more easily, stick with power fencing and use an external log server.

So the problem I perceive is if server B is in a state where it can't run the
OS, locked up or crashed. I believe VMware will look after that; from
experience I have seen it deal with that.

I'm not sure I understand... I don't use VMWare, so maybe I am missing something. If the node stops all processing, then it's possible the node will be detected as faulty and will be rebooted. However, there are many ways that nodes can fail. Secondly, unless something tells pacemaker that the node is dead, it won't know and is not allowed to assume.

The issue is if B is running enough that a VIP (one of the resources that PK
looks after) is still on B, and A and B can't or will not shut down via the
OS. I understand that, but I would still like to attempt a reboot at that time.

Your mistake here is assuming that the node will be operating in a defined state. The whole idea of fencing is to put a node that is in an unknown state into a known state. To do that, you must be able to fence from totally outside the node itself. If you depend on the node behaving at all, your approach is flawed.

I have found a simpler solution: I actively poll to check if the cluster is
okay. I would prefer to fire a script on an event but ..

I'm also looking into why there is a comms problem, as it's 2 VMs on the same
host on the same network. I think it's starvation of CPU cycles, as it's a dev
setup.

Why things went wrong is entirely secondary to fencing.

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?
