On 09/18/2013 06:49 PM, Andrew Beekhof wrote:
> On 19/09/2013, at 8:25 AM, David Lang <da...@lang.hm> wrote:
>
>> What's the best way to see what it's getting stuck doing?
> Log files.
>
>> Is there a good way to tell if this is a pacemaker or corosync problem (so I
>> can drop one of the lists from the thread)?
> Not without further information.


We've had the same problem here while trying to get an HA DNS (named) service working. It works great for a day or so, then seizes up: simple commands like `crm_standby -v true` time out after 120 seconds, and so on. We're testing for release and keep running into issues like this. At first we suspected firewall issues, but even after we confirmed operation and made several hand-offs of HA services back and forth, it still dies within a day or so.

We're on CentOS 6/64 with yum packages augmented from http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/RedHat_RHEL-6/
with exclude=pacemaker* corosync*

To make the log files available, I've snipped out the logs for a time period during which it becomes unresponsive and put them at http://hal.schoolpathways.com/details/

I don't know the exact moment it happens, since this is a test cluster and is not being monitored by a netmon. Are there any other details I could provide that would be useful?
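
One way to narrow down the moment of failure without a netmon might be a small probe loop that runs a cheap cluster command on a timer and logs whether it answered in time. The following is only a sketch under stated assumptions: it needs Python 3.5 or later (not the stock interpreter on CentOS 6), the function names `probe` and `watch` and the log file name are made up for illustration, and `crm_mon -1` (one-shot status) is just a guess at a suitably cheap command.

```python
import subprocess
import time
from datetime import datetime


def probe(cmd, timeout_s):
    """Run cmd once and return (ok, elapsed_seconds).

    ok is False if the command times out, exits nonzero, or cannot be started.
    """
    start = time.monotonic()
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # the child is killed if it exceeds this
        )
        ok = result.returncode == 0
    except subprocess.TimeoutExpired:
        ok = False
    except OSError:
        # Covers a missing or non-executable binary.
        ok = False
    return ok, time.monotonic() - start


def watch(cmd, timeout_s=30, interval_s=60, log_path="cluster_probe.log"):
    """Probe forever, appending one timestamped line per attempt (hypothetical log format)."""
    while True:
        ok, elapsed = probe(cmd, timeout_s)
        stamp = datetime.now().isoformat(timespec="seconds")
        with open(log_path, "a") as log:
            log.write(f"{stamp} ok={ok} elapsed={elapsed:.1f}s\n")
        time.sleep(interval_s)
```

Calling `watch(["crm_mon", "-1"])` on each node would leave a log whose first `ok=False` line gives an upper bound on when the cluster stopped responding, which should make it easier to pick the right window out of the pacemaker and corosync logs.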



_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
