On 09/18/2013 06:49 PM, Andrew Beekhof wrote:
> On 19/09/2013, at 8:25 AM, David Lang <da...@lang.hm> wrote:
>> What's the best way to see what it's getting stuck doing?
> Log files.
>> Is there a good way to tell if this is a pacemaker or corosync problem (so I
>> can drop one of the lists from the thread)?
> Not without further information
We've had the same problem here while trying to get an HA DNS (named)
service working. It works great for a day or so, then seizes up: simple
commands like `crm_standby -v true` time out after 120 seconds, etc. We're
testing for release and keep running into issues like this. At first we
suspected firewall issues, but even after we confirmed correct operation
and did several hand-offs of HA services back and forth, it still dies
within a day or so.
We're on CentOS 6/64 with yum packages augmented from
http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/RedHat_RHEL-6/
with exclude=pacemaker* corosync*
To make the log files available, I've excerpted a time period during
which the cluster becomes unresponsive. The excerpt is visible at
http://hal.schoolpathways.com/details/
I don't know the exact moment it became unresponsive, since this is a test
cluster and is not being monitored by a netmon. Are there any other details
I could provide that would be useful?
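
Since the exact moment of failure isn't known, one way to narrow it down would be a small probe that runs a read-only cluster command at a fixed interval and records whether it returned in time. The following is only a rough sketch, not something we have running: the names `probe` and `watch`, the log path, and the intervals are all made up for illustration, and the choice of `crm_mon -1` as the probe command is an assumption (any cheap, read-only command would do).

```python
import subprocess
import time
from datetime import datetime


def probe(cmd, timeout_s):
    """Run cmd once and return (ok, elapsed_seconds).

    ok is False if the command times out, cannot be started,
    or exits with a non-zero status.
    """
    start = time.monotonic()
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # the child is killed if it runs longer
        )
        ok = result.returncode == 0
    except (subprocess.TimeoutExpired, OSError):
        ok = False
    return ok, time.monotonic() - start


def watch(cmd, timeout_s=30, interval_s=60, log_path="probe.log", max_probes=None):
    """Probe repeatedly, appending one timestamped line per probe to log_path.

    Runs forever unless max_probes is given (hypothetical knob for testing).
    """
    count = 0
    while max_probes is None or count < max_probes:
        ok, elapsed = probe(cmd, timeout_s)
        with open(log_path, "a") as log:
            log.write(f"{datetime.now().isoformat()} ok={ok} elapsed={elapsed:.1f}s\n")
        count += 1
        time.sleep(interval_s)
```

Calling `watch(["crm_mon", "-1"])` on one node would then leave a log in which the first `ok=False` line gives an approximate time to look for in the pacemaker and corosync logs.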
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org