[Pacemaker] After reboot, node does not automatically rejoin

2012-07-19 Thread Tom Tux
Hi, when I reboot one of our two-node cluster boxes (SLES11 SP1, fully patched, HAE installed), the node does not rejoin the cluster by itself. I got the following error: corosync[5377]: [pcmk ] WARN: route_ais_message: Sending message to local.cib failed: ipc delivery failed (rc=-2) corosync[53

Re: [Pacemaker] "crm status" cmd not running through cron job

2011-10-29 Thread Tom Tux
Looks like an environment (PATH) problem. Can you set the same paths in the script as you have as user "root"? Do this with "export PATH=$PATH:/usr/sbin". Regards, Tom 2011/10/28 ihjaz Mohamed : > Hi All, > Am facing a strange issue. When I schedule a script to be run via cron, the > 'crm sta
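A minimal sketch of the PATH fix suggested above, assuming the cron job is a plain sh script. The helper name `append_path` and the extra `/sbin` entry are illustrative assumptions, not part of the original reply. cron typically starts jobs with a short PATH, so the script extends it itself before calling `crm`:

```shell
#!/bin/sh
# append_path DIR: add DIR to PATH only if it is not already listed.
# (Hypothetical helper name, used here for illustration.)
append_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, nothing to do
        *) PATH="$PATH:$1" ;;        # append at the end
    esac
}

append_path /usr/sbin
append_path /sbin                    # assumption: also useful on some systems
export PATH

# Run the cluster command only if it can now be found.
if command -v crm >/dev/null 2>&1; then
    crm status
else
    echo "crm not found in PATH=$PATH" >&2
fi
```

Appending only when the directory is missing keeps PATH from growing when the script is sourced repeatedly.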

Re: [Pacemaker] Reboot node with stonith after killing a corosync-process?

2011-04-15 Thread Tom Tux
OK, I understand. Many thanks for your help. 2011/4/15 Andrew Beekhof : > On Fri, Apr 15, 2011 at 12:09 PM, Dominik Klein wrote: >> Hi >> >> On 04/15/2011 09:05 AM, Tom Tux wrote: >>> I can reproduce this behavior: >>> >>> - On node02, whic

Re: [Pacemaker] Reboot node with stonith after killing a corosync-process?

2011-04-15 Thread Tom Tux
maker-1.1.5-5.5.5 pacemaker-mgmt-client-2.0.0-0.5.5 Thanks a lot. Tom 2011/4/15 Andrew Beekhof : > Impossible to say without logs.  Sounds strange though. > > On Fri, Apr 15, 2011 at 7:17 AM, Tom Tux wrote: >> Hi >> >> I have a two node cluster (stonith enabled). On one n

[Pacemaker] Reboot node with stonith after killing a corosync-process?

2011-04-14 Thread Tom Tux
Hi I have a two node cluster (stonith enabled). On one node I tried stopping openais (/etc/init.d/openais stop), but this was hanging. So I killed all running corosync processes (killall -9 corosync). Afterward, I started openais on this node again (rcopenais start). After a few seconds, this node

Re: [Pacemaker] Node doesn't rejoin automatically after reboot

2011-01-13 Thread Tom Tux
e solution to this issue? > > Thanks, > Bob Haxo > > On Mon, 2010-09-06 at 09:50 +0200, Tom Tux wrote: > > Yes, corosync is running after the reboot. It comes up with the > regular init-procedure (runlevel 3 in my case). > > 2010/9/6 Andrew Beekhof : >> On Mon, Sep 6

Re: [Pacemaker] Node doesn't rejoin automatically after reboot

2010-09-06 Thread Tom Tux
Yes, corosync is running after the reboot. It comes up with the regular init-procedure (runlevel 3 in my case). 2010/9/6 Andrew Beekhof : > On Mon, Sep 6, 2010 at 7:57 AM, Tom Tux wrote: >> No, I don't have such failed-messages. In my case, the "Connection to >> our AI

Re: [Pacemaker] Node doesn't rejoin automatically after reboot

2010-09-05 Thread Tom Tux
No, I don't have such failed-messages. In my case, the "Connection to our AIS plugin" was established. The /dev/shm is also not full. Kind regards, Tom 2010/9/3 Michael Smith : > Tom Tux wrote: > >> If I disjoin one clusternode (node01) for maintenance-purposes >>

[Pacemaker] Node doesn't rejoin automatically after reboot

2010-09-02 Thread Tom Tux
Hi, if I take one cluster node (node01) out of the cluster for maintenance (/etc/init.d/openais stop) and reboot it, it does not rejoin the cluster automatically. After the reboot, I have the following error and warning messages in the log: Sep 3 07:34:09 node01 mgmtd: [9201]: ERROR:

Re: [Pacemaker] Resource monitoring stops suddenly

2010-04-28 Thread Tom Tux
Hi Michael, I also use SLES11 HAE and I had the same problems. I opened a case with Novell and they sent me a PTF (Program Temporary Fix), which solved this problem. The versions of some pacemaker/openais binaries (which are distributed with Novell's HAE) aren't up to date and therefore - this is a st

Re: [Pacemaker] crm_mon SMTP alerts

2010-04-06 Thread Tom Tux
Hi, by default crm_mon uses TCP port 587. You have to specify TCP port 25 (SMTP) like this: /usr/sbin/crm_mon -d --mail-host=smtpserver.example.org:25 --mail-to=recipi...@example.org --mail-from=spam...@example.org --mail-prefix=Cluster-Event on Node1 Hope this helps. Kind regards
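The command from the reply above can be assembled from variables, which makes the explicit port easier to see. This is only a sketch: it assumes a crm_mon build that still supports the mail options used in the post, and the addresses, prefix, and function name `build_crm_mon_cmd` are hypothetical placeholders. It prints the command for review instead of starting the daemon:

```shell
#!/bin/sh
# Hypothetical placeholder values; replace them with your own.
MAIL_HOST="smtpserver.example.org"
MAIL_PORT=25                         # plain SMTP, as suggested in the reply
MAIL_TO="admin@example.org"
MAIL_FROM="cluster@example.org"
MAIL_PREFIX="Cluster-Event"

# Build the crm_mon command line with the port appended to the mail host.
build_crm_mon_cmd() {
    printf '%s' "/usr/sbin/crm_mon -d --mail-host=${MAIL_HOST}:${MAIL_PORT} --mail-to=${MAIL_TO} --mail-from=${MAIL_FROM} --mail-prefix=${MAIL_PREFIX}"
}

# Print only; run the printed command by hand once it looks right.
build_crm_mon_cmd
echo
```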

Re: [Pacemaker] [SOLVED] Resource-Monitoring with an "On Fail"-Action

2010-03-31 Thread Tom Tux
t;0" back and the failcount will be increased. Is this behaviour correct? Thanks a lot for your help. Kind regards, Tom 2010/3/19 Tom Tux : > Hi > > Thanks a lot for your help. > > So now it's Novell's turn.:-) > > Regards, > Tom > > > 2010/3/

Re: [Pacemaker] [SOLVED] Resource-Monitoring with an "On Fail"-Action

2010-03-18 Thread Tom Tux
Hi Thanks a lot for your help. So now it's Novell's turn.:-) Regards, Tom 2010/3/18 Dejan Muhamedagic : > Hi, > > On Thu, Mar 18, 2010 at 02:15:07PM +0100, Tom Tux wrote: >> Hi Dejan >> >> hb_report -V says: >> cluster-glue: 1.0.2 (b75bd7

Re: [Pacemaker] Resource-Monitoring with an "On Fail"-Action

2010-03-18 Thread Tom Tux
1:09:06 2010' exec-time=4080ms queue-time=0ms rc=0 (ok) + (41) monitor: interval=2ms last-rc-change='Wed Mar 17 11:09:10 2010' last-run='Wed Mar 17 11:09:10 2010' exec-time=20ms queue-time=0ms rc=0 (ok) And the results above was yesterday Thanks for your

Re: [Pacemaker] Resource-Monitoring with an "On Fail"-Action

2010-03-17 Thread Tom Tux
Or do you have experience, upgrading the cluster-glue from source (even if it is installed with zypper/rpm)? Do you know, when the HAE-Repository will be upgraded? Thanks a lot. Tom 2010/3/17 Dejan Muhamedagic : > Hi, > > On Wed, Mar 17, 2010 at 10:57:16AM +0100, Tom Tux wrote: >>

Re: [Pacemaker] Resource-Monitoring with an "On Fail"-Action

2010-03-17 Thread Tom Tux
ee whether the monitor op really returns > 99. (grep for the resource-id). If so, I'm not sure what the cluster > does with rc=99. As far as I know, rc=4 would be status=failed (unknown > actually). > > Regards > Dominik > > Tom Tux wrote: >> Thanks fo

Re: [Pacemaker] Resource-Monitoring with an "On Fail"-Action

2010-03-17 Thread Tom Tux
orAgent_Resource 100 node2 10003 I also saw that the "last-run" entry (crm_mon -fort1) for this resource is not up to date. It seems to me that the monitor action does not occur every 10 seconds. Why? Any hints about this behaviour? Thanks a lot. Tom

[Pacemaker] Resource-Monitoring with an "On Fail"-Action

2010-03-16 Thread Tom Tux
Hi, I have a question about resource monitoring: I'm monitoring an IP resource every 20 seconds. I have configured the "On Fail" action as "restart". This works fine. If the "monitor" operation fails, the resource is restarted. But how can I define this resource, to migrate to t

Re: [Pacemaker] [SOLVED] snmp-subagent with openais returns only LHAResource-entries

2010-03-09 Thread Tom Tux
Hi Michael, thanks for your answer. I only now realised that this behaviour is described in your book on page 325. Kind regards, Tom 2010/3/9 Michael Schwartzkopff : > On Tuesday, 9 March 2010 09:43:48, Tom Tux wrote: >> Hi all, >> >> I'm trying to get several

[Pacemaker] snmp-subagent with openais returns only LHAResource-entries

2010-03-09 Thread Tom Tux
Hi all, I'm trying to get several pieces of information with the SNMP agent (hbagent) on a SLES11 system (openais & pacemaker). I've defined the global environment variable "HA_cluster_type=openais". My problem is that I only receive SNMP values (with snmpwalk) from the OID "enterprises.4682". This means

Re: [Pacemaker] [SOLVED] crm_mon with sending mails

2010-02-22 Thread Tom Tux
the reason. > > Hope this helps. > > Cheers, > Florian > > > On 2010-02-22 10:36, Tom Tux wrote: >> Hi all, >> >> I started "crm_mon" in the background to send me mails when a >> cluster-event occurs: >> crm_mon -d -H [smtp-server] -T [my_mail

[Pacemaker] crm_mon with sending mails

2010-02-22 Thread Tom Tux
Hi all, I started "crm_mon" in the background to send me mails when a cluster-event occurs: crm_mon -d -H [smtp-server] -T [my_mailaddress] But when I manually stop or start a resource or take one node offline and online again, I did not receive an email. My question hereby: On which cluster-eve