Re: [Pacemaker] Patches: RFC before pull request

2015-01-07 Thread Andrew Beekhof
They all look sane to me. Please proceed with a pull request :-) We should probably start thinking about .13 (or .14 for the superstitious); quite a few important patches have arrived since .12 was released. > On 10 Dec 2014, at 1:33 am, Lars Ellenberg wrote: > > > Andrew, > All,

Re: [Pacemaker] qb_ipcs_disconnect message in corosync cluster

2015-01-07 Thread Bharathiraja P
Thanks Andrew. I upgraded corosync and pacemaker and the cluster works fine now. On Thu, Jan 8, 2015 at 8:26 AM, Andrew Beekhof wrote: > > > On 15 Dec 2014, at 4:29 pm, Bharathiraja P wrote: > > > > Hi Andrew, > > > > Frequently one node gets disconnected from CIB and stops the cluster > resou

Re: [Pacemaker] More Diagnosis help

2015-01-07 Thread Andrew Beekhof
> On 1 Nov 2014, at 7:07 am, Alex Samad - Yieldbroker > wrote: > > It looks to me, like VMWare took too long to give this vm a time slice and > corosync responded by killing one node That does sound reasonable from the logs you posted. (sorry, I'm only just catching up on old posts)

Re: [Pacemaker] pgsql troubles.

2015-01-07 Thread Andrew Beekhof
> On 5 Dec 2014, at 4:16 am, steve wrote: > > Good Afternoon, > > > I am having loads of trouble with pacemaker/corosync/postgres. Defining the > symptoms is rather difficult. The primary being that postgres starts as > slave on both nodes. I have tested the pgsqlRA start/stop/status/mon

Re: [Pacemaker] OpenVZ live migration

2015-01-07 Thread Andrew Beekhof
> On 16 Dec 2014, at 7:33 am, mailing list wrote: [...] > When I shut down nodea, the resource moves to nodeb, but when I > switch nodea on again, the cluster kills the virtual and starts it on > nodea. Is there any idea how to allow live migration on the cluster? Could be a lim

Re: [Pacemaker] Long failover

2015-01-07 Thread Andrew Beekhof
I need to see logs from both nodes that relate to the same instance of the issue. Why are the dates so crazy? One is from a year ago and the other is in the (at the time) future. > On 2 Dec 2014, at 7:04 pm, Dmitry Matveichev wrote: > > Hello, > Any thoughts about this issue? It still affects

Re: [Pacemaker] Split Brain on DRBD Dual Primary

2015-01-07 Thread Andrew Beekhof
> On 12 Nov 2014, at 5:16 pm, Ho, Alamsyah - ACE Life Indonesia > wrote: > > Hi All, > > In the October archives, I saw the issue reported by Felix Zachlod at > http://oss.clusterlabs.org/pipermail/pacemaker/2014-October/022653.html and > the same is actually happening to me now on dual primary D

Re: [Pacemaker] qb_ipcs_disconnect message in corosync cluster

2015-01-07 Thread Andrew Beekhof
> On 15 Dec 2014, at 4:29 pm, Bharathiraja P wrote: > > Hi Andrew, > > Frequently one node gets disconnected from CIB and stops the cluster > resources. I'm not able to start or cleanup failed actions for any of the > resources. For ex, if nodeA gets disconnected from CIB, I won't be able to

Re: [Pacemaker] Clustermon issue

2015-01-07 Thread Andrew Beekhof
> On 8 Jan 2015, at 1:31 pm, Andrew Beekhof wrote: > > And there is no indication this is being called? Doh. I know this one... you're actually using 1.1.12-rc3. You need this patch which landed after 1.1.12 shipped: https://github.com/beekhof/pacemaker/commit/3df6aff > >> On 7 Jan 2015,

Re: [Pacemaker] Clustermon issue

2015-01-07 Thread Andrew Beekhof
And there is no indication this is being called? > On 7 Jan 2015, at 6:21 pm, Marco Querci wrote: > > #!/bin/bash > > monitorfile=/tmp/clustermonitor.html > hostname=$(hostname) > > echo "Cluster state changes detected" | mail -r "$hostname@" -s > "Cluster Monitor" -a $monitorfile mquerc...@g

Re: [Pacemaker] Corosync 1.4.7: zombie (defunct)

2015-01-07 Thread Sergey Arlashin
Sorry, my fault. Forgot to include /usr/lib/lcrso/pacemaker.lcrso in my deb package. -- Best regards, Sergey Arlashin On Jan 7, 2015, at 2:18 PM, Sergey Arlashin wrote: > After installing 1.1.12 on one of my nodes in staging environment I see the > following error in corosync.log > > Jan
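The cause named above, a plugin file missing from the package, can be checked for before starting corosync. This is a minimal sketch, assuming the path given in the message; the function name `check_pacemaker_plugin` is hypothetical and not part of any Pacemaker or corosync tooling.

```shell
#!/bin/bash
# Hypothetical helper: report whether the Pacemaker plugin file exists.
# The default path is the one named in the message above; pass a different
# path as the first argument if your package installs it elsewhere.
check_pacemaker_plugin() {
    local plugin="${1:-/usr/lib/lcrso/pacemaker.lcrso}"
    if [ -f "$plugin" ]; then
        echo "found: $plugin"
    else
        echo "missing: $plugin" >&2
        return 1
    fi
}
```

Calling `check_pacemaker_plugin` with no argument tests the default path; a non-zero return means the file is absent.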

[Pacemaker] announcement: schedule for resource-agents release 3.9.6

2015-01-07 Thread Dejan Muhamedagic
Hello, This is a tentative schedule for resource-agents v3.9.6: 3.9.6-rc1: January 16. 3.9.6: January 23. Let's hope that this time the schedule will work out ;-) I modified the corresponding milestones at https://github.com/ClusterLabs/resource-agents If there's anything you think should be pa

Re: [Pacemaker] crm_mon and last-failure timestamp not always present

2015-01-07 Thread emmanuel segura
If you want to see the timestamps of your resource operations without setting failure-timeout (which is used to reset the failcount), you can use the "-t" option: crm_mon -Arf1t .. .. Operations: * Node node01: sambaip: migration-threshold=100 + (30) start: rc=0 (ok) Dumm
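The command quoted above can be wrapped so that it fails cleanly on hosts where crm_mon is not installed. This is a sketch only: the function name `show_op_history` and the `CRM_MON_BIN` override variable are hypothetical, introduced here for illustration. The flags are the ones from the message: -A (node attributes), -r (inactive resources), -f (fail counts), -1 (one-shot), -t (timing details).

```shell
#!/bin/bash
# Hypothetical wrapper around the crm_mon call quoted in the message above.
# CRM_MON_BIN is an assumed override, useful for pointing at another binary.
show_op_history() {
    local bin="${CRM_MON_BIN:-crm_mon}"
    if ! command -v "$bin" >/dev/null 2>&1; then
        echo "$bin not found in PATH" >&2
        return 127
    fi
    # One-shot status including operation history with timing details.
    "$bin" -Arf1t
}
```

Run `show_op_history` on a cluster node; the operation timestamps appear in the "Operations:" section of the output.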

Re: [Pacemaker] crm_mon and last-failure timestamp not always present

2015-01-07 Thread Gianluca Cecchi
On Wed, Jan 7, 2015 at 10:21 AM, Gianluca Cecchi wrote: [snip] > > Is there any parameter inside configuration of cluster and/or resources > that could control if "last-failure" information will be shown or not? > > > Ok, solved. On the cluster where timestamp is shown the resource has the meta

Re: [Pacemaker] Problems with SBD

2015-01-07 Thread Lars Marowsky-Bree
On 2015-01-04T19:49:58, Oriol Mula-Valls wrote: > I have a two node system with SLES 11 SP3 (pacemaker-1.1.9-0.19.102, > corosync-1.4.5-0.18.15, sbd-1.1-0.13.153). Since December we have had > several reboots of the system due to SBD: on the 22nd, 24th and 26th. The last > reboot happened yesterday J

Re: [Pacemaker] Corosync 1.4.7: zombie (defunct)

2015-01-07 Thread Sergey Arlashin
After installing 1.1.12 on one of my nodes in staging environment I see the following error in corosync.log Jan 7 10:05:30 lb-node1 corosync[17022]: [SERV ] Service failed to load 'pacemaker'. and also cannot get crm_mon to show any info. # crm_mon -1 Connection to cluster failed: Transpo

[Pacemaker] crm_mon and last-failure timestamp not always present

2015-01-07 Thread Gianluca Cecchi
Hello, I have two old SLES 11 SP2 clusters, each composed of two nodes, that I'm comparing. They should have the same package versions (I'm going to verify this point and the configuration files more carefully when access is granted to me) but on one where there is a group configured with mysqld, when the