On 30/07/2013, at 9:13 PM, Rainer Brestan <rainer.bres...@gmx.net> wrote:
> I can agree, the Master monitor operation is broken in the 1.1.10 release.
> When the slave monitor action is started, the master monitor action is not
> called any more.
>
> I have created a setup with a Stateful resource and two nodes.
> Then the Pacemaker installation is changed to different versions without
> changing the configuration part of the CIB.
>
> Result:
> 1.1.10-rc5, 1.1.10-rc6 and 1.1.10-rc7 do not have this error
> 1.1.10-1 release has the error

That's bizarre because:

[09:48 AM] beekhof@f17 ~/Development/sources/pacemaker/devel ☺ # git log Pacemaker-1.1.10-rc7..Pacemaker-1.1.10 pengine lib/pengine
[09:48 AM] beekhof@f17 ~/Development/sources/pacemaker/devel ☺ #

/me starts investigating

>
> Installation order (just so that everybody knows how it was done):
> 1.1.10-1 -> error
> 1.1.10-rc5 -> no error
> 1.1.10-rc6 -> no error
> 1.1.10-rc7 -> no error
> 1.1.10-1 -> error
>
> Rainer
> Sent: Friday, 26 July 2013 at 09:32
> From: "Takatoshi MATSUO" <matsuo....@gmail.com>
> To: "The Pacemaker cluster resource manager" <pacemaker@oss.clusterlabs.org>
> Subject: Re: [Pacemaker] Announce: Pacemaker 1.1.10 now available
> Hi
>
> I used the Stateful RA and hit the same issue.
>
> 1. before starting slave
>
> # crm_simulate -VVV -S -x /var/lib/pacemaker/pengine/pe-input-1543.bz2 | grep "Resource action"
> * Resource action: stateful monitor=2000 on 16-sl6
>
> 2. starting slave
> # crm_simulate -VVV -S -x /var/lib/pacemaker/pengine/pe-input-1544.bz2 | grep "Resource action"
> * Resource action: stateful monitor on 17-sl6
> * Resource action: stateful notify on 16-sl6
> * Resource action: stateful start on 17-sl6
> * Resource action: stateful notify on 16-sl6
> * Resource action: stateful notify on 17-sl6
> * Resource action: stateful monitor=3000 on 17-sl6
>
> 3. after
> # crm_simulate -VVV -S -x /var/lib/pacemaker/pengine/pe-input-1545.bz2 | grep "Resource action"
> * Resource action: stateful monitor=3000 on 17-sl6
>
> Monitor=2000 is deleted.
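
A minimal sketch, not from the original report, for comparing the recurring monitors listed in `crm_simulate` output like the three runs quoted above. The function name `recurring_monitors` and the regular expression are assumptions based only on the "Resource action" line format shown in this thread; other Pacemaker versions may print these lines differently.

```python
import re

# Matches lines such as "* Resource action: stateful monitor=2000 on 16-sl6".
# Assumption: the pattern is derived only from the output quoted in this thread.
ACTION_RE = re.compile(
    r"\*\s+Resource action:\s+(?P<rsc>\S+)\s+"
    r"monitor=(?P<interval>\d+)\s+on\s+(?P<node>\S+)"
)


def recurring_monitors(output):
    """Return a set of (resource, node, interval_ms) for recurring monitors.

    Probes ("monitor" without "=<interval>") and other actions such as
    start or notify are ignored.
    """
    found = set()
    for line in output.splitlines():
        match = ACTION_RE.search(line)
        if match:
            found.add((match["rsc"], match["node"], int(match["interval"])))
    return found


# Sample lines copied from steps 1 and 3 of the report.
before = " * Resource action: stateful monitor=2000 on 16-sl6\n"
after = " * Resource action: stateful monitor=3000 on 17-sl6\n"

# Monitors listed in the first output that do not appear in the later one.
missing = recurring_monitors(before) - recurring_monitors(after)
print(missing)
```

This only reports which interval-carrying monitor lines appear in each output; deciding whether a missing entry indicates a bug is still up to the reader.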
> Is this correct?
>
>
> My settings
> --------
> property \
>     no-quorum-policy="ignore" \
>     stonith-enabled="false"
>
> rsc_defaults \
>     resource-stickiness="INFINITY" \
>     migration-threshold="1"
>
> ms msStateful stateful \
>     meta \
>         master-max="1" \
>         master-node-max="1" \
>         clone-max="2" \
>         clone-node-max="1" \
>         notify="true"
>
> primitive stateful ocf:heartbeat:Stateful \
>     op start timeout="60s" interval="0s" on-fail="restart" \
>     op monitor timeout="60s" interval="3s" on-fail="restart" \
>     op monitor timeout="60s" interval="2s" on-fail="restart" role="Master" \
>     op promote timeout="60s" interval="0s" on-fail="restart" \
>     op demote timeout="60s" interval="0s" on-fail="stop" \
>     op stop timeout="60s" interval="0s" on-fail="block"
> --------
>
> Regards,
> Takatoshi MATSUO
>
> 2013/7/26 Takatoshi MATSUO <matsuo....@gmail.com>:
> > Hi
> >
> > My report is late for 1.1.10 :(
> >
> > I am using pacemaker 1.1.10-0.1.ab2e209.git.
> > It seems that the master's monitor is stopped when the slave is started.
> >
> > Has anyone else encountered the same problem?
> > I attach a log and settings.
> >
> >
> > Thanks,
> > Takatoshi MATSUO
> >
> > 2013/7/26 Digimer <li...@alteeve.ca>:
> >> Congrats!! I know this was a long time in the making.
> >>
> >> digimer
> >>
> >>
> >> On 25/07/13 20:43, Andrew Beekhof wrote:
> >>>
> >>> Announcing the release of Pacemaker 1.1.10
> >>>
> >>> https://github.com/ClusterLabs/pacemaker/releases/Pacemaker-1.1.10
> >>>
> >>> There were three changes of note since rc7:
> >>>
> >>> + Bug cl#5161 - crmd: Prevent memory leak in operation cache
> >>> + cib: Correctly read back archived configurations if the primary is
> >>>   corrupted
> >>> + cman: Do not pretend we know the state of nodes we've never seen
> >>>
> >>> Along with assorted bug fixes, the major topics for this release were:
> >>>
> >>> - stonithd fixes
> >>> - fixing memory leaks, often caused by incorrect use of glib reference
> >>>   counting
> >>> - supportability improvements (code cleanup and deduplication,
> >>>   standardized error codes)
> >>>
> >>> Release candidates for the next Pacemaker release (1.1.11) can be
> >>> expected some time around November.
> >>>
> >>> A big thank you to everyone that spent time testing the release
> >>> candidates and/or contributed patches. However now that Pacemaker is
> >>> perfect, anyone reporting bugs will be shot :-)
> >>>
> >>> To build `rpm` packages:
> >>>
> >>> 1. Clone the current sources:
> >>>
> >>>     # git clone --depth 0 git://github.com/ClusterLabs/pacemaker.git
> >>>     # cd pacemaker
> >>>
> >>> 1. Install dependencies (if you haven't already)
> >>>
> >>>     [Fedora] # sudo yum install -y yum-utils
> >>>     [ALL] # make rpm-dep
> >>>
> >>> 1. Build Pacemaker
> >>>
> >>>     # make release
> >>>
> >>> 1. Copy and deploy as needed
> >>>
> >>> ## Details - 1.1.10 - final
> >>>
> >>> Changesets: 602
> >>> Diff: 143 files changed, 8162 insertions(+), 5159 deletions(-)
> >>>
> >>> ## Highlights
> >>>
> >>> ### Features added since Pacemaker-1.1.9
> >>>
> >>> + Core: Convert all exit codes to positive errno values
> >>> + crm_error: Add the ability to list and print error symbols
> >>> + crm_resource: Allow individual resources to be reprobed
> >>> + crm_resource: Allow options to be set recursively
> >>> + crm_resource: Implement --ban for moving resources away from nodes
> >>>   and --clear (replaces --unmove)
> >>> + crm_resource: Support OCF tracing when using
> >>>   --force-(check|start|stop)
> >>> + PE: Allow active nodes in our current membership to be fenced without
> >>>   quorum
> >>> + PE: Suppress meaningless IDs when displaying anonymous clone status
> >>> + Turn off auto-respawning of systemd services when the cluster starts
> >>>   them
> >>> + Bug cl#5128 - pengine: Support maintenance mode for a single node
> >>>
> >>> ### Changes since Pacemaker-1.1.9
> >>>
> >>> + crmd: cib: stonithd: Memory leaks resolved and improved use of glib
> >>>   reference counting
> >>> + attrd: Fixes deleted attributes during dc election
> >>> + Bug cf#5153 - Correctly display clone failcounts in crm_mon
> >>> + Bug cl#5133 - pengine: Correctly observe on-fail=block for failed
> >>>   demote operation
> >>> + Bug cl#5148 - legacy: Correctly remove a node that used to have a
> >>>   different nodeid
> >>> + Bug cl#5151 - Ensure node names are consistently compared without
> >>>   case
> >>> + Bug cl#5152 - crmd: Correctly clean up fenced nodes during membership
> >>>   changes
> >>> + Bug cl#5154 - Do not expire failures when on-fail=block is present
> >>> + Bug cl#5155 - pengine: Block the stop of resources if any depending
> >>>   resource is unmanaged
> >>> + Bug cl#5157 - Allow migration in the absence of some colocation
> >>>   constraints
> >>> + Bug cl#5161 - crmd: Prevent memory leak in operation cache
> >>> + Bug cl#5164 - crmd: Fixes crash when using pacemaker-remote
> >>> + Bug cl#5164 - pengine: Fixes segfault when calculating transition
> >>>   with remote-nodes.
> >>> + Bug cl#5167 - crm_mon: Only print "stopped" node list for incomplete
> >>>   clone sets
> >>> + Bug cl#5168 - Prevent clones from being bounced around the cluster
> >>>   due to location constraints
> >>> + Bug cl#5170 - Correctly support on-fail=block for clones
> >>> + cib: Correctly read back archived configurations if the primary is
> >>>   corrupted
> >>> + cib: The result is not valid when diffs fail to apply cleanly for CLI
> >>>   tools
> >>> + cib: Restore the ability to embed comments in the configuration
> >>> + cluster: Detect and warn about node names with capitals
> >>> + cman: Do not pretend we know the state of nodes we've never seen
> >>> + cman: Do not unconditionally start cman if it is already running
> >>> + cman: Support non-blocking CPG calls
> >>> + Core: Ensure the blackbox is saved on abnormal program termination
> >>> + corosync: Detect the loss of members for which we only know the
> >>>   nodeid
> >>> + corosync: Do not pretend we know the state of nodes we've never seen
> >>> + corosync: Ensure removed peers are erased from all caches
> >>> + corosync: Nodes that can persist in sending CPG messages must be
> >>>   alive after all
> >>> + crmd: Do not get stuck in S_POLICY_ENGINE if a node we couldn't fence
> >>>   returns
> >>> + crmd: Do not update fail-count and last-failure for old failures
> >>> + crmd: Ensure all membership operations can complete while trying to
> >>>   cancel a transition
> >>> + crmd: Ensure operations for cleaned up resources don't block recovery
> >>> + crmd: Ensure we return to a stable state if there have been too many
> >>>   fencing failures
> >>> + crmd: Initiate node shutdown if another node claims to have
> >>>   successfully fenced us
> >>> + crmd: Prevent messages for remote crmd clients from being relayed to
> >>>   wrong daemons
> >>> + crmd: Properly handle recurring monitor operations for remote-node
> >>>   agent
> >>> + crmd: Store last-run and last-rc-change for all operations
> >>> + crm_mon: Ensure stale pid files are updated when a new process is
> >>>   started
> >>> + crm_report: Correctly collect logs when 'uname -n' reports fully
> >>>   qualified names
> >>> + fencing: Fail the operation once all peers have been exhausted
> >>> + fencing: Restore the ability to manually confirm that fencing
> >>>   completed
> >>> + ipc: Allow unprivileged clients to clean up after server failures
> >>> + ipc: Restore the ability for members of the haclient group to connect
> >>>   to the cluster
> >>> + legacy: Support "crm_node --remove" with a node name for corosync
> >>>   plugin (bnc#805278)
> >>> + lrmd: Default to the upstream location for resource agent scratch
> >>>   directory
> >>> + lrmd: Pass errors from lsb metadata generation back to the caller
> >>> + pengine: Correctly handle resources that recover before we operate on
> >>>   them
> >>> + pengine: Delete the old resource state on every node whenever the
> >>>   resource type is changed
> >>> + pengine: Detect constraints with inappropriate actions (i.e. promote
> >>>   for a clone)
> >>> + pengine: Ensure per-node resource parameters are used during probes
> >>> + pengine: If fencing is unavailable or disabled, block further
> >>>   recovery for resources that fail to stop
> >>> + pengine: Implement the rest of get_timet_now() and rename to
> >>>   get_effective_time
> >>> + pengine: Re-initiate _active_ recurring monitors that previously
> >>>   failed but have timed out
> >>> + remote: Workaround for inconsistent tls handshake behavior between
> >>>   gnutls versions
> >>> + systemd: Ensure we get shut down correctly by systemd
> >>> + systemd: Reload systemd after adding/removing override files for
> >>>   cluster services
> >>> + xml: Check for and replace non-printing characters with their octal
> >>>   equivalent while exporting xml text
> >>> + xml: Prevent lockups by setting a more reliable buffer allocation
> >>>   strategy
> >>>
> >>>
> >>> _______________________________________________
> >>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> >>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >>>
> >>> Project Home: http://www.clusterlabs.org
> >>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> >>> Bugs: http://bugs.clusterlabs.org
> >>>
> >>
> >>
> >> --
> >> Digimer
> >> Papers and Projects: https://alteeve.ca/w/
> >> What if the cure for cancer is trapped in the mind of a person without
> >> access to education?