[Pacemaker] Suggestions for managing HA of containers from within a Pacemaker container?

2015-02-07 Thread Steven Dake (stdake)
Hi, I am working on Containerizing OpenStack in the Kolla project (http://launchpad.net/kolla). One of the key things we want to do over the next few months is add H/A support to our container tech. David Vossel had suggested using systemctl to monitor the containers themselves by running he

Re: [Pacemaker] Notification when a node is down

2014-09-15 Thread Steven Hale
On 15 September 2014 16:49, David Vossel wrote: > > This might be a useful reference. > > http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm207039249856 I've been having trouble with this too, and I spent ages on the above link trying to make it work

Re: [Pacemaker] KVM live migration with pcs

2014-09-12 Thread Steven Hale
On 12 September 2014 12:34, Steven Hale wrote: > # pcs resource move vm-resource newnode Additionally, you must have the "allow-migrate=true" meta option set if you want to use live migration. Otherwise it will shut down the resource and restart it on the new node, just like any

Re: [Pacemaker] KVM live migration with pcs

2014-09-12 Thread Steven Hale
On 12 September 2014 12:24, Саша Александров wrote: > Is it possible to live migrate KVM with pcs? There is no 'pcs resource > migrate'. VirtualDomain RA has migrate_to/migrate_from functions, but what > is the method for cluster to call them? The command is "move" rather than "migrate". move

[Pacemaker] Fwd: VirtualDomain broken for live migration.

2014-08-18 Thread Steven Hale
Dear all, I'm in the process of setting up my first four-node cluster. I'm using CentOS7 with PCS/Pacemaker/Corosync. I've got everything set up with shared storage using GlusterFS. The cluster is running and I'm in the process of adding resources. My intention for the cluster is to use it to

[Pacemaker] Typo? pcs cluster push question....

2013-12-19 Thread Steven Silk - NOAA Affiliate
-push stonith_cfg Or am I working with a different version of pcs? thanks! Steven Silk 303 497 3112 ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org

Re: [Pacemaker] Need to relax corosync due to backup of VM through snapshot

2013-11-24 Thread Steven Dake
On 11/21/2013 06:26 AM, Gianluca Cecchi wrote: On Thu, Nov 21, 2013 at 9:09 AM, Lars Marowsky-Bree wrote: On 2013-11-20T16:58:01, Gianluca Cecchi wrote: Based on docs I thought that the timeout should be token x token_retransmits_before_loss_const No, the comments in the corosync.conf.exa

Re: [Pacemaker] Building corosync from source on Angstrom

2013-05-31 Thread Steven Dake
On 05/31/2013 12:57 PM, Simon Platten wrote: Hi, I have been struggling to build corosync in Angstrom Linux on a beaglebone black which runs an ARM Cortex A8. I have been using this page as a guide: http://clusterlabs.org/wiki/SourceInstall So far I've downloaded and built libqb, no problems t

Re: [Pacemaker] ClusterMon Resource starting multiple instances of crm_mon

2013-05-10 Thread Steven Bambling
On May 10, 2013, at 5:35 AM, Steven Bambling wrote: > > On May 9, 2013, at 8:05 PM, Andrew Beekhof wrote: > >> >> On 10/05/2013, at 12:40 AM, Steven Bambling wrote: >> >>> I'm having some issues with getting some cluster monitoring setup an

Re: [Pacemaker] ClusterMon Resource starting multiple instances of crm_mon

2013-05-10 Thread Steven Bambling
On May 9, 2013, at 8:05 PM, Andrew Beekhof wrote: > > On 10/05/2013, at 12:40 AM, Steven Bambling wrote: > >> I'm having some issues with getting some cluster monitoring setup and >> configured on a 3 node multi-state cluster. I'm using Florian

[Pacemaker] ClusterMon Resource starting multiple instances of crm_mon

2013-05-09 Thread Steven Bambling
I'm having some issues with getting some cluster monitoring setup and configured on a 3 node multi-state cluster. I'm using Florian's blog as an example http://floriancrouzat.net/2013/01/monitor-a-pacemaker-cluster-with-ocfpacemakerclustermon-andor-external-agent/. When I create the primitiv

Re: [Pacemaker] PGSQL resource promotion issue

2013-04-04 Thread Steven Bambling
uck" connection. v/r STEVE On Apr 4, 2013, at 8:08 AM, Takatoshi MATSUO wrote: > Hi Steven > > I made a patch as a trial. > https://github.com/t-matsuo/resource-agents/commit/bd3b587c6665c4f5eba0491b91f83965e601bb6b#heartbeat/pgsql > > This patch never show "STR

Re: [Pacemaker] PGSQL resource promotion issue

2013-04-04 Thread Steven Bambling
On Apr 4, 2013, at 8:08 AM, Takatoshi MATSUO wrote: Hi Steven I made a patch as a trial. https://github.com/t-matsuo/resource-agents/commit/bd3b587c6665c4f5eba0491b91f83965e601bb6b#heartbeat/pgsql This patch never show "STREAMING|POTENTIAL". T

Re: [Pacemaker] PGSQL resource promotion issue

2013-03-29 Thread Steven Bambling
thest head or near furthest ahead log location and the LESS replay lag. Does this even seem possible with a resource agent or is my thinking totally off? v/r STEVE On Mar 29, 2013, at 8:35 AM, Takatoshi MATSUO wrote: > Hi Steven > > 2013/3/29 Steven Bambling : >> >>

Re: [Pacemaker] PGSQL resource promotion issue

2013-03-29 Thread Steven Bambling
nd compares them to get the highest. If the highest in this list is its own, it sets the master-score to 1000, on other nodes to 100. Pacemaker then selects the node with the highest master score and promotes it. Rainer Sent: Wednesday, 27 March 2013 at 14:37 From: "Steven Bamblin

Re: [Pacemaker] OCF Resource agent promote question

2013-03-28 Thread Steven Bambling
a? v/r STEVE On Mar 26, 2013, at 8:19 AM, Steven Bambling wrote: Excellent, thanks so much for the clarification. I'll drop this new RA in and see if I can get things working. STEVE On Mar 26, 2013, at 7:38 AM, Rainer Brestan

Re: [Pacemaker] PGSQL resource promotion issue

2013-03-27 Thread Steven Bambling
resource add_operation PGSQL monitor interval=7s v/r STEVE On Mar 27, 2013, at 7:08 AM, Steven Bambling wrote: > > I've built and installed the latest resource-agents from GitHub on CentOS 6 > and configured two resources > > 1 primitive PGVIP: > pc

[Pacemaker] PGSQL resource promotion issue

2013-03-27 Thread Steven Bambling
I've built and installed the latest resource-agents from GitHub on CentOS 6 and configured two resources 1 primitive PGVIP: pcs resource create PGVIP ocf:heartbeat:IPaddr2 ip=10.1.22.48 cidr_netmask=25 op monitor interval=1 Before setting up the PGSQL resource I manually configured sync/stre

Re: [Pacemaker] OCF Resource agent promote question

2013-03-26 Thread Steven Bambling
Tuesday, 26 March 2013 at 11:55 From: "Steven Bambling" To: "The Pacemaker cluster resource manager" Subject: Re: [Pacemaker] OCF Resource agent promote question On Mar 26, 2013, at 6:32 AM

Re: [Pacemaker] OCF Resource agent promote question

2013-03-26 Thread Steven Bambling
To: pacemaker@oss.clusterlabs.org Subject: Re: [Pacemaker] OCF Resource agent promote question Hi Steve, On 2013-03-25 18:44, Steven Bambling wrote: > All, > > I'm trying to work on an OCF resource agent that uses postgresql > streaming repli

Re: [Pacemaker] OCF Resource agent promote question

2013-03-26 Thread Steven Bambling
ve misinterpreted the use case of this resource, please let me know. Also any additional hints or corrections would be much appreciated. v/r STEVE On Mar 25, 2013, at 7:01 PM, Andreas Kurz wrote: Hi Steve, On 2013-03-25 18:44, Steven Bambling wrote: All, I

[Pacemaker] OCF Resource agent promote question

2013-03-25 Thread Steven Bambling
All, I'm trying to work on an OCF resource agent that uses postgresql streaming replication. I'm running into a few issues that I hope might be answered or at least some pointers given to steer me in the right direction. 1. A quick way of obtaining a list of "Online" nodes in the cluster that

[Pacemaker] Need HA for OpenStack instances? Check out heat V5!

2012-08-01 Thread Steven Dake
Hi folks, A few developers from the HA community have been hard at work on a project called heat which provides native HA for OpenStack virtual machines. Heat provides a template-based system with API matching AWS CloudFormation semantics specifically for OpenStack. In v5, instance healthchecking has

Re: [Pacemaker] [corosync] Different Corosync Rings for Different Nodes in Same Cluster?

2012-07-08 Thread Steven Dake
loss, STONITH, etc)? > Apologies for delay - was on PTO. That is correct. Regards -steve > Thanks, > > Andrew > > ---- > *From: *"Steven Dake" > *To: *"The Pacemaker cluster resource

Re: [Pacemaker] Different Corosync Rings for Different Nodes in Same Cluster?

2012-06-29 Thread Steven Dake
On 06/29/2012 01:42 AM, Dan Frincu wrote: > Hi, > > On Thu, Jun 28, 2012 at 6:13 PM, Andrew Martin wrote: >> Hi Dan, >> >> Thanks for the help. If I configure the network as I described - ring 0 as >> the network all 3 nodes are on, ring 1 as the network only 2 of the nodes >> are on, and using "

[Pacemaker] If you want High Availability on OpenStack, check out Heat! (details inside)

2012-06-27 Thread Steven Dake
As some may know, Angus and I were working previously on a project called pacemaker-cloud, with the intention of adding high availability to guests in cloud environments. We stopped developing that project in March 2012 and took our experiences to a new project called Heat. For more details of why

Re: [Pacemaker] nfs running on two nodes w/ drbd corosync pacemaker on CentOS6.2

2012-06-01 Thread Steven Silk
nitor_”Master” not advertised in meta-data, > it may not be supported by the RA > === > > > > > > 2012/6/1 Steven Silk > >> Hello - >> >> I have not had as much time as I would like to work on this. But I

Re: [Pacemaker] nfs running on two nodes w/ drbd corosync pacemaker on CentOS6.2

2012-06-01 Thread Steven Silk
byte 0xe2 in position 0: ordinal not in range(128) So now there is a codec problem? Time to reload the corosync? thanks for any suggestions Steve On Tue, May 29, 2012 at 5:42 PM, Steven Silk wrote: > Anton, > > Very good. Since I am setting things up on CentOS, I will be

[Pacemaker] Seems to be working but fails to transition to other node.

2012-05-30 Thread Steven Silk
:02:16 corosync [SERV ] Service engine unloaded: corosync cluster quorum service v0.1 May 31 00:02:16 corosync [MAIN ] Corosync Cluster Engine exiting with status 0 at main.c:1858. the example that I am working from talks about doing the following group services fs_drbd0 But this

Re: [Pacemaker] nfs running on two nodes w/ drbd corosync pacemaker on CentOS6.2

2012-05-29 Thread Steven Silk
- stands for linux standard base - I hate it when I see things like this and you can't find where they are spelled out Yours, Steve On Tue, May 29, 2012 at 4:55 PM, Anton Altaparmakov wrote: > Hi Steve, > > On 29 May 2012, at 23:20, Steven Silk wrote: > > Thanks for your qui

Re: [Pacemaker] nfs running on two nodes w/ drbd corosync pacemaker on CentOS6.2

2012-05-29 Thread Steven Silk
On Tue, May 29, 2012 at 2:15 AM, Anton Altaparmakov wrote: > Hi, > > On 28 May 2012, at 23:46, Steven Silk wrote: > > I am trying to setup a two node system making NFS highly available > > We have run this in the past with heartbeat and drbd. Now we would like

[Pacemaker] nfs running on two nodes w/ drbd corosync pacemaker on CentOS6.2

2012-05-28 Thread Steven Silk
plan? I have stumbled around and gotten the preliminaries set up and working, but when I start configuring the primitives in crm I get tons of errors. thank you -- Steven Silk CSC

Re: [Pacemaker] CentOS pacemaker heartbeat

2012-04-30 Thread Steven Bambling
+1 I'll try to get my notes up with installing and basic setup on 6.2 Sent from my iPhone On Apr 30, 2012, at 11:47 AM, "Andreas Kurz" wrote: > On 04/30/2012 05:37 PM, fatcha...@gmx.de wrote: >> Hi, >> >> I´ve just installed a CentOS 6.2 and also installed via epel-repo >> heartbeat-3.0.4-1.el

Re: [Pacemaker] Pacemaker CoroSync + PGPool-II

2012-04-24 Thread Steven Bambling
24, 2012, at 10:59 AM, Steven Bambling wrote: After doing some searching on setting up "PGPool-HA" to limit pgpool being a single point of failure it looks like development on the heartbeat project has reduced greatly and development has shifted to corosync (backed by Red Hat and SUSE)

[Pacemaker] Pacemaker CoroSync + PGPool-II

2012-04-24 Thread Steven Bambling
After doing some searching on setting up "PGPool-HA" to limit pgpool being a single point of failure it looks like development on the heartbeat project has reduced greatly and development has shifted to corosync (backed by Red Hat and SUSE), which is recommended by pacemaker. I've found an article

Re: [Pacemaker] [corosync] Unable to join cluster from a newly-installed centos 6.2 node

2012-03-02 Thread Steven Dake
On 03/02/2012 05:29 PM, Diego Lima wrote: > Hello, > > I've recently installed Corosync on two CentOS 6.2 machines. One is > working fine but on the other machine I've been unable to connect to > the cluster. On the logs I can see this whenever I start > corosync+pacemaker: > > Mar 2 21:33:16 no

Re: [Pacemaker] OCFS2 in Pacemaker, post Corosync 2.0

2012-03-01 Thread Steven Dake
On 03/01/2012 07:19 AM, Lars Marowsky-Bree wrote: > On 2012-03-01T09:52:29, Florian Haas wrote: > >> Future situation (Pacemaker with Corosync 2.x): >> - OpenAIS goes away, no CKPT service, ocfs2_controld.pcmk stops working; >> - cman goes away, ocfs2_controld.cman stops working. >> >> Is that su

[Pacemaker] Does Pacemaker support a 50/50 type high-availability?

2012-01-19 Thread steven
Just to clarify, as I was exhausted last night. I'm looking to set up an active/active cluster withOUT shared storage. I want to use local storage only. Is that possible with Pacemaker? If not, do you know of any software that would help me?

[Pacemaker] Does Pacemaker support a 50/50 type high-availability?

2012-01-18 Thread steven
Does Pacemaker support an environment where half of the storage is used for fail-over (replication of other servers) and the other half can be used for that server, effectively allowing each and every server in the cluster to be used in a public environment, while having them all contribute to

Re: [Pacemaker] need cluster-wide variables

2012-01-11 Thread Steven Dake
On 12/21/2011 12:01 AM, Nirmala S wrote: > Hi, > > > > This is a followup on earlier thread > (http://www.gossamer-threads.com/lists/linuxha/pacemaker/76705). > > > > My situation is somewhat similar. I need a cluster which contains 3 > kinds of nodes – master, preferred slave, slave. Pr

[Pacemaker] corosync mailing list address change

2011-10-20 Thread Steven Dake
Sending one last reminder that the Corosync mailing list has changed homes from the Linux Foundation's servers. I have been unable to obtain the previous subscriber list, so please resubscribe. http://lists.corosync.org/mailman/listinfo The list is called "discuss". Regards -steve

Re: [Pacemaker] Questions about reasonable cluster size...

2011-10-20 Thread Steven Dake
On 10/20/2011 07:42 AM, Alan Robertson wrote: > On 10/20/2011 03:11 AM, Proskurin Kirill wrote: >> On 10/20/2011 03:15 AM, Steven Dake wrote: >>> On 10/19/2011 01:50 PM, Alan Robertson wrote: >>>> Hi, >>>> >>>> I have an application where

Re: [Pacemaker] Questions about reasonable cluster size...

2011-10-19 Thread Steven Dake
On 10/19/2011 01:50 PM, Alan Robertson wrote: > Hi, > > I have an application where having a 12-node cluster with about 250 > resources would be desirable. > > Is this reasonable? Can Pacemaker+Corosync be expected to reliably > handle a cluster of this size? > > If not, what is the current rec

[Pacemaker] reminder - new corosync mailing list location

2011-10-10 Thread Steven Dake
A few weeks ago I posted that we had moved the corosync mailing list from the Linux Foundation servers because they are down. Please join the corosync list if you're interested in the cluster stack or corosync and ask your questions there. To join: http://lists.corosync.org/mailman/listinfo The list is

[Pacemaker] New Corosync Mailing list - Please register for it!

2011-09-20 Thread Steven Dake
Hi, Over the past several years, we have been sharing a mailing list with the openais project. I have made a new mailing list specifically for corosync: This will be the permanent new list for corosync. Please register at: http://lists.corosync.org/mailman/listinfo The list is called "discuss"

Re: [Pacemaker] Building a Corosync 1.4.1 RPM package for SLES11 SP1

2011-09-01 Thread Steven Dake
On 08/31/2011 11:39 PM, Sebastian Kaps wrote: > Hi, > > I'm trying to compile Corosync v1.4.1 from source[1] and create an RPM > x86_64 package for SLES11 SP1. > When running "make rpm" the build process complains about a broken > dependency for the nss-devel package. > The package is not installe

Re: [Pacemaker] TOTEM: Process pause detected? Leading to STONITH...

2011-08-15 Thread Steven Dake
On 08/12/2011 03:19 AM, Vladislav Bogdanov wrote: > ... >>> I would really like someone that has these process pause problems to >>> test a patch I have posted to see if it rectifies the situation. Our >>> significant QE team at Red Hat doesn't see these problems and I can't >>> generate them in e

Re: [Pacemaker] TOTEM: Process pause detected? Leading to STONITH...

2011-08-11 Thread Steven Dake
On 08/11/2011 03:05 AM, Sebastian Kaps wrote: > Hi, > > On 04.08.2011, at 18:21, Steven Dake wrote: > >>> Jul 31 03:51:02 node01 corosync[5870]: [TOTEM ] Process pause detected >>> for 11149 ms, flushing membership messages. >> >> This process pau

Re: [Pacemaker] Backup ring is marked faulty

2011-08-07 Thread Steven Dake
On 08/04/2011 02:04 PM, Sebastian Kaps wrote: > Hi Steven, > > On 04.08.2011, at 20:59, Steven Dake wrote: > >> meaning the corosync community doesn't investigate redundant ring issues >> prior to corosync versions 1.4.1. > > Sadly, we need to use the SLES

Re: [Pacemaker] Backup ring is marked faulty

2011-08-04 Thread Steven Dake
On 08/04/2011 11:43 AM, Sebastian Kaps wrote: > Hi Steven, > > On 04.08.2011, at 18:27, Steven Dake wrote: > >> redundant ring is only supported upstream in corosync 1.4.1 or later. > > What does "supported" mean in this context, exactly? > meaning the c

Re: [Pacemaker] Backup ring is marked faulty

2011-08-04 Thread Steven Dake
On 08/02/2011 11:53 PM, Sebastian Kaps wrote: > Hi Steven! > > On Tue, 02 Aug 2011 17:45:46 -0700, Steven Dake wrote: >> Which version of corosync? > > # corosync -v > Corosync Cluster Engine, version '1.3.1' > Copyright (c) 2006-2009 Red Hat, Inc. >

Re: [Pacemaker] Live demo of Pacemaker Cloud on Fedora: Friday August 5th at 8am PST

2011-08-04 Thread Steven Dake
On 08/03/2011 06:39 PM, Bob Schatz wrote: > Steven, > > Are you planning on recording/taping it if I want to watch it later? > > Thanks, > > Bob Bob, Yes I will record if I can beat elluminate into submis

Re: [Pacemaker] Backup ring is marked faulty

2011-08-04 Thread Steven Dake
_gui and hitting "refresh" inside the > GUI 3-5 times. After that ring 1 (10.2.2.0) will be marked as "faulty" again. > > Thanks and best regards, > -Martin Tegtmeier > > > > > -Original Message- > From: Sebastian Kaps [mailto:seb

Re: [Pacemaker] TOTEM: Process pause detected? Leading to STONITH...

2011-08-04 Thread Steven Dake
On 08/04/2011 05:46 AM, Sebastian Kaps wrote: > Hello, > > here's another problem we're having: > > Jul 31 03:51:02 node01 corosync[5870]: [TOTEM ] Process pause detected > for 11149 ms, flushing membership messages. This process pause message indicates the scheduler doesn't schedule corosync f

[Pacemaker] Live demo of Pacemaker Cloud on Fedora: Friday August 5th at 8am PST

2011-08-03 Thread Steven Dake
Extending a general invitation to the high availability communities and other cloud community contributors to participate in a live demo I am giving on Friday August 5th 8am PST (GMT-7). Demo portion of session is 15 minutes and will be provided first followed by more details of our approach to hi

Re: [Pacemaker] Backup ring is marked faulty

2011-08-02 Thread Steven Dake
Which version of corosync? On 08/02/2011 07:35 AM, Sebastian Kaps wrote: > Hi, > > we're running a two-node cluster with redundant rings. > Ring 0 is a 10 GB direct connection; ring 1 consists of two 1GB > interfaces that are bonded in > active-backup mode and routed through two independent switc

[Pacemaker] Announcing Pacemaker Cloud 0.4.1 - Available now for download!

2011-07-27 Thread Steven Dake
Angus and I announced a project to apply high availability best known practice to the field of cloud computing in late March 2011. We reuse the policy engine of Pacemaker. Our first tarball is available today containing a functional prototype demonstrating these best known practices. Today the s

Re: [Pacemaker] Sending message via cpg FAILED: (rc=12) Doesn't exist

2011-07-22 Thread Steven Dake
On 07/22/2011 01:15 AM, Proskurin Kirill wrote: > Hello all. > > > pacemaker-1.1.5 > corosync-1.4.0 > > 4 nodes in cluster. 3 online 1 not. > In logs: > > Jul 22 11:50:23 my106.example.com crmd: [28030]: info: > pcmk_quorum_notification: Membership 0: quorum retained (0) > Jul 22 11:50:23 my106

Re: [Pacemaker] corosync-quorumtool configuration

2011-06-21 Thread Steven Dake
On 06/20/2011 07:35 PM, Andrew Beekhof wrote: > I don't think this is legal: > > service { > > name: corosync_quorum > > ver: 0 > > name: pacemaker > > use_mgmtd: yes > > use_logd: yes > > } > > > and even if it were, corosync

Re: [Pacemaker] [Openais] Linux HA on debian sparc

2011-06-07 Thread Steven Dake
d them to whoever you like ;) Regards -steve > 2011/6/3 Steven Dake : >> On 06/02/2011 08:16 PM, william felipe_welter wrote: >>> Well, >>> >>> Now with this patch, the pacemakerd process starts and up his other >>> process ( crmd, lrmd, pengine..

[Pacemaker] Updated pacemaker-cloud.org website

2011-06-06 Thread Steven Dake
Hi, I want to spend a moment to tell you about our new website at http://pacemaker-cloud.org. This website will serve as our information store and tarball repo location for the Pacemaker-Cloud project. The features page contains the feature set we plan to deliver. Please have a look and forward

Re: [Pacemaker] [Openais] Linux HA on debian sparc

2011-06-03 Thread Steven Dake
is'. > Jun 02 23:12:21 xx attrd: [7992]: info: crm_cluster_connect: > Connecting to cluster infrastructure: classic openais (with plugin) > Jun 02 23:12:21 xx attrd: [7992]: info: > init_ais_connection_classic: Creating connection to our Corosync > plugin > Jun 02 23:1

Re: [Pacemaker] [Openais] Linux HA on debian sparc

2011-06-02 Thread Steven Dake
Quicklists:17664 kB > NFS_Unstable: 0 kB > Bounce:0 kB > WritebackTmp: 0 kB > CommitLimit:22651424 kB > Committed_AS: 519368 kB > VmallocTotal: 1069547520 kB > VmallocUsed: 11064 kB > VmallocChunk:

Re: [Pacemaker] [Openais] Linux HA on debian sparc

2011-06-01 Thread Steven Dake
On 06/01/2011 07:42 AM, william felipe_welter wrote: > Steven, > > cat /proc/meminfo > ... > HugePages_Total: 0 > HugePages_Free:0 > HugePages_Rsvd:0 > HugePages_Surp:0 > Hugepagesize: 4096 kB > ... > It definitely requires

Re: [Pacemaker] [Openais] Linux HA on debian sparc

2011-06-01 Thread Steven Dake
On 06/01/2011 01:05 AM, Steven Dake wrote: > On 05/31/2011 09:44 PM, Angus Salkeld wrote: >> On Tue, May 31, 2011 at 11:52:48PM -0300, william felipe_welter wrote: >>> Angus, >>> >>> I make some test program (based on the code coreipcc.c) and i now i sure

Re: [Pacemaker] [Openais] Linux HA on debian sparc

2011-06-01 Thread Steven Dake
address over the end of the > first half of memory it is taken care of by the third mmap which maps > the address back to the top of the file for you. This means you > don't have to worry about ringbuffer wrapping which can be a headache. > > -Angus > interesting this

Re: [Pacemaker] Linux HA on debian sparc

2011-05-31 Thread Steven Dake
Note. there are three signals you could possibly see that generate a core file. SIGABRT (assert() called in the codebase) SIGSEGV (segmentation violation) SIGBUS (alignment error) Make sure you don't have a sigbus. Opening the core file with gdb will tell you which signal triggered the fault.

Re: [Pacemaker] [Openais] Linux HA on debian sparc

2011-05-31 Thread Steven Dake
Try running pacemaker using the MCP. The plugin mode of pacemaker never really worked very well because of complexities of POSIX mmap and fork. Not having sparc hardware personally, YMMV. We have recently with corosync 1.3.1 gone through an alignment fixing process for ARM arches - hope that so

Re: [Pacemaker] [Openais] Corosync goes into endless loop when same hostname is used on more than one node

2011-05-12 Thread Steven Dake
On 05/12/2011 07:04 AM, Dan Frincu wrote: > Hi, > > When using the same hostname on 2 nodes (debian squeeze, corosync > 1.3.0-3 from unstable) the following happens: > > May 12 08:36:27 debian cib: [3125]: info: cib_process_request: Operation > complete: op cib_sync for section 'all' (origin=loca

[Pacemaker] Pacemaker Cloud Policy Engine Red Hat Summit slides and Mailing List

2011-05-08 Thread Steven Dake
In February we announced our intentions to work on a cloud-specific high availability solution on this list. The code is coming along, and we have reached a point where we should have a mailing list dedicated to cloud specific topics of Pacemaker. The mailing list subscription page is: http://os

[Pacemaker] announcing the Pacemaker Cloud Policy Engine subproject

2011-03-01 Thread Steven Dake
Hi, I'd like to spend a moment to tell you about a new project Angus Salkeld and I are working on. The project, called the Pacemaker Cloud Policy Engine, is a cloud-specific policy engine and will act as a sub-project of the Pacemaker project. We are doing a ground-up implementation of a cl

Re: [Pacemaker] Cluster Communication fails after VMWare Migration

2011-03-01 Thread Steven Dake
On 02/25/2011 12:40 AM, Andrew Beekhof wrote: > On Wed, Feb 23, 2011 at 10:31 AM, wrote: >> >> Have build a 2 node apache cluster on VMWare virtual machines, which was >> running as expected. We had to migrate the machines to another computing >> center and after that the cluster communication

Re: [Pacemaker] corosync crash

2011-03-01 Thread Steven Dake
On 02/25/2011 12:38 AM, Andrew Beekhof wrote: > This is the same one you sent to the openais list right? > Andrew, This was root caused to a faulty network setup resulting in the failed to receive abort we are working on currently. One key detail missing from this thread is the implementation w

Re: [Pacemaker] Article on HA in the IBM cloud using Pacemaker and Heartbeat

2011-01-28 Thread Steven Dake
On 01/28/2011 08:02 AM, Alan Robertson wrote: > Hi, > > I recently co-authored an article on HA in the IBM cloud using Pacemaker > and Heartbeat. > > http://www.ibm.com/developerworks/cloud/library/cl-highavailabilitycloud/ > > The cool thing is that the IBM cloud supports virtual IPs. With mo

Re: [Pacemaker] pacemaker + corosync in the cloud

2010-12-15 Thread Steven Dake
On 12/14/2010 05:14 PM, ruslan usifov wrote: > Hi > > Is it possible to use pacemaker based on corosync in the cloud hosting > like amazon or soflayer? > > > yes with corosync 1.3.0 in udpu mode. The udpu mode avoids the use of multicast allowing operation in amazon's cloud. Regards -steve

Re: [Pacemaker] UDPU transport patch added, when will the RPMs be available

2010-11-22 Thread Steven Dake
On 11/22/2010 09:27 AM, Dan Frincu wrote: > Hi Steven, > > Steven Dake wrote: >> On 11/19/2010 11:42 AM, Andrew Beekhof wrote: >> >>> On Fri, Nov 19, 2010 at 11:38 AM, Dan Frincu wrote: >>> >>>> Hi, >>>> >>>>

Re: [Pacemaker] service corosync start failed

2010-11-22 Thread Steven Dake
On 11/22/2010 01:27 AM, jiaju liu wrote: > Hi all > If I use command like this > service corosync start > it shows > Starting Corosync Cluster Engine (corosync): [FAILED] > > and I do nothing just reboot my computer it will be OK what is the > reason

Re: [Pacemaker] UDPU transport patch added, when will the RPMs be available

2010-11-19 Thread Steven Dake
On 11/19/2010 11:42 AM, Andrew Beekhof wrote: > On Fri, Nov 19, 2010 at 11:38 AM, Dan Frincu wrote: >> Hi, >> >> The subject is pretty self-explanatory but I'll ask anyway, the patch for >> UDPU has been released, this adds the ability to set unicast peer addresses >> of nodes in a cluster, in net

Re: [Pacemaker] Corosync using unicast instead of multicast

2010-11-08 Thread Steven Dake
On 11/08/2010 05:50 AM, Dan Frincu wrote: Hi, Steven Dake wrote: On 11/05/2010 01:30 AM, Dan Frincu wrote: Hi, Alan Jones wrote: This question should be on the openais list, however, I happen to know the answer. To get up and running quickly you can configure broadcast with the version you

Re: [Pacemaker] Corosync using unicast instead of multicast

2010-11-05 Thread Steven Dake
ed as to what Steven Dake said on the openais mailing list about using broadcast "Broadcast and redundant ring probably don't work to well together.". I've also done some testing and saw that the broadcast address used is 255.255.255.255, regardless of what the bindnetaddr netw

Re: [Pacemaker] Fail over algorithm used by Pacemaker

2010-10-04 Thread Steven Dake
On 10/03/2010 07:01 AM, hudan studiawan wrote: Hi, I want to start to contribute to Pacemaker project. I start to read Documentation and try some basic configurations. I have a question: what kind of algorithm used by Pacemaker to choose another node when a node die in a cluster? Is there any ma

Re: [Pacemaker] Corosync node detection working too good

2010-10-04 Thread Steven Dake
On 10/04/2010 02:04 AM, Stephan-Frank Henry wrote: Hello all, still working on my nodes and although the last problem is not officially solved (I hard coded certain versions of the packages and that seems to be ok now) I have a different interesting feature I need to handle. I am setting up m

Re: [Pacemaker] Timeout after nodejoin

2010-09-22 Thread Steven Dake
On 09/22/2010 05:43 AM, Dan Frincu wrote: Hi all, I have the following packages: # rpm -qa | grep -i "(openais|cluster|heartbeat|pacemaker|resource)" openais-0.80.5-15.2 cluster-glue-1.0-12.2 pacemaker-1.0.5-4.2 cluster-glue-libs-1.0-12.2 resource-agents-1.0-31.5 pacemaker-libs-1.0.5-4.2 pacema

Re: [Pacemaker] Connection to our AIS plugin (9) failed: Library error

2010-09-22 Thread Steven Dake
On 09/22/2010 04:02 AM, Szymon Hersztek wrote: Message written on 2010-09-22 at 10:26 by Andrew Beekhof: 2010/9/21 Szymon Hersztek : Message written on 2010-09-21 at 09:08 by Andrew Beekhof: 2010/9/21 Szymon Hersztek : Message written on 2010-09-2

Re: [Pacemaker] MCP init script to 21/79?

2010-09-03 Thread Steven Dake
On 09/03/2010 09:56 AM, Vladislav Bogdanov wrote: 03.09.2010 19:34, Steven Dake wrote: Nope, they are in a natural order for both start and stop sequences. So lower number means 'do start or stop earlier'. grep '# chkconfig' /etc/init.d/* Ok, thanks. Changed to 10

[Pacemaker] MCP init script to 21/79?

2010-09-03 Thread Steven Dake
On 08/24/2010 11:06 PM, Andrew Beekhof wrote: On Wed, Aug 25, 2010 at 8:02 AM, Vladislav Bogdanov wrote: 25.08.2010 08:56, Andrew Beekhof wrote: On Wed, Aug 25, 2010 at 7:39 AM, Vladislav Bogdanov wrote: Hi all, pacemaker has # chkconfig - 90 90 in its MCP initscript. Shouldn't it be cor

Re: [Pacemaker] Corosync + Pacemaker New Install: Corosync Fails Without Error Message

2010-06-22 Thread Steven Dake
On 06/18/2010 09:42 AM, Eliot Gable wrote: I don’t have an “aisexec” section at all. I simply copied the sample file, which did not have one. I did figure out why it wasn’t logging. It was set to AMF mode and ‘mode’ was ‘disabled’ in the AMF configuration section. After changing that to ‘enabled

Re: [Pacemaker] use_logd or use_mgmtd kills corosync

2010-06-08 Thread Steven Dake
On 06/08/2010 11:20 PM, Andrew Beekhof wrote: On Wed, Jun 9, 2010 at 7:27 AM, Devin Reade wrote: I was following the instructions for a new installation of corosync and was wanting to make use of hb_gui so, following an installation via yum per the docs, built Pacemaker-Python-GUI-pacemaker-mgm

[Pacemaker] handle EINTR in sem_wait (pacemaker & corosync 1.2.2+ crash)

2010-06-01 Thread Steven Dake
Hello, I have found the cause of the crash that was occurring only on some deployments. The cause is that sem_wait is interrupted by a signal, and the wait operation is not retried (as is customary in POSIX). Patch attached to fix. A big thank you to Vladislav Bogdanov for running the test cas
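The retry pattern described in the message above can be sketched as follows. This is only an illustration of the usual POSIX idiom, not the attached patch: the helper name sem_wait_retry is hypothetical and does not come from corosync or pacemaker.

```c
#include <errno.h>
#include <semaphore.h>

/*
 * Hypothetical helper (name is an assumption, not from the actual patch):
 * wait on a semaphore and retry whenever a signal handler interrupts
 * the call, so that EINTR is never reported to the caller.
 */
static int sem_wait_retry(sem_t *sem)
{
    int res;

    do {
        res = sem_wait(sem);              /* returns -1 with errno == EINTR if interrupted */
    } while (res == -1 && errno == EINTR);

    return res;                           /* 0 on success, -1 on any other error */
}
```

A caller would use sem_wait_retry(&sem) wherever it previously called sem_wait(&sem) directly; any error other than EINTR is still returned unchanged.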

Re: [Pacemaker] corosync/openais fails to start

2010-05-27 Thread Steven Dake
On 05/27/2010 10:20 AM, Gianluca Cecchi wrote: On Thu, May 27, 2010 at 5:50 PM, Steven Dake <sd...@redhat.com> wrote: On 05/27/2010 08:40 AM, Diego Remolina wrote: Is there any workaround for this? Perhaps a slightly older version of the rpms? If so wh

Re: [Pacemaker] corosync/openais fails to start

2010-05-27 Thread Steven Dake
so I am stuck with a non-functioning cluster. Diego Steven Dake wrote: This is a known issue on some platforms, although the exact cause is unknown. I have tried RHEL 5.5 as well as CentOS 5.5 with clusterrepo rpms and been unable to reproduce. I'll keep looking. Regards -steve On 05/27/2010 06:07

Re: [Pacemaker] corosync/openais fails to start

2010-05-27 Thread Steven Dake
This is a known issue on some platforms, although the exact cause is unknown. I have tried RHEL 5.5 as well as CentOS 5.5 with clusterrepo rpms and been unable to reproduce. I'll keep looking. Regards -steve On 05/27/2010 06:07 AM, Diego Remolina wrote: Hi, I was running the old rpms from

Re: [Pacemaker] Redundant rings vs one bond based ring

2010-05-18 Thread Steven Dake
On Tue, 2010-05-18 at 23:16 +0200, Gianluca Cecchi wrote: > Hello, > based on pacemaker 1.0.8 + corosync 1.2.2, having two network > interfaces to dedicate to cluster communication, what is better/safer > at this moment: > bonding > > a) only one corosync ring on top of a bond interface > b) two

Re: [Pacemaker] Being fenced node is killed again and again even the connection is recovered!

2010-05-14 Thread Steven Dake
ifconfig eth0 down is not a valid test case. That will likely lead to bad things happening. I recommend using iptables to test the software. Also, Corosync 1.2.2 is out, which fixes bugs in Corosync 1.2.0. Regards -steve On Fri, 2010-05-14 at 18:02 +0800, Javen Wu wrote: > I forgot to mention the v

Re: [Pacemaker] Corosync crashes when cluster NIC disabled (Something strange happened)

2010-03-31 Thread Steven Dake
On Wed, 2010-03-31 at 16:07 -0400, Simpson, John R wrote: > Greetings all, > > I have a lab cluster using Pacemaker 1.0.8 and Corosync 1.2.0-1 > (see packages below) on CentOS 5.4 (32-bit) VM's running under > VMware ESXi 3.5. My location constraints and connectivity > tests were working well, so

Re: [Pacemaker] Dropping HeartBeat Stack?

2010-03-04 Thread Steven Dake
On Thu, 2010-03-04 at 21:29 +0100, Dennis J. wrote: > On 03/04/2010 03:37 PM, Andrew Beekhof wrote: > > On Thu, Mar 4, 2010 at 2:54 PM, Dennis J. wrote: > > > >> Pacemaker pulls in heartbeat and corosync as dependencies. This is what > >> happens > >> on a freshly installed CentOS 5.4 VM: > > > > Ah,

Re: [Pacemaker] High load issues

2010-02-04 Thread Steven Dake
On Thu, 2010-02-04 at 16:09 +0100, Dominik Klein wrote: > Hi people, > > I'll take the risk of annoying you, but I really think this should not > be forgotten. > > If there is high load on a node, the cluster seems to have problems > recovering from that. I'd expect the cluster to recognize that

[Pacemaker] thread safety problem with pacemaker and corosync integration

2010-02-03 Thread Steven Dake
For some time people have reported segfaults on startup when using pacemaker as a plugin to corosync related to tzset in the stack trace. I believe we had fixed this by removing the thread-unsafe usage of localtime and strftime calls in the code base of corosync in 1.2.0. Via further investigation

Re: [Pacemaker] mcast vs broadcast

2010-01-18 Thread Steven Dake
On Mon, 2010-01-18 at 11:25 -0500, Shravan Mishra wrote: > Hi all, > > > > Following is my corosync.conf. > > Even though broadcast is enabled I see "mcasted" messages like these > in corosync.log. > > Is it ok? even when the broadcast is on and not mcast. > Yes you are using broadcast and

Re: [Pacemaker] errors in corosync.log

2010-01-18 Thread Steven Dake
One possibility is you have a different cluster in your network on the same multicast address and port. Regards -steve On Sat, 2010-01-16 at 15:20 -0500, Shravan Mishra wrote: > Hi Guys, > > I'm running the following version of pacemaker and corosync > corosync=1.1.1-1-2 > pacemaker=1.0.9-2-1 >
