Re: [Pacemaker] crm_mon and last-failure timestamp not always present

2015-01-07 Thread Gianluca Cecchi
On Wed, Jan 7, 2015 at 10:21 AM, Gianluca Cecchi wrote: [snip] > > Is there any parameter inside configuration of cluster and/or resources > that could control if "last-failure" information will be shown or not? > > > Ok, solved. On the cluster where timestamp is sho

[Pacemaker] crm_mon and last-failure timestamp not always present

2015-01-07 Thread Gianluca Cecchi
Hello, I have two old SLES 11 SP2 clusters, each one composed of two nodes, that I'm comparing. They should have the same package versions (I'm going to verify this point and the configuration files more carefully once access is granted to me), but on one where there is a group configured with mysqld, when the

[Pacemaker] iSCSITarget and iSCSILogicalUnit for CentOS 6.5?

2014-07-08 Thread Gianluca Cecchi
Hello, using pacemaker on CentOS 6.5 I would like to test the agents in the subject, but I don't find them in /usr/lib/ocf/resource.d/heartbeat/ as expected. I have resource-agents-3.9.2-40.el6_5.7.x86_64 and I have enabled the standard CentOS repos and the EPEL ones. # yum whatprovides /usr/lib/ocf/resource.d/

Re: [Pacemaker] How to put delay in fence_intelmodular for one node only

2014-06-25 Thread Gianluca Cecchi
On Sun, Jun 22, 2014 at 1:51 AM, Digimer wrote: > Excellent. > > Please note: with IPMI-only fencing, you may find that killing all power > to the node will cause fencing to fail, as the IPMI's BMC will lose power > as well (unless it has its own battery, but most don't). > > If you find thi

Re: [Pacemaker] Info on failcount automatic reset

2014-06-25 Thread Gianluca Cecchi
On Wed, Jun 25, 2014 at 10:57 AM, Gianluca Cecchi wrote: > > Tried to select the "feedback" button at the bottom but it doesn't work (at least > on my Chrome browser on Fedora 20) for either the Italian one or the > English one... > > Actually the Italian feedback lin

Re: [Pacemaker] Info on failcount automatic reset

2014-06-25 Thread Gianluca Cecchi
On Wed, Jun 25, 2014 at 1:28 AM, Andrew Beekhof wrote: > > > So it seems at midnight the resource already had a failcount of 2 > (perhaps caused by problems that happened weeks ago..?) and then at 03:38 got a > timeout on monitoring its state and was relocated... > > > > pacemaker is at 1.1.6-1.2

Re: [Pacemaker] How to put delay in fence_intelmodular for one node only

2014-06-21 Thread Gianluca Cecchi
Hi Gianluca, > > I'm not sure of the CIB XML syntax, but here is how it's done using pcs: > OK, thanks Digimer. It seems it worked this way using your suggestions [root@srvmgmt01 ~]# pcs stonith show Fencing(stonith:fence_intelmodular):Started # pcs cluster cib stonith_separate_cfg

[Pacemaker] How to put delay in fence_intelmodular for one node only

2014-06-21 Thread Gianluca Cecchi
Hello, I have a CentOS 6.5 based cluster with pacemaker-1.1.10-14.el6_5.3.x86_64 cman-3.0.12.1-59.el6_5.2.x86_64 and configured pacemaker with cman integration. The nodes are two blades inside an Intel enclosure. At the moment my configuration has this in cluster.conf and this if I r

[Pacemaker] Info on failcount automatic reset

2014-06-20 Thread Gianluca Cecchi
Hello, when the monitor action for a resource times out I think its failcount is incremented by 1, correct? If so, suppose the next monitor action succeeds: does the failcount value automatically reset to zero or does it stay at 1? In the latter case, is there any way to configure the cluster to aut

Re: [Pacemaker] pacemaker RHEL6 with cman

2014-03-17 Thread Gianluca Cecchi
On Mon, Mar 17, 2014 at 12:19 AM, Alex Samad - Yieldbroker wrote: > Hi > > > > I am in the process of migrating away from the pcmk plugin for corosync and > converting to cman. > > So from what I gather its > > Pacemaker -> cman -> corosync > > Can I still configure corosync with /etc/corosync/cor

[Pacemaker] How to delay first monitor op upon resource start?

2014-03-13 Thread Gianluca Cecchi
Hello, I have some init based scripts that I configure as lsb resources. They are java based (in this case ovirt-engine and ovirt-websocket-proxy from oVirt project) and they are started through the rhel "daemon" function. Basically it needs a few seconds before the scripts exit and the status opti

Re: [Pacemaker] pacemaker with cman and dbrd when primary node panics or poweroff

2014-03-11 Thread Gianluca Cecchi
On Tue, Mar 11, 2014 at 11:52 PM, Andrew Beekhof wrote: > > On 8 Mar 2014, at 11:31 am, Gianluca Cecchi wrote: > >> I provoke a power off of ovirteng01. The fencing agent works ok on >> ovirteng02 and reboots it. >> I stop the boot of ovirteng01 at the grub prompt to simulate prob

Re: [Pacemaker] pacemaker with cman and dbrd when primary node panics or poweroff

2014-03-11 Thread Gianluca Cecchi
On Wed, Mar 12, 2014 at 12:37 AM, Andrew Beekhof wrote: > It was put in when drbd called: > > fence-peer "/usr/lib/drbd/crm-fence-peer.sh"; > > When and why it called that is not my area of expertise though. > The constraint put by crm-fence-peer.sh was and I think it was g

Re: [Pacemaker] pacemaker with cman and dbrd when primary node panics or poweroff

2014-03-07 Thread Gianluca Cecchi
So I fixed the problem regarding the hostname in drbd.conf and the node name from the cluster's point of view. Also configured and verified the fence_vmware agent and enabled stonith. Changed in the drbd resource configuration: resource ovirt { disk { disk-flushes no; md-flushes no; fencing resource-and-stonith; } device

Re: [Pacemaker] Getting split brain after all reboot of a cluster node

2014-03-06 Thread Gianluca Cecchi
On Wed, Mar 5, 2014 at 9:28 AM, Anne Nicolas wrote: > Hi > > I'm having trouble setting a very simple cluster with 2 nodes. After all > reboot I'm getting split brain that I have to solve by hand then. > Looking for a solution for that one... > > Both nodes have 4 network interfaces. We use 3 of t

Re: [Pacemaker] pacemaker with cman and dbrd when primary node panics or poweroff

2014-03-05 Thread Gianluca Cecchi
On Mon, Mar 3, 2014 at 9:29 PM, Digimer wrote: > Two possible problems; > > 1. cman's cluster.conf needs the ''. > > 2. You don't have fencing setup. The 'fence_pcmk' script only works if > pacemaker's stonith is enabled and configured properly. Likewise, you will > need to configure DRBD to use t

[Pacemaker] pacemaker with cman and dbrd when primary node panics or poweroff

2014-03-03 Thread Gianluca Cecchi
Hello, I'm testing pacemaker with cman on CentOS 6.5 where I have a drbd resource in a classic primary/secondary setup with master/slave config. Relevant packages: cman-3.0.12.1-59.el6_5.1.x86_64 pacemaker-1.1.10-14.el6_5.2.x86_64 kmod-drbd84-8.4.4-1.el6.elrepo.x86_64 drbd84-utils-8.4.4-2.el6.elrepo.x8

Re: [Pacemaker] Need to relax corosync due to backup of VM through snapshot

2013-11-25 Thread Gianluca Cecchi
On Sun, Nov 24, 2013 at 4:47 PM, Steven Dake wrote: > Using a real-world example > token: 1 > retrans_before_loss_const: 10 > > token will be retransmitted roughly every 1000 msec and the token will be > determined lost after 1msec. > OK, thank you very much for clarifying this. I also to
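The relationship Steven describes can be sketched in a few lines of Python. This is only an approximation under stated assumptions: the function name is hypothetical, the 10000 ms token value is an assumed example (chosen so that 10 retransmits give the roughly 1000 msec interval quoted above), and corosync's own calculation may differ slightly.

```python
def token_timings(token_ms, retransmits_before_loss_const):
    """Approximate totem timings (hypothetical helper, not a corosync API).

    The token is declared lost after `token_ms`. Within that window it is
    retransmitted roughly `retransmits_before_loss_const` times, so the
    retransmit interval is about token / retransmits, not token * retransmits.
    """
    retransmit_every_ms = token_ms / retransmits_before_loss_const
    return {
        "retransmit_every_ms": retransmit_every_ms,
        "lost_after_ms": token_ms,
    }


# Assumed example values: 10000 ms token, 10 retransmits before loss.
timings = token_timings(10000, 10)
print(timings)
```

Under this reading, raising only the retransmit constant shortens the interval between retransmits but leaves the loss timeout unchanged; relaxing the timeout means raising the token value itself.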

Re: [Pacemaker] Need to relax corosync due to backup of VM through snapshot

2013-11-21 Thread Gianluca Cecchi
On Thu, Nov 21, 2013 at 9:09 AM, Lars Marowsky-Bree wrote: > On 2013-11-20T16:58:01, Gianluca Cecchi wrote: > >> Based on docs I thought that the timeout should be >> >> token x token_retransmits_before_loss_const > > No, the comments in the corosync.conf.example

Re: [Pacemaker] Need to relax corosync due to backup of VM through snapshot

2013-11-21 Thread Gianluca Cecchi
On Thu, Nov 21, 2013 at 9:09 AM, Lars Marowsky-Bree wrote: > On 2013-11-20T16:58:01, Gianluca Cecchi wrote: > >> Based on docs I thought that the timeout should be >> >> token x token_retransmits_before_loss_const So in particular I have to ask for a correction for this: h

[Pacemaker] Need to relax corosync due to backup of VM through snapshot

2013-11-20 Thread Gianluca Cecchi
Hello, I'm trying to relax the timeout because of backups that run using NetBackup and the VMware storage API and cause cluster reconfiguration. Based on the docs I thought that the timeout should be token x token_retransmits_before_loss_const but actually through my tests I see that setting or not setting th

Re: [Pacemaker] wiki links to epel domain need to be changed

2012-02-28 Thread Gianluca Cecchi
On Tue, Feb 28, 2012 at 11:23 PM, Andrew Beekhof wrote: > On Wed, Feb 29, 2012 at 2:19 AM, Gianluca Cecchi > wrote: >> Hello, >> for example at http://www.clusterlabs.org/wiki/Install >> http://download.fedora.redhat.com/pub/epel/5/i386/epel-release-5-3.noarch.rpm >&

[Pacemaker] wiki links to epel domain need to be changed

2012-02-28 Thread Gianluca Cecchi
Hello, for example at http://www.clusterlabs.org/wiki/Install http://download.fedora.redhat.com/pub/epel/5/i386/epel-release-5-3.noarch.rpm has to be changed now into http://dl.fedoraproject.org/pub/epel/5/i386/epel-release-5-4.noarch.rpm (note the 5.3 -> 5.4 too) see also: http://lists.fedoraproj

Re: [Pacemaker] pacemaker version

2010-10-06 Thread Gianluca Cecchi
On Wed, Oct 6, 2010 at 4:25 PM, Shravan Mishra wrote: > That is what I heard too, that's the reason for this question. > In June, inside a complex thread regarding "colocation -inf", Andrew reported the link and also several clarifications after some questions of mine... See in particular: http:

Re: [Pacemaker] Corosync + Pacemaker New Install: Corosync Fails Without Error Message

2010-06-18 Thread Gianluca Cecchi
On Fri, Jun 18, 2010 at 5:25 PM, Eliot Gable wrote: > I am trying to set up Corosync + Pacemaker on a new CentOS 5.5 x86_64 > install, but when I try to start corosync, it just says [FAILED] and does > not provide any further information. I created the authkey using > corosync-keygen and created

Re: [Pacemaker] Problems shutting down server/services

2010-06-17 Thread Gianluca Cecchi
On Thu, Jun 17, 2010 at 12:28 PM, Erich Weiler wrote: > this is a problem ... without libvirtd running all management attempts via >> "virsh" will fail (including the VirtualDomain RA) ... why is libvirtd >> stopping before corosync? >> > > Ah, because I forgot that I have libvirtd starting and s

Re: [Pacemaker] Shouldn't colocation -inf: be mandatory?

2010-06-15 Thread Gianluca Cecchi
On Tue, Jun 15, 2010 at 4:43 PM, Andrew Beekhof wrote: > > > > > But that is for 1.1 branch that is not considered as "stable"... > > No, existing functionality its very stable. > Its just the new features that might have some extra corner cases > we've not seen exercised yet. > > Put it this way

Re: [Pacemaker] Shouldn't colocation -inf: be mandatory?

2010-06-15 Thread Gianluca Cecchi
On Tue, Jun 15, 2010 at 2:48 PM, Vadym Chepkov wrote: > > On Jun 15, 2010, at 8:11 AM, Gianluca Cecchi wrote: > > On Tue, Jun 15, 2010 at 1:50 PM, Andrew Beekhof wrote: > >> [snip] >> >> Score = -inf, plus the patch, plus sequential = true (or unset). >>

Re: [Pacemaker] SBD Fencing daemon: explain me more clear

2010-06-15 Thread Gianluca Cecchi
On Tue, Jun 15, 2010 at 3:36 PM, Lars Marowsky-Bree wrote: > On 2010-06-15T16:32:12, Aleksey Zholdak wrote: > > [snip] > > > > Why is the MPIO scenario so slow? > > These questions need to be asked of the mptsas developers (novell + hp) > > You should really file a service request then with

Re: [Pacemaker] Shouldn't colocation -inf: be mandatory?

2010-06-15 Thread Gianluca Cecchi
On Tue, Jun 15, 2010 at 1:50 PM, Andrew Beekhof wrote: > [snip] > > Score = -inf, plus the patch, plus sequential = true (or unset). > Not sure how that looks in shell syntax though. > > Which patch? ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs

Re: [Pacemaker] Pacemaker and Apache recourse configuration problem

2010-06-14 Thread Gianluca Cecchi
2010/6/12 Julio Gómez > > There is the error. Thanks. > > Marco meant uncommenting these lines in your /etc/apache2/apache2.conf: SetHandler server-status Order deny,allow Deny from all Allow from 127.0.0.1 and having this uncommented (default for rh el 5 based ht

[Pacemaker] Info on customization of RA

2010-05-28 Thread Gianluca Cecchi
Hello, I tried this approach and it seems to work ootb. But I would like to know if there could be silent drawbacks or potential problems from a technical point of view. Under /usr/lib/ocf/resource.d/ I create a directory named myocfdir and inside it I put a copy of an existing RA (eg apache) /usr
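As a rough illustration of why a separate directory works: an OCF resource is addressed as `ocf:<provider>:<agent>`, and the provider part corresponds to a directory under /usr/lib/ocf/resource.d/. The Python sketch below only models that mapping; the helper name is hypothetical, and it assumes the copied agent keeps the file name `apache`.

```python
from pathlib import PurePosixPath

# Root directory for OCF resource agents, as used in the message above.
OCF_ROOT = PurePosixPath("/usr/lib/ocf/resource.d")


def ra_script_path(resource_spec):
    """Map 'ocf:<provider>:<agent>' to the script path (hypothetical helper)."""
    standard, provider, agent = resource_spec.split(":")
    if standard != "ocf":
        raise ValueError("only ocf resources live under " + str(OCF_ROOT))
    return OCF_ROOT / provider / agent


# The stock agent and the customized copy resolve to different files,
# so a package update of the stock agent does not overwrite the copy.
stock = ra_script_path("ocf:heartbeat:apache")
custom = ra_script_path("ocf:myocfdir:apache")
print(stock)
print(custom)
```

With this layout the customized agent would be referenced in the configuration as `ocf:myocfdir:apache`, while `ocf:heartbeat:apache` continues to point at the packaged script.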

Re: [Pacemaker] advisory ordering question

2010-05-28 Thread Gianluca Cecchi
On Fri, May 28, 2010 at 8:00 AM, Andrew Beekhof wrote: > > > I don't know if it is the same bug... > > In the sense that in your bug case the problem is related to: > > - the stop operation not enforcing the order constraint > > - the constraint is between a clone of a group and a clone of a reso

Re: [Pacemaker] corosync/openais fails to start

2010-05-27 Thread Gianluca Cecchi
On Thu, May 27, 2010 at 5:50 PM, Steven Dake wrote: > On 05/27/2010 08:40 AM, Diego Remolina wrote: > >> Is there any workaround for this? Perhaps a slightly older version of >> the rpms? If so where do I find those? >> >> > Corosync 1.2.1 doesn't have this issue apparently. With corosync 1.2.1,

Re: [Pacemaker] advisory ordering question

2010-05-27 Thread Gianluca Cecchi
On Tue, May 25, 2010 at 3:39 PM, Dejan Muhamedagic wrote: > Hi, > [snip] > > So I presume the problem could be caused by the fact that the second part > is > > a clone and not a resource? or a bug? > > I can eventually send the whole config. > > Looks like a bug to me. Clone or resource, constrain

[Pacemaker] advisory ordering question

2010-05-20 Thread Gianluca Cecchi
Hello, the manual for 1.0 (and 1.1) reports this for Advisory Ordering: On the other hand, when score="0" is specified for a constraint, the constraint is considered optional and only has an effect when both resources are stopping and/or starting. Any change in state by the first resource will have no

Re: [Pacemaker] clone ip definition and location stops myresources...

2010-05-20 Thread Gianluca Cecchi
On Thu, May 20, 2010 at 2:40 PM, Koch, Sebastian wrote: > Thanks for your hints. I had the same issue and your tips nearly resolved > it for me. But I have a question. I set the default timeout and afterwards > the pingd resource started to work as expected. I had an IPTABLES rule > dropping ic

[Pacemaker] Redundant rings vs one bond based ring

2010-05-18 Thread Gianluca Cecchi
Hello, based on pacemaker 1.0.8 + corosync 1.2.2, having two network interfaces to dedicate to cluster communication, what is better/safer at this moment: a) only one corosync ring on top of a bond interface b) two different rings, each one associated with one interface ? Question based also on c

Re: [Pacemaker] IP address does not failover on a new test cluster

2010-05-15 Thread Gianluca Cecchi
On Fri, May 14, 2010 at 11:40 PM, Ruiyuan Jiang wrote: > Hi, > > I just created a testing cluster (Corosync 1.2.1, OpenAIS 1.1.2 and > Pacemaker 1.0.8) on RHEL 5.5, x86_64. When I shut down the primary node, the > secondary node does not bind the cluster IP address, which it is supposed to. > Why? Than

Re: [Pacemaker] Question on resources' dependency and failover

2010-05-14 Thread Gianluca Cecchi
On Wed, May 12, 2010 at 10:22 PM, Gianluca Cecchi wrote: > On Wed, May 12, 2010 at 2:27 PM, Andrew Beekhof > wrote: > >> Attach the output from cibadmin -Ql when the cluster is in this state >> and I'll take a look. >> >> >> > Ok. here we are with

Re: [Pacemaker] How SuSEfirewall2 affects on openais startup?

2010-05-13 Thread Gianluca Cecchi
On Thu, May 13, 2010 at 8:27 AM, Tim Serong wrote: > Hi, > > On 5/13/2010 at 03:56 PM, Aleksey Zholdak wrote: > > > The firewall should let through the UDP multicast traffic on > > > ports mcastport and mcastport+1. > > > > As I wrote above: all interfaces in SuSEfirewall2 is set to "Internal >

Re: [Pacemaker] clone ip definition and location stops my resources...

2010-05-11 Thread Gianluca Cecchi
On Tue, May 11, 2010 at 5:47 PM, Vadym Chepkov wrote: > pingd is a daemon which is running all the time and does its job > you still need to define a monitor operation though, what if the daemon dies? > op monitor just has a different meaning for ping and pingd. > with pingd - monitor daemon > with

[Pacemaker] Question on resources' dependency and failover

2010-05-11 Thread Gianluca Cecchi
Hello, I'm using pacemaker 1.0.8 on rh el 5.5 x86 with the clusterlabs repo. Based on other posts on linux-ha I'm trying to configure a 2-node cluster where one of the nodes is the nfs-server and the other one is an nfs-client of the resource exported by the first one. The main parts borrowed from the relat

Re: [Pacemaker] clone ip definition and location stops my resources...

2010-05-11 Thread Gianluca Cecchi
On Tue, May 11, 2010 at 1:13 PM, Vadym Chepkov wrote: > First of all, none of the monitor operation is on by default in pacemaker, > this is something that you have to turn on > For the ping RA start and stop op parameters don't do much, so you can > safely drop them. > > > Yes, but for the pace

Re: [Pacemaker] clone ip definition and location stops my resources...

2010-05-11 Thread Gianluca Cecchi
On Tue, May 11, 2010 at 12:50 PM, Vadym Chepkov wrote: > You forgot to turn on monitor operation for ping (actual job) > > > I saw from the [r...@ha1 ~]# crm ra meta ping command Operations' defaults (advisory minimum): start timeout=60 stop timeout=20 reload

Re: [Pacemaker] clone ip definition and location stops my resources...

2010-05-11 Thread Gianluca Cecchi
On Tue, May 11, 2010 at 11:58 AM, Dejan Muhamedagic wrote: > Do you see the attribute set in the status section (cibadmin -Ql > | grep -w pingd)? If not, then the problem is with the resource. [r...@ha1 ~]# cibadmin -Ql | grep -w pingd Tried to ch

Re: [Pacemaker] clone ip definition and location stops my resources...

2010-05-11 Thread Gianluca Cecchi
nst pinggw > > I suggest you do not change the name though, but adjust your location > constraint to use pingd instead. > crm_mon only notices "pingd" at the moment when you pass the -f argument: it's > hardcoded > > > On Mon, May 10, 2010 at 9:34 AM, Gianluca Cecch

[Pacemaker] clone ip definition and location stops my resources...

2010-05-10 Thread Gianluca Cecchi
Hello, using pacemaker 1.0.8 on rh el 5 I have some problems understanding the way the ping clone works to set up monitoring of the gw... even after reading the docs... As soon as I run: crm configure location nfs-group-with-pinggw nfs-group rule -inf: not_defined pinggw or pinggw lte 0 the resources go stopp
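The effect of that location rule can be modelled with a small Python sketch. It is a simplified illustration, not Pacemaker's policy engine: the function name is hypothetical, the node names other than ha1 are made up, and the attribute value is assumed to compare as an integer.

```python
NEG_INFINITY = float("-inf")


def location_score(node_attrs, attr="pinggw"):
    """Score contributed by: rule -inf: not_defined pinggw or pinggw lte 0.

    Hypothetical helper: returns -inf when the rule matches the node,
    and 0 (no contribution) when it does not.
    """
    value = node_attrs.get(attr)
    if value is None or int(value) <= 0:
        return NEG_INFINITY
    return 0


# Assumed node states for illustration only.
nodes = {
    "ha1": {},                 # attribute never set on this node
    "ha2": {"pinggw": "0"},    # attribute set, gateway unreachable
    "ha3": {"pinggw": "100"},  # attribute set, gateway reachable
}
allowed = [name for name, attrs in nodes.items() if location_score(attrs) == 0]
print(allowed)
```

In this model, if the pinggw attribute is not defined on any node (for example because the ping resource never writes it, or writes it under a different name), every node scores -inf and the group has nowhere to run, which would look like the resources stopping as soon as the constraint is added.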