Re: [Pacemaker] crmd restart due to internal error - pacemaker 1.1.8

2013-05-10 Thread pavan tc
> > > > Well, and also Pacemaker's crmd process. > > My guess... the node is overloaded which is causing the cib queries to > time out. > > > > > > Is there a cib query timeout value that I can set? > > No. You can set the batch-limit property though, this reduces the rate at > which CIB operation

Re: [Pacemaker] crmd restart due to internal error - pacemaker 1.1.8

2013-05-09 Thread pavan tc
> > > I'll experiment with the cibadmin -t (--timeout) option to see if it helps. > As I can see from the code, the default seems to be 30 ms. > Is there a widely used default for systems with a high load or is it found > out the hard way for each setup? > Easier said than done. Can someone help w

Re: [Pacemaker] crmd restart due to internal error - pacemaker 1.1.8

2013-05-09 Thread pavan tc
> > Is there a cib query timeout value that I can set? I was earlier getting > the TOTEM timeout. > So, I set the token to a larger value (5 seconds) in corosync.conf and > things were much better. > But now, I have started hitting this problem. > > I'll experiment with the cibadmin -t (--timeout)
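
For readers unfamiliar with the setting mentioned in this message, the token change might look roughly like the fragment below. This is only a sketch: the poster's actual corosync.conf is not shown in the thread, and the 5000 value is simply the "5 seconds" from the message expressed in milliseconds, which is the unit the token option uses.

```
# /etc/corosync/corosync.conf (fragment; all other totem settings omitted)
totem {
    # Token timeout in milliseconds; 5000 ms corresponds to the
    # 5 seconds mentioned in the message above (assumed value).
    token: 5000
}
```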

Re: [Pacemaker] crmd restart due to internal error - pacemaker 1.1.8

2013-05-09 Thread pavan tc
On Fri, May 10, 2013 at 6:21 AM, Andrew Beekhof wrote: > > On 08/05/2013, at 9:16 PM, pavan tc wrote: > > Hi Andrew, Thanks much for looking into this. I have some queries inline. > > Hi, > > > > I have a two-node cluster with STONITH disabled. > > Thats

Re: [Pacemaker] Pacemaker 1.1.8, Corosync, No CMAN, Promotion issues

2013-04-20 Thread pavan tc
> > Another user hit the same issue and was able to reproduce. > You can see the resolution at > https://bugzilla.redhat.com/show_bug.cgi?id=951340 > > Thanks much for letting me know. I will watch the "Fixed in version" field and upgrade as necessary. Pavan ___

Re: [Pacemaker] Pacemaker 1.1.8, Corosync, No CMAN, Promotion issues

2013-04-18 Thread pavan tc
Yes, but looking at the code it should be impossible. > Would it be possible for you to add: > > export PCMK_trace_functions=peer_update_callback > > to /etc/sysconfig/pacemaker and re-test (and send me the new logs - > probably in /var/log/pacemaker.log)? > > Sorry about the delay. I have put th

Re: [Pacemaker] Pacemaker 1.1.8, Corosync, No CMAN, Promotion issues

2013-04-16 Thread pavan tc
On Fri, Apr 12, 2013 at 9:27 AM, pavan tc wrote: > > Absolutely none in the syslog. Only the regular monitor logs from my > resource agent which continued to report as secondary. > >> >> This is very strange, because the thing that caused the I_PE_CALC is a >>

Re: [Pacemaker] Pacemaker 1.1.8, Corosync, No CMAN, Promotion issues

2013-04-11 Thread pavan tc
Hi Andrew, Thanks much for looking at this. > Then (after about 15 minutes), I see the following: > > There were no logs at all in between? > Absolutely none in the syslog. Only the regular monitor logs from my resource agent which continued to report as secondary. I also checked /var/log/clust

[Pacemaker] Pacemaker 1.1.8, Corosync, No CMAN, Promotion issues

2013-04-10 Thread pavan tc
Hi, [I did go through the mail thread titled: "RHEL6 and clones: CMAN needed anyway?", but was not sure about some answers there] I recently moved from pacemaker 1.1.7 to 1.1.8-7 on centos 6.2. I see the following in syslog: corosync[2966]: [pcmk ] ERROR: process_ais_conf: You have configured

[Pacemaker] CentOS 6.2 and pacemaker versions

2013-02-21 Thread pavan tc
Hi, I have installed pacemaker/corosync from the standard yum repositories on my CentOS 6.2 box. What I get is the following: pacemaker-cli-1.1.7-6.el6.x86_64 pacemaker-cluster-libs-1.1.7-6.el6.x86_64 pacemaker-libs-1.1.7-6.el6.x86_64 pacemaker-1.1.7-6.el6.x86_64 corosynclib-1.4.1-7.el6_3.1.x86_6

Re: [Pacemaker] Pacemaker stop behaviour when underlying resource is unavailable

2012-12-17 Thread pavan tc
[..] > The idea is to make sure that stop does not fail when the underlying > > resource goes away. > > (Otherwise I see that the resource gets to an unmanaged state) > > Also, the expectation is that when the resource comes back, it joins the > > cluster without much fuss. > > > > What I see is

[Pacemaker] Pacemaker stop behaviour when underlying resource is unavailable

2012-12-14 Thread pavan tc
Hi, I have structured my multi-state resource agent as below when the underlying resource becomes unavailable for some reason: monitor() { state=get_primitive_resource_state() ... ... if ($state == unavailable) return $OCF_NOT_RUNNING ... ... } stop() { monit
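
The snippet above is cut off, so the following is only a minimal sketch of the pattern it appears to describe: monitor reports "not running" when the underlying resource is unavailable, and stop treats that as success so the resource does not end up unmanaged. The function `get_primitive_resource_state` is named after the one in the post; `stop_primitive_resource` and the `RESOURCE_STATE` and `STOP_RC` variables are hypothetical stand-ins so the sketch runs without a real resource. The numeric return codes are the standard OCF values.

```shell
#!/bin/sh
# Standard OCF return codes.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Hypothetical probe, named after the one in the post. RESOURCE_STATE is an
# illustrative variable, not part of any real agent.
get_primitive_resource_state() {
    echo "${RESOURCE_STATE:-unavailable}"
}

# Hypothetical stop action for the underlying resource; STOP_RC fakes its result.
stop_primitive_resource() {
    return "${STOP_RC:-0}"
}

agent_monitor() {
    state=$(get_primitive_resource_state)
    if [ "$state" = "unavailable" ]; then
        return "$OCF_NOT_RUNNING"
    fi
    return "$OCF_SUCCESS"
}

agent_stop() {
    mon_rc=0
    agent_monitor || mon_rc=$?
    # Nothing left to stop: report success rather than a failed stop.
    if [ "$mon_rc" -eq "$OCF_NOT_RUNNING" ]; then
        return "$OCF_SUCCESS"
    fi
    # Otherwise attempt the real stop and map its result to an OCF code.
    if stop_primitive_resource; then
        return "$OCF_SUCCESS"
    fi
    return "$OCF_ERR_GENERIC"
}
```

A real multi-state agent would also distinguish master and slave states in monitor; that is left out here because the original snippet does not show it.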

[Pacemaker] Listing resources by attributes

2012-12-12 Thread pavan tc
Hi, Is there a way in which resources can be listed based on some attributes? For example, listing resources running on a certain node, or listing ms resources. The crm_resource manpage talks about the -N and -t options that seem to address the requirements above. But they do not provide the expec

Re: [Pacemaker] Moving multi-state resources

2012-12-12 Thread pavan tc
On Wed, Dec 12, 2012 at 6:46 PM, Dejan Muhamedagic wrote: > Hi, > > On Wed, Dec 12, 2012 at 03:50:01PM +0530, pavan tc wrote: > > Hi, > > > > My requirement was to do some administration on one of the nodes where a > > 2-node multi-state resource was running.

[Pacemaker] Moving multi-state resources

2012-12-12 Thread pavan tc
Hi, My requirement was to do some administration on one of the nodes where a 2-node multi-state resource was running. To effect a resource instance stoppage on one of the nodes, I added a resource constraint as below: crm configure location ms_stop_res_on_node rule -inf: \#uname eq `hostname` T

Re: [Pacemaker] Nodes OFFLINE with "not in our membership" messages

2012-12-06 Thread pavan tc
On Thu, Dec 6, 2012 at 5:21 PM, Nikita Michalko wrote: > Hi, > > did you already try to google on: > "not in our membership" ? > > Not sure which part you were addressing. I mean, I did not pluck the github link out of thin air ;) And if it is the lack of information in my email that you are tal

[Pacemaker] Nodes OFFLINE with "not in our membership" messages

2012-12-05 Thread pavan tc
Hi, I have now hit this issue twice in my setup. I see the following GitHub commit addressing this issue: https://github.com/ClusterLabs/pacemaker/commit/03f6105592281901cc10550b8ad19af4beb5f72f From the patch, it appears there is an incorrect conclusion about the status of the membership of nod

Re: [Pacemaker] Difference between "crm resource" and crm_resource

2012-12-05 Thread pavan tc
> > They are not. "crm" shell just provides a more coherent wrapper around > the various commands. > > > Also, I see that "crm" has a -w option (which gives synchronous behaviour > > to the command) > > Is there something similar for crm_resource? > > No. crm shell then watches the DC until the tra

[Pacemaker] Difference between "crm resource" and crm_resource

2012-12-05 Thread pavan tc
Hi, Can someone please explain how the commands - crm resource stop and crm_resource --resource --set-parameter target-role --meta --parameter-value Stopped are different? Also, I see that "crm" has a -w option (which gives synchronous behaviour to the command) Is there something similar fo